Advanced multi-tenant Hadoop and Zookeeper protection
Jul 5, 2017
Never miss our publications about Open Source, big data and distributed systems, low frequency of one email every two months.
Zookeeper is a critical component to Hadoop’s high availability operation. The latter protects itself by limiting the number of maximum connections (maxConns = 400). However Zookeeper does not protect himself intelligently, he refuses connections once the threshold is reached. In such case, the core components (HBase RegionServers / HDFS ZKFC) will no longer be able to initialize a connection and the service will be degraded or unavailable!
It’s very easy to do a DoS attack on Zookeeper. On the other hand, because most of Zookeeper installation are inside trusted or semi-trusted zones, these attacks are often involuntary. It’s enough for a developer to be unresponsive and launch a custom code that opens looping zookeeper sessions without closing them. In this case, zookeeper is compromised, and all the components with it.
Solutions
Several workarounds can be set up independently or jointly.
Utilisation des Observers
The observers are particular zookeepers nodes:
- They do not participate in Quorum
- They synchronize on participating nodes
- They transfer write requests to participating nodes
They therefore make it possible to increase the number of nodes without slowing down the election process.
Using iptables
It is possible to protect yourself from external DoS via iptables. Indeed we can limit the number of connections on the port of zookeeper (2181) by IP address. This makes it possible to put a lower limit on the Zookeeper maxConns and thus block a particular address without blocking access from another machine.
Example
Imagine the following cluster topology:
- 3 edge nodes:
edge1.adaltas.com, edge2.adaltas.com edge3.adaltas.com
- 3 master nodes:
master1.adaltas.com, master2.adaltas.com, master3.adaltas.com
- n worker nodes: n’interviennent pas dans ce cas
- These machines are located in the subnet:
10.10.10.0/24
The masters nodes are used as the elective quorum and the edges as observers in order to increase the load.
NB: the even number of nodes is not a problem, only 3 are participants
Zookeeper configuration
We set up these configurations (/etc/zookeeper/conf/zoo.cfg):
On the master nodes:
clientPort=2181
maxClientCnxns=200
peerType=participant
server.1=master1.adaltas.com:2888:3888
server.2=master2.adaltas.com:2888:3888
server.3=master3.adaltas.com:2888:3888
server.4=edge1.adaltas.com:2888:3888
server.5=edge2.adaltas.com:2888:3888
server.6=edge3.adaltas.com:2888:3888
On the edge nodes:
clientPort=2181
maxClientCnxns=200
peerType=observer
server.1=master1.adaltas.com:2888:3888
server.2=master2.adaltas.com:2888:3888
server.3=master3.adaltas.com:2888:3888
server.4=edge1.adaltas.com:2888:3888
server.5=edge2.adaltas.com:2888:3888
server.6=edge3.adaltas.com:2888:3888
On the master nodes, it is forbidden to communicate with external machines on port 2181 (only the local network is allowed) via the following iptables rule:
-A INPUT -m state --state NEW -m tcp -p tcp -s 10.10.10.0/24 --dport 2181 -j ACCEPT
Thus these zookeeper instances are only accessed by our internal services and processes.
Edge nodes limit communication with external machines on port 2181 to 100 simultaneous IP connections via the rule:
iptables -A INPUT -p tcp --syn --dport 2181 -m connlimit --connlimit-above 100 --connlimit-mask 32 -j REJECT --reject-with tcp-reset
Configuration Hadoop
For Hadoop services (HDFS ZKFC, HBase Master, etc.), the following connection string is specified:
master1.adaltas.com:2181,master2.adaltas.com:2181,master3.adaltas.com:2181
For “client” configurations (YARN containers, client hbase, third-party applications, etc.) the string is specified:
edge1.adaltas.com:2181,edge2.adaltas.com:2181,edge3.adaltas.com:2181
Thus, when an external job or application launches, it can not saturate the quorum and does not compromise the state of the cluster.
Go further, silotage of observers nodes
If an external “fraudulent” application uses the string edge1.adaltas.com:2181,edge2.adaltas.com:2181,edge3.adaltas.com:2181
then it is possible that it saturates the 3 observed nodes. Thus, although the cluster remains stable, some services will be unavailable, since customers will no longer be able to view Zookeeper.
We can limit the impact of the saturation of the observers by decomposing the chain into several substrings which will be specified in the client configurations. Example:
- Chain 1:
edge1.adaltas.com, edge2.adaltas.com
- Chain 2:
edge1.adaltas.com, edge3.adaltas.com
- Chain 3:
edge2.adaltas.com, edge3.adaltas.com
Thus, if string 1 is saturated, edge3 remains available. Applications targeting channel 2 and 3 will not be blocked.