Virtual machines with static IP for your Hadoop development cluster
By David WORMS
Feb 27, 2013
As I am about to install and test Ambari, this article is an occasion to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring tool for Hadoop clusters, will be the subject of a yet-to-be-written article. My virtual environment is VMware, but VirtualBox offers the same networking features and should work as well.
Note, I have since written a similar article covering additional functionalities and using VirtualBox.
What really matters here is to assign each virtual machine a fixed IP address which won’t change over time. I personally work on a MacBook Pro laptop, and I found it very frustrating to restart each of the Hadoop components whenever I received new IP addresses while switching between networks. Additionally, the setup should provide an Internet gateway.
Virtual environment
Cloudera and Hortonworks seem to like the Red Hat Enterprise distribution and its Open Source counterpart CentOS. I’m using the CentOS LiveCD at version 6.3. I’m personally more familiar with Debian, but let’s go for it. I recommend creating 3 virtual machines, which should be enough to simulate a mini Hadoop cluster.
Upon installation of your CentOS system, Internet access works through a NAT connection. The Gnome Network Manager takes care of managing this network interface. There is no trace of a /etc/sysconfig/network-scripts/ifcfg-eth0 file, but running ifconfig shows the interface as eth0. We will later disable the Gnome Network Manager and define eth0 manually with dhcp.
Host-Only Custom Network Connection
We will end up with two network interfaces, one for internal networking and one for Internet access. The internal network is created with Host-Only networking, which lets your virtual machines communicate with one another as well as with the host system.
Now comes the time to create our local network. From the VMware global preferences window, inside the “Network” tab, you should be able to create a new custom network configuration. Its default name should look like vmnet1. Check “Connect the host Mac to this network” and uncheck both “Allow virtual machines on this network to connect to external network” and “Provide addresses on this network via DHCP”. The rest can be left as is. After creation, my values are 172.16.134.0 for the subnet IP and 255.255.255.0 for the subnet mask.
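Before picking an address for a virtual machine, it can help to confirm that it really falls inside the Host-Only subnet. The following is a minimal sketch in POSIX shell, not part of my original setup; the function names ip_to_int and in_subnet are hypothetical helpers introduced for illustration, and the values are the ones listed above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# (Hypothetical helper, introduced for illustration.)
ip_to_int() {
  # Split on dots inside a subshell so that IFS and the positional
  # parameters of the caller are left untouched.
  (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
  )
}

# Succeed (exit status 0) when address $1 belongs to the network
# whose subnet IP is $2 and whose subnet mask is $3.
in_subnet() {
  [ $(( $(ip_to_int "$1") & $(ip_to_int "$3") )) -eq \
    $(( $(ip_to_int "$2") & $(ip_to_int "$3") )) ]
}

# Example with the subnet created in this article.
if in_subnet 172.16.134.11 172.16.134.0 255.255.255.0; then
  echo "172.16.134.11 is inside the Host-Only subnet"
fi
```

The check applies the mask to both the candidate address and the subnet IP and compares the results, so it should keep working if your own subnet uses a mask other than 255.255.255.0.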
Configuring the network interfaces
Next, for each virtual machine, shut it down, go to its settings pane and create two new network adapters by clicking the “Add device” button and selecting “Network Adapter”. Once the network adapters are created, you can start the virtual machine. You should have two network adapters available. Set the first one to “NAT” and the second one to your Host-Only network (vmnet1 for example).
We will now create the two interface configuration files. Make sure to replace the IPADDR field with your own IP address, which must be in the range defined by your custom Host-Only networking configuration.
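# eth0: NAT interface for Internet access, address obtained through DHCP
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 <<IF
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
IF
# eth1: Host-Only interface with a fixed address, unique to each virtual machine
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<IF
DEVICE=eth1
IPADDR=172.16.134.11
NETMASK=255.255.255.0
BOOTPROTO=static
ONBOOT=yes
IF
# Apply the new configuration
service network restart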
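With three virtual machines, the eth1 file differs only by its IPADDR value. As a rough sketch, a small function can print the file content for a given address so that it can be reviewed before being written. The function name render_ifcfg_eth1 and the address 172.16.134.12 are assumptions made for the example, not values from my setup.

```shell
# Print the content of ifcfg-eth1 for the static address given as $1.
# The optional $2 overrides the netmask, which defaults to the one
# used in this article. (Hypothetical helper, for illustration.)
render_ifcfg_eth1() {
  cat <<IF
DEVICE=eth1
IPADDR=$1
NETMASK=${2:-255.255.255.0}
BOOTPROTO=static
ONBOOT=yes
IF
}

# Preview the file for a second node (example address).
render_ifcfg_eth1 172.16.134.12

# To apply it as root on that node, redirect the output:
#   render_ifcfg_eth1 172.16.134.12 > /etc/sysconfig/network-scripts/ifcfg-eth1
#   service network restart
```

Printing to standard output first keeps the function harmless to run anywhere; only the redirection shown in the comments touches the system.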
We will finally disable the Gnome Network Manager, which detects and configures network connections automatically. The network service is activated instead.
service NetworkManager stop
chkconfig NetworkManager off
service network start
chkconfig network on
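To confirm that the change survives a reboot, the runlevel listing printed by chkconfig --list can be inspected. The sketch below assumes the usual CentOS 6 layout of that listing (service name followed by 0:off through 6:off); the function name disabled_at_boot is a hypothetical helper, and the sample line is only there to make the example runnable outside the virtual machine.

```shell
# Succeed when the chkconfig line read on stdin reports no runlevel as "on".
# (Hypothetical helper, introduced for illustration.)
disabled_at_boot() {
  ! grep -q ':on'
}

# Sample line in the format printed by `chkconfig --list <service>`.
printf 'NetworkManager\t0:off\t1:off\t2:off\t3:off\t4:off\t5:off\t6:off\n' \
  | disabled_at_boot && echo "NetworkManager will not start at boot"

# On the virtual machine itself:
#   chkconfig --list NetworkManager | disabled_at_boot && echo "disabled"
#   chkconfig --list network
```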
Conclusion
If everything went well, your virtual machines should all connect to the Internet using eth0 and communicate with one another over static IPs using eth1. Running ifconfig should show your 2 IPs. You might have to reboot if restarting the network isn’t enough.
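To read the addresses at a glance, the ifconfig output can be filtered down to its IPv4 lines. This is a minimal sketch that assumes the "inet addr:" layout printed by ifconfig on CentOS 6; the function name list_ipv4 is hypothetical, and the MAC addresses and the NAT address in the sample are made up for illustration.

```shell
# Print the IPv4 addresses found in ifconfig output read on stdin.
# Assumes the "inet addr:A.B.C.D" layout of ifconfig on CentOS 6.
list_ipv4() {
  sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}

# Sample output for illustration (made-up MAC and NAT addresses).
# On the virtual machine, run instead: ifconfig | list_ipv4
list_ipv4 <<'OUT'
eth0      Link encap:Ethernet  HWaddr 00:0C:29:00:00:01
          inet addr:192.168.80.128  Bcast:192.168.80.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 00:0C:29:00:00:02
          inet addr:172.16.134.11  Bcast:172.16.134.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
OUT
```

On a correctly configured machine the list should contain one DHCP address for eth0, the static address chosen for eth1, and the loopback address.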