Hadoop development cluster of virtual machines with static IP using VirtualBox

A few days ago, I explained how to set up a cluster of virtual machines with static IPs and Internet access, suitable for hosting your Hadoop cluster locally for development. At the time, I used VMware. I’m coming back to the same topic, this time using VirtualBox.

I decided to give VirtualBox a chance as an alternative to VMware for multiple reasons. With VMware, the CentOS installation partially failed at the end and I had to reboot the machine. No real consequences, but not something I appreciate. VirtualBox is free and open source, while VMware isn’t open source and is even sold commercially on OS X. Another goodie I was interested in is the ability to choose the IP address range of my internal network; my memory for that sort of thing is limited. After many attempts, I managed to install the VMware Tools only once (don’t ask me how), another traumatic experience. Finally, I have the sweet, hypothetical idea of scripting the provisioning and installation of the virtual machines. If I’m not mistaken, that shouldn’t be a problem with VirtualBox.

The principle is similar, but if you haven’t read the previous article, here’s the idea. After installing CentOS, the system has a single Ethernet interface which shares your host computer’s connection. We will create a custom host-only network in VirtualBox and configure a second network interface on the virtual machine to use it.

This time, I’m using a CentOS 6.4 minimal installation image. It has the advantage of being much smaller than the LiveCD (I don’t need a graphical interface), and it doesn’t ship NetworkManager, which we would otherwise need to disable.

So let’s start by creating the new network. Open the VirtualBox preferences, go to the Network tab and click the plus icon to add a host-only network. Note that you can choose its IPv4 router address, which is very nice. Also, disable the DHCP server on the “DHCP Server” tab; we won’t need it.
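The step above is done through the GUI. Since scripting the provisioning is one of the reasons for trying VirtualBox, here is a minimal sketch of the same step with VBoxManage, VirtualBox’s command-line tool. It assumes the new interface ends up named vboxnet0 (VirtualBox picks the name itself, so it may be vboxnet1 if vboxnet0 already exists) and that 192.168.56.1/24 is the router address you want; the function name create_hostonly_network is hypothetical. Flags can vary between VirtualBox releases, so check VBoxManage --help on yours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scripted equivalent of the host-only network GUI step.
# Assumptions: interface named vboxnet0, router address 192.168.56.1/24.
create_hostonly_network() {
  local ifname="${1:-vboxnet0}"
  local router_ip="${2:-192.168.56.1}"
  local netmask="${3:-255.255.255.0}"

  # Create a new host-only interface (VirtualBox chooses its name).
  VBoxManage hostonlyif create
  # Set the host-side IPv4 address, the "router" address shown in the GUI.
  VBoxManage hostonlyif ipconfig "$ifname" --ip "$router_ip" --netmask "$netmask"
  # Remove the DHCP server bound to this interface, if any: we use static IPs.
  VBoxManage dhcpserver remove --ifname "$ifname" || true
}
```

Calling create_hostonly_network vboxnet0 192.168.56.1 on the host would then replace the clicks in the preferences window.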

Now, we will create the new virtual machine. Click “New” in the VirtualBox Manager main window and choose a name, “Hadoop1” for me, a type, “Linux”, and a version, “RedHat (64 bit)” for CentOS. My laptop has 16 GB of RAM and a 512 GB SSD. In the following screens, I allocate 4 GB of RAM and a dynamically allocated 8 GB hard drive.
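The wizard can also be replaced by VBoxManage. A minimal sketch, assuming a SATA controller named “SATA” and a disk image written to the current directory (both arbitrary choices, as is the function name create_hadoop_vm). The RedHat_64 OS type corresponds to “RedHat (64 bit)” and the Standard variant to a dynamically allocated disk; recent VirtualBox releases document createhd as createmedium disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scripted equivalent of the "New" VM wizard.
# Assumptions: SATA controller named "SATA", disk image in the current directory.
create_hadoop_vm() {
  local name="${1:-Hadoop1}"
  local memory_mb="${2:-4096}"              # 4 GB of RAM
  local disk_mb="${3:-8192}"                # 8 GB hard drive
  local disk="${4:-$PWD/${name}.vdi}"       # absolute path to the disk image

  # Register a new VM with the "RedHat (64 bit)" OS type.
  VBoxManage createvm --name "$name" --ostype RedHat_64 --register
  VBoxManage modifyvm "$name" --memory "$memory_mb"
  # The Standard variant is a dynamically allocated image.
  VBoxManage createhd --filename "$disk" --size "$disk_mb" --variant Standard
  # Add a SATA controller and attach the new disk to its first port.
  VBoxManage storagectl "$name" --name SATA --add sata
  VBoxManage storageattach "$name" --storagectl SATA --port 0 --device 0 \
    --type hdd --medium "$disk"
}
```

The CentOS image could then be attached the same way, with storageattach on port 1, --type dvddrive and --medium pointing at the downloaded ISO.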

Once you are back in the VirtualBox Manager, select the new virtual machine and click the “Settings” icon. In the “Network” tab, leave “Adapter 1” as is and enable “Adapter 2”, with “Attached to” set to “Host-only Adapter” and “Name” set to the network created earlier, “vboxnet0” for me.
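These adapter settings map onto VBoxManage modifyvm. A sketch under the same assumptions (VM named Hadoop1, host-only interface vboxnet0, hypothetical function name configure_adapters); the virtual machine must be powered off for modifyvm to apply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scripted equivalent of the Network settings tab.
configure_adapters() {
  local name="${1:-Hadoop1}"
  local hostonly_if="${2:-vboxnet0}"
  # Adapter 1 stays on NAT, sharing the host connection (the default).
  VBoxManage modifyvm "$name" --nic1 nat
  # Adapter 2 is attached to the host-only network.
  VBoxManage modifyvm "$name" --nic2 hostonly --hostonlyadapter2 "$hostonly_if"
}
```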

We are ready to start the new virtual machine. Select the CentOS image you downloaded and go through each and every installation screen. I chose “hadoop1” as the hostname. The installation took less than 5 minutes for me.

Once the installation completes, you are presented with a console prompt. Log in with the root account and the password entered during setup. In the “/etc/sysconfig/network-scripts” directory, we will update the configuration of the shared interface (eth0) for Internet access and of the host-only interface (eth1) for local networking. Here’s the code:

cat > /etc/sysconfig/network-scripts/ifcfg-eth0 <<IF
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<IF
service network restart

The DHCP interface (eth0) is already configured and almost fine by default; it only needs to be activated at startup (ONBOOT=yes). The host-only interface (eth1) must be configured with a static IP.
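A sketch of what the two files might contain, following the paragraph above: eth0 keeps DHCP and is brought up at boot, while eth1 gets a static address. The address 192.168.56.101 is only an assumption (it sits in VirtualBox’s default host-only range); pick one inside the range you chose for vboxnet0. The helper names write_ifcfg_dhcp and write_ifcfg_static, and their directory argument, are hypothetical, so the files can be generated anywhere for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers writing CentOS 6 style ifcfg files into a directory,
# normally /etc/sysconfig/network-scripts.
write_ifcfg_dhcp() {
  local dir="$1" dev="$2"
  # Shared (NAT) interface: keep DHCP, but activate it at startup.
  cat > "$dir/ifcfg-$dev" <<IF
DEVICE=$dev
TYPE=Ethernet
BOOTPROTO=dhcp
ONBOOT=yes
IF
}

write_ifcfg_static() {
  local dir="$1" dev="$2" ip="$3" netmask="${4:-255.255.255.0}"
  # Host-only interface: fixed address, no gateway (Internet goes through eth0).
  cat > "$dir/ifcfg-$dev" <<IF
DEVICE=$dev
TYPE=Ethernet
BOOTPROTO=static
IPADDR=$ip
NETMASK=$netmask
ONBOOT=yes
IF
}
```

On the virtual machine, as root, this would be used as write_ifcfg_dhcp /etc/sysconfig/network-scripts eth0, then write_ifcfg_static /etc/sysconfig/network-scripts eth1 192.168.56.101, followed by service network restart.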

The system is now ready. We finish by upgrading it and installing a few core packages along with the VirtualBox Guest Additions. This step is optional; I personally wonder what the Guest Additions bring to a non-graphical installation. Before mounting the CD-ROM, don’t forget to click “Install Guest Additions” in the “Devices” menu of your virtual machine window.

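yum -y update
# If the update installed a new kernel, reboot now so that uname -r matches it.
yum -y groupinstall "Development Tools"
yum -y install kernel-devel-$(uname -r)
mount /dev/cdrom /mnt
# Run the Linux installer shipped on the Guest Additions CD.
sh /mnt/VBoxLinuxAdditions.run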

Note that yum groupinstall "Development Tools" is the CentOS equivalent of apt-get install build-essential on Debian and Ubuntu.

During the Guest Additions installation, two steps fail, “Building the OpenGL support module” and “Installing the Window System drivers”, which makes sense since we have no windowing environment.

You are now ready to have a pleasant time with Hadoop, but that is the subject of another post.

Side note

If you lose all your network interfaces after cloning the virtual machine and regenerating its MAC addresses, try commenting out all the lines in “/etc/udev/rules.d/70-persistent-net.rules” and rebooting.
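A minimal sketch of that fix, assuming GNU sed as found on CentOS. The -i.bak form keeps a backup copy, which udev ignores since it only reads files ending in .rules; the function name comment_udev_rules is hypothetical. After the reboot, udev should write fresh entries for the new MAC addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out every active line of a udev rules file,
# normally /etc/udev/rules.d/70-persistent-net.rules (backup in FILE.bak).
comment_udev_rules() {
  local file="$1"
  # Prefix '#' to every line that is neither empty nor already a comment.
  sed -i.bak -e 's/^\([^#]\)/#\1/' "$file"
}
```

On the clone, this would be run as root with comment_udev_rules /etc/udev/rules.d/70-persistent-net.rules, followed by reboot.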