Hadoop and HBase installation on OSX in pseudo-distributed mode

By David WORMS

Dec 1, 2010

The operating system used here is OSX, but the procedure differs little on any Unix environment because most of the software is downloaded from the Internet, uncompressed and configured manually. Only a few packages are installed with MacPorts, and their equivalents are easy to find in tools such as Apt and Yum. Since the downloaded software is written in Java, it should behave the same way in other environments.

This environment is configured in pseudo-distributed mode to best simulate the behavior of a cluster on a single workstation. In this mode, each Hadoop daemon runs in its own JVM.

The procedure covers the installation of the following software:

  • Hadoop
  • Hive
  • Pig
  • HBase
  • Sqoop
  • Flume
  • Hue
  • ZooKeeper

Choice of versions

Installing the software from the SVN repositories ran into an incompatibility: Hive requires the latest stable version of Hadoop (0.20.2), while Sqoop requires the SVN version of Hadoop. For this reason, we opted for the versions distributed by Cloudera. Based on stable versions, they include many of the patches present in the SVN repositories and are tested by some of the best experts in the community.

However, some features are not yet available in the Cloudera distribution, so some of us also use versions compiled from the SVN repositories. This concerns HBase and Hive; their manual installation is not covered below.

Installation

The procedure assumes that Xcode and MacPorts are already present on the system.

The Cloudera distribution used is CDH3beta2, which is not the most recent, but the procedure is the same if you download the latest versions from the Cloudera website.

MacPorts dependencies

sudo port install wget py26-virtualenv libxml2 libxslt mysql5 mysql5-server sqlite mysql
easy_install simplejson

Setting up SSH

# You need to be able to log in without a password
# (using your public key)
# Next line only if "~/.ssh/id_rsa.pub" does not exist
ssh-keygen -C "my@email.com" -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# You should now be able to issue "ssh localhost"
# without entering your password
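
To confirm that the key setup above took effect, a small check can compare the public key against authorized_keys before trying "ssh localhost". This is only a minimal sketch: the function name key_is_authorized is hypothetical and not part of the original procedure.

```shell
# Hypothetical helper: succeed if a key from the public key file $1
# appears as a complete line in the authorized_keys file $2.
key_is_authorized() {
  pubkey_file=$1
  auth_file=$2
  [ -f "$pubkey_file" ] && [ -f "$auth_file" ] || return 1
  # -x: match whole lines, -F: fixed strings, -f: read patterns from file
  grep -q -x -F -f "$pubkey_file" "$auth_file"
}

# Usage (BatchMode makes ssh fail instead of prompting for a password):
#   key_is_authorized ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys \
#     && ssh -o BatchMode=yes localhost true
```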

Preparing the installation directory

# Replace {username}
sudo mkdir /opt/cloudera
sudo chown {username}:staff /opt/cloudera
cd /opt/cloudera

Downloading packages

wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.tar.gz
wget http://archive.cloudera.com/cdh/3/hive-0.5.0+20.tar.gz
wget http://archive.cloudera.com/cdh/3/pig-0.7.0+9.tar.gz
wget http://archive.cloudera.com/cdh/3/hbase-0.89.20100621+17.tar.gz
wget http://archive.cloudera.com/cdh/3/sqoop-1.0.0+3.tar.gz
wget http://archive.cloudera.com/cdh/3/flume-0.9.0+1.tar.gz
wget http://archive.cloudera.com/cdh/3/hue-common_0.9-2.tar.gz
wget http://archive.cloudera.com/cdh/3/zookeeper-3.3.1+10.tar.gz
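
The same downloads can be driven from a single package list, which makes it easier to swap in newer versions later. The sketch below reuses the base URL and package names from the commands above; the variable and function names (CDH_BASE_URL, CDH_PACKAGES, cdh_url, download_all) are hypothetical.

```shell
# Base URL and package names are taken from the wget commands above.
CDH_BASE_URL="http://archive.cloudera.com/cdh/3"
CDH_PACKAGES="hadoop-0.20.2+320 hive-0.5.0+20 pig-0.7.0+9
hbase-0.89.20100621+17 sqoop-1.0.0+3 flume-0.9.0+1
hue-common_0.9-2 zookeeper-3.3.1+10"

# Hypothetical helper: print the archive URL for one package name.
cdh_url() {
  printf '%s/%s.tar.gz\n' "$CDH_BASE_URL" "$1"
}

# Hypothetical helper: download every archive; -c resumes partial files.
download_all() {
  for pkg in $CDH_PACKAGES; do
    wget -c "$(cdh_url "$pkg")" || return 1
  done
}

# Usage, from /opt/cloudera:
#   download_all
```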

Extracting packages

tar -xzf hadoop-0.20.2+320.tar.gz
tar -xzf hive-0.5.0+20.tar.gz
tar -xzf pig-0.7.0+9.tar.gz
tar -xzf hbase-0.89.20100621+17.tar.gz
tar -xzf sqoop-1.0.0+3.tar.gz
tar -xzf flume-0.9.0+1.tar.gz
tar -xzf hue-common_0.9-2.tar.gz
tar -xzf zookeeper-3.3.1+10.tar.gz
# Make symbolic links
ln -sf hadoop-0.20.2+320 hadoop
ln -sf hive-0.5.0+20 hive
ln -sf pig-0.7.0+9 pig
ln -sf hbase-0.89.20100621+17 hbase
ln -sf sqoop-1.0.0+3 sqoop
ln -sf flume-0.9.0+1 flume
ln -sf hue-common-0.9 hue
ln -sf zookeeper-3.3.1+10 zookeeper
# Clean up (only the downloaded archives)
rm -f *.tar.gz
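
The extract-and-link steps follow one pattern per package, so they can be sketched as a loop-friendly helper. This is an illustration under assumptions, not the original procedure: short_name and extract_and_link are hypothetical names, and the helper assumes each archive unpacks into a single top-level directory whose name appears first in the archive listing.

```shell
# Hypothetical helper: derive the short link name from an archive name,
# keeping everything before the first "-" (hadoop, hive, hue, ...).
short_name() {
  base=${1##*/}                  # drop any leading directory
  printf '%s\n' "${base%%-*}"    # strip from the first "-" onwards
}

# Hypothetical helper: extract one archive in the current directory and
# point a short symbolic link at its top-level directory.
extract_and_link() {
  archive=$1
  # Assumption: the first listed entry starts with the top-level directory.
  top=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
  [ -n "$top" ] || return 1
  tar -xzf "$archive" && ln -sfn "$top" "$(short_name "$archive")"
}

# Usage, from /opt/cloudera:
#   for archive in *.tar.gz; do extract_and_link "$archive"; done
```

Reading the directory name from the archive itself avoids guessing it from the file name, which matters for Hue, where the two differ.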

Setting up the environment

echo "export JAVA_HOME=/Library/Java/Home" >> ~/.profile
echo "export HADOOP_HOME=/opt/cloudera/hadoop" >> ~/.profile
echo "export HIVE_HOME=/opt/cloudera/hive" >> ~/.profile
echo "export HIVE_CONF_DIR=/opt/cloudera/hive/conf" >> ~/.profile
echo "export PIG_HOME=/opt/cloudera/pig" >> ~/.profile
echo "export HBASE_HOME=/opt/cloudera/hbase" >> ~/.profile
echo "export FLUME_HOME=/opt/cloudera/flume" >> ~/.profile
echo "export SQOOP_HOME=/opt/cloudera/sqoop" >> ~/.profile
echo "export ZOOKEEPER_HOME=/opt/cloudera/zookeeper" >> ~/.profile
echo "export HUE_HOME=/opt/cloudera/hue" >> ~/.profile
echo "export PATH=\$HADOOP_HOME/bin:\$PATH" >> ~/.profile
echo "export PATH=\$HBASE_HOME/bin:\$PATH" >> ~/.profile
echo "export PATH=\$PIG_HOME/bin:\$PATH" >> ~/.profile
echo "export PATH=\$HIVE_HOME/bin:\$PATH" >> ~/.profile
echo "export PATH=\$FLUME_HOME/bin:\$PATH" >> ~/.profile
echo "export PATH=\$SQOOP_HOME/bin:\$PATH" >> ~/.profile
echo "export PATH=\$ZOOKEEPER_HOME/bin:\$PATH" >> ~/.profile
source ~/.profile
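
Because each echo appends to ~/.profile, running the commands a second time duplicates every line. One way to make the step repeatable is sketched below; append_once is a hypothetical helper, not something the original procedure uses.

```shell
# Hypothetical helper: append a line to a file only if the file does not
# already contain that exact line.
append_once() {
  line=$1
  file=$2
  touch "$file" || return 1
  # -x: whole-line match, -F: fixed string, --: end of grep options
  grep -q -x -F -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $HADOOP_HOME unexpanded in the profile):
#   append_once 'export HADOOP_HOME=/opt/cloudera/hadoop' ~/.profile
#   append_once 'export PATH=$HADOOP_HOME/bin:$PATH' ~/.profile
```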

Software configuration

# Configure Hive
# Edit ./conf/hive-default.xml
# and modify the property 'javax.jdo.option.ConnectionURL'
# to 'jdbc:derby:;databaseName=/opt/cloudera/data/hive/metastore_db;create=true'
# Edit ./conf/hive-log4j.properties
# and modify the property 'hive.log.file'
# to '/opt/cloudera/data/hive/hive.log'
# (SVN only) modify property 'hive.stats.dbconnectionstring'
# to 'jdbc:derby:;databaseName=/opt/cloudera/data/hive_temp_stats;create=true'
# For mysql,
# modify 'javax.jdo.option.ConnectionURL'
# to 'jdbc:mysql://127.0.0.1/hive?createDatabaseIfNotExist=true'
# and 'javax.jdo.option.ConnectionDriverName'
# to 'com.mysql.jdbc.Driver'
# and 'javax.jdo.option.ConnectionUserName'
# to 'your_username'
# and 'javax.jdo.option.ConnectionPassword'
# to 'your_password'
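
As an alternative to editing hive-default.xml by hand, the Derby setting can be written to a separate override file. The sketch below assumes that this Hive version reads conf/hive-site.xml in addition to hive-default.xml, which should be checked before relying on it; write_hive_site is a hypothetical helper name.

```shell
# Hypothetical helper: write a minimal override file containing the Derby
# metastore URL given in the comments above.
# Assumption: Hive loads hive-site.xml on top of hive-default.xml.
write_hive_site() {
  target=$1
  cat > "$target" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/opt/cloudera/data/hive/metastore_db;create=true</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_hive_site "$HIVE_HOME/conf/hive-site.xml"
```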

# Configure Hue
# Note: from memory, you might also need mysql
sudo port install py26-setuptools
ln -sf /opt/local/bin/mysql_config5 /opt/local/bin/mysql_config
cd hue-common-0.9
make apps
# Edit $HUE_HOME/desktop/conf/hue.ini and
# update the property "hadoop_home" from "/usr/lib/hadoop-0.20"
# to "/opt/cloudera/hadoop-0.20.2+320"

# Configure Hadoop
cd $HADOOP_HOME/lib
ln -s $HUE_HOME/desktop/libs/hadoop/java-lib/hue-plugins-0.9.jar
# The next paths are relative to the installation directory
cd /opt/cloudera
mv hadoop-0.20.2+320/conf hadoop-0.20.2+320/conf.bck
cp -rp hue/example-hadoop-confs/conf.pseudo-hue hadoop-0.20.2+320/conf
# Edit the file $HADOOP_HOME/conf/core-site.xml and replace
# the value "/var/lib/hadoop-0.20/cache/${user.name}" by
# "/opt/cloudera/data/cache/${user.name}"
# Edit the file $HADOOP_HOME/conf/hdfs-site.xml and replace
# the value "/var/lib/hadoop-0.20/cache/hadoop/dfs/name" by
# "/opt/cloudera/data/cache/${user.name}/dfs/name"
hadoop namenode -format
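
The core-site.xml edit described above is a plain path substitution, so it can be scripted before formatting the namenode. This is a minimal sketch: replace_prefix is a hypothetical helper, and it assumes neither path contains the "|" character used as the sed delimiter.

```shell
# Hypothetical helper: replace every occurrence of path $2 with path $3
# in file $1. Assumption: the paths contain no "|" character.
replace_prefix() {
  file=$1
  old=$2
  new=$3
  tmp=$(mktemp) || return 1
  # Go through a temporary file: "sed -i" differs between BSD and GNU sed.
  sed "s|$old|$new|g" "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Usage:
#   replace_prefix "$HADOOP_HOME/conf/core-site.xml" \
#     /var/lib/hadoop-0.20/cache /opt/cloudera/data/cache
```

For hdfs-site.xml, the edit described above also swaps the "hadoop" path segment for "${user.name}", which a prefix replacement alone does not do.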

Use

Starting services

# Start Hadoop
start-all.sh
# Start HBase
start-hbase.sh
hbase-daemon.sh start rest
# Start Hue
cd $HUE_HOME
./build/env/bin/supervisor
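
Once start-all.sh has returned, the JDK's jps tool lists the running Java processes, which gives a quick way to see whether every Hadoop daemon came up. The sketch below is an illustration only: missing_daemons and EXPECTED_DAEMONS are hypothetical names, and the list covers the Hadoop daemons only, not HBase or Hue.

```shell
# Daemons started by start-all.sh in Hadoop 0.20 pseudo-distributed mode.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Hypothetical helper: read "jps" output on stdin, print each expected
# daemon that is missing, and succeed only if none is missing.
missing_daemons() {
  running=$(cat)
  rc=0
  for daemon in $EXPECTED_DAEMONS; do
    # Compare the second column exactly, so that "NameNode" does not
    # match "SecondaryNameNode".
    if ! printf '%s\n' "$running" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf '%s\n' "$daemon"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   jps | missing_daemons || echo "some Hadoop daemons are not running"
```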

Stop services

# Stop Hadoop
stop-all.sh
# Stop HBase
hbase-daemon.sh stop rest
stop-hbase.sh

Administration

If the installation went smoothly, the following URLs should be available:

  • Hadoop Map/Reduce Administration: http://localhost:50030
  • Hadoop File System Browser: http://localhost:50070
  • Hadoop Task Tracker Status: http://localhost:50060
  • Hue: http://localhost:8088
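
The four URLs above can also be probed from the command line with curl, which is convenient when no browser is at hand. This is a hedged sketch: admin_urls and check_admin_urls are hypothetical names, and curl reports the code 000 when a service does not answer.

```shell
# Hypothetical helper: print the four web UI URLs listed above.
admin_urls() {
  printf '%s\n' \
    "http://localhost:50030" \
    "http://localhost:50070" \
    "http://localhost:50060" \
    "http://localhost:8088"
}

# Hypothetical helper: print the HTTP status code returned by each URL.
check_admin_urls() {
  admin_urls | while read -r url; do
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Usage:
#   check_admin_urls
```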
