Hadoop and HBase installation on OSX in pseudo-distributed mode
By David WORMS
Dec 1, 2010
Categories: Big Data, Learning
Tags: Hue, Infrastructure, Hadoop, HBase, Big Data, Deployment
The operating system chosen is OSX, but the procedure is much the same for any Unix environment: most of the software is downloaded from the Internet, uncompressed and set up manually. Only a few packages are installed with MacPorts, and equivalents are easily found in tools like APT and Yum. Since the downloaded software is written in Java, it should behave the same in other environments.
This environment is configured in pseudo-distributed mode to best simulate the behavior of a cluster on a single machine. In this mode, each Hadoop daemon runs in its own JVM.
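For illustration, the heart of a pseudo-distributed Hadoop configuration typically boils down to a handful of properties pointing everything at localhost with a replication factor of 1. The exact values below are indicative (the ports are the CDH defaults); the conf.pseudo-hue configuration used later in this procedure already contains equivalent settings.
<!-- core-site.xml: a single local HDFS endpoint -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
<!-- mapred-site.xml: a single local JobTracker -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:8021</value>
</property>
<!-- hdfs-site.xml: no replication on a single node -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>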
The procedure covers the installation of the following software: Hadoop, Hive, Pig, HBase, Sqoop, Flume, Hue and ZooKeeper.
Choice of versions
Installing the software from the SVN repositories ran into an incompatibility: Hive requires the latest stable version of Hadoop (0.20.2), while Sqoop requires the SVN version of Hadoop. For this reason, we opted for the versions distributed by Cloudera. Based on the stable releases, they include many of the patches present in the SVN repositories and are tested by some of the best experts in the community.
However, some features are not yet present in the distribution, so we also use versions of HBase and Hive compiled from the SVN repositories; their manual installation is not covered below.
Installation
The procedure described assumes that Xcode and MacPorts are already present on the system.
The Cloudera distribution used is CDH3beta2, which is not the most recent, but the mechanism stays the same provided you download the latest versions from the Cloudera website.
MacPorts dependencies
sudo port install wget py26-virtualenv libxml2 libxslt mysql5 mysql5-server sqlite mysql
easy_install simplejson
Setting up SSH
# You need to be able to log in without a password
# (using your public key)
# Next line only if "~/.ssh/id_rsa.pub" does not exist
ssh-keygen -C "my@email.com" -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# You should now be able to issue "ssh localhost"
# without entering your password
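To confirm the setup, a non-interactive connection attempt should succeed without prompting:
# Prints "OK" without asking for a password
ssh -o BatchMode=yes localhost echo OK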
Preparing the installation directory
# Replace {username}
sudo mkdir /opt/cloudera
sudo chown {username}:staff /opt/cloudera
cd /opt/cloudera
Downloading packages
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.tar.gz
wget http://archive.cloudera.com/cdh/3/hive-0.5.0+20.tar.gz
wget http://archive.cloudera.com/cdh/3/pig-0.7.0+9.tar.gz
wget http://archive.cloudera.com/cdh/3/hbase-0.89.20100621+17.tar.gz
wget http://archive.cloudera.com/cdh/3/sqoop-1.0.0+3.tar.gz
wget http://archive.cloudera.com/cdh/3/flume-0.9.0+1.tar.gz
wget http://archive.cloudera.com/cdh/3/hue-common_0.9-2.tar.gz
wget http://archive.cloudera.com/cdh/3/zookeeper-3.3.1+10.tar.gz
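Before extracting, it may be worth checking that each archive downloaded completely; tar -tzf lists an archive's contents and fails on a truncated file:
# Each archive should list without errors
for f in *.tar.gz; do tar -tzf "$f" > /dev/null && echo "$f OK"; done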
Extracting packages
tar -xzf hadoop-0.20.2+320.tar.gz
tar -xzf hive-0.5.0+20.tar.gz
tar -xzf pig-0.7.0+9.tar.gz
tar -xzf hbase-0.89.20100621+17.tar.gz
tar -xzf sqoop-1.0.0+3.tar.gz
tar -xzf flume-0.9.0+1.tar.gz
tar -xzf hue-common_0.9-2.tar.gz
tar -xzf zookeeper-3.3.1+10.tar.gz
# Make symbolic links
ln -sf hadoop-0.20.2+320 hadoop
# Point to your SVN build (e.g. hive-0.7.0-dev) instead if you compiled Hive yourself
ln -sf hive-0.5.0+20 hive
ln -sf pig-0.7.0+9 pig
ln -sf hbase-0.89.20100621+17 hbase
ln -sf sqoop-1.0.0+3 sqoop
ln -sf flume-0.9.0+1 flume
ln -sf hue-common-0.9 hue
ln -sf zookeeper-3.3.1+10 zookeeper
# Clean up
rm -f *.tar.gz
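For what it is worth, the extraction above can also be scripted in one pass; the symbolic links still need to be listed by hand since their names do not always match the archive names:
# Equivalent to the individual tar commands above
for archive in *.tar.gz; do tar -xzf "$archive"; done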
Setting up the environment
echo "export JAVA_HOME = / Library / Java / Home" >> ~ / .profile
echo "export HADOOP_HOME = / opt / cloudera / hadoop" >> ~ / .profile
echo "export HIVE_HOME = / opt / cloudera / hive" >> ~ / .profile
echo "export HIVE_CONF_DIR = / opt / cloudera / hive / conf" >> ~ / .profile
echo "export PIG_HOME = / opt / cloudera / pig" >> ~ / .profile
echo "export HBASE_HOME = / opt / cloudera / hbase" >> ~ / .profile
echo "export FLUME_HOME = / opt / cloudera / flume" >> ~ / .profile
echo "export SQOOP_HOME = / opt / cloudera / sqoop" >> ~ / .profile
echo "export ZOOKEEPER_HOME = / opt / cloudera / zookeeper" >> ~ / .profile
echo "export HUE_HOME = / opt / cloudera / hue" >> ~ / .profile
echo "export PATH = \ $ HADOOP_HOME / bin: \ $ PATH" >> ~ / .profile
echo "PATH export = \ $ HBASE_HOME / bin: \ $ PATH" >> ~ / .profile
echo "PATH export = \ $ PIG_HOME / bin: \ $ PATH" >> ~ / .profile
echo "export PATH = \ $ HIVE_HOME / bin: \ $ PATH" >> ~ / .profile
echo "PATH export = \ $ FLUME_HOME / bin: \ $ PATH" >> ~ / .profile
echo "PATH export = \ $ SQOOP_HOME / bin: \ $ PATH" >> ~ / .profile
echo "export PATH = \ $ ZOOKEEPER_HOME / bin: \ $ PATH" >> ~ / .profile
source ~ / .profile
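Once the profile is sourced, a quick sanity check that the environment is in place (hadoop version is a standard subcommand of the hadoop script):
# All paths should resolve under /opt/cloudera
echo $HADOOP_HOME
which hadoop hbase pig hive
hadoop version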
Software configuration
# Configure Hive
# Edit ./conf/hive-default.xml
# and modify the property 'javax.jdo.option.ConnectionURL'
# to 'jdbc:derby:;databaseName=/opt/cloudera/data/hive/metastore_db;create=true'
# Edit ./conf/hive-log4j.properties
# and modify the property 'hive.log.file'
# to '/opt/cloudera/data/hive/hive.log'
# (SVN only) modify the property 'hive.stats.dbconnectionstring'
# to 'jdbc:derby:;databaseName=/opt/cloudera/data/hive_temp_stats;create=true'
# For MySQL,
# modify 'javax.jdo.option.ConnectionURL'
# to 'jdbc:mysql://127.0.0.1/hive?createDatabaseIfNotExist=true'
# and 'javax.jdo.option.ConnectionDriverName'
# to 'com.mysql.jdbc.Driver'
# and 'javax.jdo.option.ConnectionUserName'
# to 'your_username'
# and 'javax.jdo.option.ConnectionPassword'
# to 'your_password'
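For reference, an edited property in hive-default.xml looks like the following (shown here with the Derby connection URL given above):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/opt/cloudera/data/hive/metastore_db;create=true</value>
</property>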
# Configure Hue
# Note: from memory, you might also need mysql
sudo port install py26-setuptools
ln -sf /opt/local/bin/mysql_config5 /opt/local/bin/mysql_config
cd hue-common-0.9
make apps
# Edit $HUE_HOME/desktop/conf/hue.ini and
# update the property "hadoop_home" from "/usr/lib/hadoop-0.20"
# to "/opt/cloudera/hadoop-0.20.2+320"
# Configure Hadoop
cd $HADOOP_HOME/lib
ln -s $HUE_HOME/desktop/libs/hadoop/java-lib/hue-plugins-0.9.jar
# Back in the installation directory, swap in the Hue pseudo-distributed configuration
cd /opt/cloudera
mv hadoop-0.20.2+320/conf hadoop-0.20.2+320/conf.bck
cp -rp hue/example-hadoop-confs/conf.pseudo-hue hadoop-0.20.2+320/conf
# Edit the file $HADOOP_HOME/conf/core-site.xml and replace
# the value "/var/lib/hadoop-0.20/cache/${user.name}" with
# "/opt/cloudera/data/cache/${user.name}"
# Edit the file $HADOOP_HOME/conf/hdfs-site.xml and replace
# the value "/var/lib/hadoop-0.20/cache/hadoop/dfs/name" with
# "/opt/cloudera/data/cache/${user.name}/dfs/name"
hadoop namenode -format
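If the format succeeded, the NameNode directory configured above should now exist (assuming you ran the format as the same {username}):
# Created by "hadoop namenode -format"
ls /opt/cloudera/data/cache/$USER/dfs/name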
Usage
Starting services
# Start Hadoop
start-all.sh
# Start HBase
start-hbase.sh
hbase-daemon.sh start rest
# Start Hue
cd $HUE_HOME
./build/env/bin/supervisor
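Since each daemon runs in its own JVM, jps (shipped with the JDK) gives a quick view of what is up; Hue will not appear there as it runs under a Python supervisor:
# Expect NameNode, DataNode, SecondaryNameNode, JobTracker,
# TaskTracker and the HBase daemons
jps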
Stopping services
# Stop Hadoop
stop-all.sh
# Stop HBase
hbase-daemon.sh stop rest
stop-hbase.sh
Administration
If the installation went smoothly, the following URLs should be available:
- Hadoop Map/Reduce Administration:
http://localhost:50030
- Hadoop File System Browser:
http://localhost:50070
- Hadoop Task Tracker Status:
http://localhost:50060
- Hue:
http://localhost:8088
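The same URLs can be probed from the command line, for example:
# Each port should answer with an HTTP status code
for port in 50030 50070 50060 8088; do
  curl -s -o /dev/null -w "localhost:$port => %{http_code}\n" "http://localhost:$port"
done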