Oracle to Apache Hive with the Oracle SQL Connector
By David WORMS
May 27, 2013
- Categories
- Business Intelligence
- Tags
- Oracle
- HDFS
- Hive
- Network
In a previous article published last week, I introduced the choices available to connect Oracle and Hadoop. In a follow-up article, I covered the Oracle SQL Connector, its installation and integration with Apache Hadoop, and more specifically how to declare a file stored inside HDFS, the Hadoop filesystem, as a table inside the Oracle database.
Below, I complete the integration between Oracle and Hadoop with the Apache Hive data warehouse system.
Where to install the connector
The official documentation states:
To provide support for Hive tables, install Oracle SQL Connector for HDFS on the Hadoop cluster. If you only plan to access HDFS files, then you can omit this procedure.
According to my tests, this is wrong. Just like for the HDFS integration, it all depends on where you wish to execute the orahdfs commands (createTable, publish, …). Install the Oracle SQL Connector wherever it best suits your needs. Installing it only on your Oracle server is easier, since there is only one deployment of the connector to manage. However, you will then have to open access to your Hive metastore database from that server.
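In the MySQL case, that typically means authorizing the Hive metastore user to connect from the Oracle server host. A minimal sketch, assuming a MySQL-backed metastore; the database, user, password and host names are illustrative and must match what your hive-site.xml declares:
# On the MySQL host backing the Hive metastore: allow the metastore user
# to connect from the Oracle server (all names below are illustrative)
mysql -uroot -p <<'EOF'
GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'oracle-server.example.com' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
EOF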
Installation
The deployment builds upon the HDFS one with a few additional steps. Please refer to my previous article covering the deployment of the Oracle connector for Hadoop HDFS.
If you are installing on the Oracle server, the Hive client should already be installed, but you must also import the Hive configuration. Don't forget to update your database hostname if you are using "localhost". Also be careful: your database privileges may not apply with this new host. When using MySQL, try connecting with mysql -hmyhost -umyuser -pmypassword.
# Copy the Hive configuration from a Hadoop node to the Oracle server
node=a_hadoop_hostname
scp root@${node}:/etc/hive/conf/hive-site.xml /etc/hive/conf/hive-site.xml
# Replace "localhost" with the real hostname (a backup is kept as hive-site.xml.back)
sed -i.back s/localhost/$node/ /etc/hive/conf/hive-site.xml
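To validate the copied configuration, you can query the remote metastore directly from the Oracle server, assuming the Hive client is on the PATH:
# List the Hive databases through the remote metastore
hive -e 'SHOW DATABASES;'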
The MySQL JDBC driver should already be present on your cluster; if it is not installed on the Oracle server, you can run yum install mysql-connector-java.
Finally, we must add the environment variables, for example inside your ~/.bash_profile.
# Add the Hive libraries, the Hive configuration and the MySQL driver to the Hadoop classpath
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/lib/hive/lib/*"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/etc/hive/conf"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/share/java/mysql-connector-java.jar"
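To check that these entries are actually picked up by Hadoop, you can print the effective classpath, for example:
# Print the effective classpath and keep only the Hive and MySQL entries
hadoop classpath | tr ':' '\n' | grep -Ei 'hive|mysql'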
Creating an Oracle table referencing a Hive table
The procedure to declare a Hive table is almost the same as for an HDFS file. The differences reside in the XML definition file: all the oracle.hadoop.exttab.* properties are updated to reflect Hive-specific information.
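The example assumes the Hive source table already exists. If it does not, here is a minimal sketch to create it, matching the database and table names referenced in the hive.xml file shown below; the columns are purely illustrative:
# Create a small Hive table matching the names used in hive.xml
hive -e "
CREATE DATABASE IF NOT EXISTS test_oracle_db;
CREATE TABLE IF NOT EXISTS test_oracle_db.test_oracle_table (id INT, name STRING);
"
Then, as in the HDFS case, create the database directory and grant access to it: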
sqlplus "/ as sysdba"
...
SQL> CREATE OR REPLACE DIRECTORY ext_hive_test_dir AS '/home/oracle/ext_hive_test_dir';
SQL> GRANT READ, WRITE ON DIRECTORY ext_hive_test_dir TO big;
hadoop jar \
$OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable \
-conf ./hive.xml -createTable
Where the content of the hive.xml
file is:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@//100.100.100.52:1521/BIG</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.user</name>
    <value>BIG</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.sourceType</name>
    <value>hive</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>BIG.EXT_HIVE_TEST</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.defaultDirectory</name>
    <value>ext_hive_test_dir</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.databaseName</name>
    <value>test_oracle_db</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.tableName</name>
    <value>test_oracle_table</value>
  </property>
</configuration>
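Once the -createTable step has run, the connector writes its location files into the database directory declared earlier. Listing that directory on the Oracle server is a quick way to confirm the publication worked; the path comes from the CREATE DIRECTORY statement above:
# The location files map the Oracle external table to the files in HDFS
ls -l /home/oracle/ext_hive_test_dir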
Done. You can now test the connection by executing some SQL queries.
SQL> SELECT COUNT(*) FROM big.ext_hive_test;
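If the count looks right, you can go further and inspect the generated columns or fetch a few rows, for example from a shell on the Oracle server; the password below is illustrative:
# Describe the external table and fetch a sample of rows
sqlplus -s big/big_password <<'EOF'
DESCRIBE ext_hive_test
SELECT * FROM ext_hive_test WHERE ROWNUM <= 10;
EXIT;
EOF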