In a previous article published last week, I introduced the choices available to connect Oracle and Hadoop. In a follow-up article, I covered the Oracle SQL Connector, its installation, and its integration with Apache Hadoop, and more specifically how to declare a file present inside HDFS, the Hadoop filesystem, as a table inside the Oracle database. This time, we will do the same with a Hive table.
Where to install the connector
The official documentation states that the connector should be installed on the system hosting the Oracle database. According to my tests, this is wrong. Just like for the HDFS integration, it all depends on where you wish to execute the orahdfs commands (“createTable”, “publish”, …). Install the Oracle SQL Connector wherever it best suits your needs. Installing it only on your Oracle server is easier, since there is only one deployment of the connector to manage; however, you will then have to open up access to your Hive metastore database.
If you are installing on the Oracle server, the Hive client should already be installed, but you must import the Hive configuration. Don’t forget to update your database hostname if you are using “localhost”. Also be careful: your database privileges may not apply from this new host. When using MySQL, test the connection with mysql -hmyhost -umyuser -pmypassword.
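Opening up the metastore access typically means allowing the Hive metastore account to connect from the Oracle server host. Here is a minimal sketch for a MySQL-backed metastore; the host oracleserver.example.com, the hiveuser account, and the metastore database name are assumptions to adapt to your environment:

```sql
-- Hypothetical names: adapt host, user, password and database.
CREATE USER 'hiveuser'@'oracleserver.example.com' IDENTIFIED BY 'mypassword';
-- The connector only reads the metastore, so SELECT should be enough in
-- theory; GRANT ALL mirrors the typical metastore setup.
GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'oracleserver.example.com';
FLUSH PRIVILEGES;
```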
The MySQL JDBC driver should already be present on your cluster; if it is not yet on the Oracle server, you can install it with yum install mysql-connector-java.
Finally, we must add the required environment variables, for example inside your shell profile.
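As a sketch, assuming a Cloudera-style layout (all paths below, including the connector's install directory, are assumptions to adapt):

```bash
# Assumed install locations; adjust to your distribution.
export OSCH_HOME=/opt/oracle/orahdfs
export HIVE_HOME=/usr/lib/hive
export HIVE_CONF_DIR=/etc/hive/conf
# The connector needs the Hive libraries and configuration on its classpath.
export HADOOP_CLASSPATH=$OSCH_HOME/jlib/*:$HIVE_HOME/lib/*:$HIVE_CONF_DIR
```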
Creating an Oracle table referencing a Hive table
The procedure to declare a Hive table is almost the same as for an HDFS file. The differences reside in the XML definition file: the “oracle.hadoop.exttab.*” properties are updated to reflect Hive-specific information.
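With the configuration in place, creating the Oracle table boils down to running the connector's ExternalTable tool. A sketch, assuming OSCH_HOME is set as above and the definition file sits in /home/oracle/hive.xml (the path is an assumption):

```bash
# Create the external table in Oracle from the Hive definition.
hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -conf /home/oracle/hive.xml \
    -createTable
```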
Where the content of the “hive.xml” file is:
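Below is a minimal sketch of such a definition file; the table names, connection URL, database user, and directory object are placeholders to adapt to your environment:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Name of the external table to create in Oracle (placeholder). -->
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>MY_HIVE_EXT_TAB</value>
  </property>
  <!-- Tell the connector the source is a Hive table, not an HDFS file. -->
  <property>
    <name>oracle.hadoop.exttab.sourceType</name>
    <value>hive</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.databaseName</name>
    <value>default</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.tableName</name>
    <value>my_hive_table</value>
  </property>
  <!-- Oracle database connection (placeholders). -->
  <property>
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@//oracleserver.example.com:1521/orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.user</name>
    <value>SCOTT</value>
  </property>
  <!-- Oracle directory object backing the external table's location files. -->
  <property>
    <name>oracle.hadoop.exttab.defaultDirectory</name>
    <value>OSCH_EXTTAB_DIR</value>
  </property>
</configuration>
```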
Done. You can now test the connection by executing some SQL queries.
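For instance, assuming the external table created above is named MY_HIVE_EXT_TAB (a placeholder), a quick sanity check from SQL*Plus could be:

```sql
-- The row count should match the content of the Hive table.
SELECT COUNT(*) FROM my_hive_ext_tab;

-- Fetch a few rows to verify the column mapping.
SELECT * FROM my_hive_ext_tab WHERE ROWNUM <= 10;
```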