Oracle DB synchronization to Hadoop with CDC


By David WORMS

Jul 31, 2017

This note summarizes a discussion about synchronizing data written to a database with a data warehouse stored in Hadoop. Thanks to Claude Daub from GFI, who wrote it and authorized us to publish it.

Oracle GoldenGate

  • Real-time data replication tool that reads the database's transaction logs
  • Distributed by Oracle and therefore officially supported
  • No impact on the performance of the source database
  • Wide range of targets: HDFS, Kafka, HBase, Hive, Flume, JDBC, …
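
Log-based replication to a target such as Kafka delivers one record per captured operation (insert, update, delete). The sketch below shows one way a consumer might turn such records into key/value pairs for a log-compacted topic, where a delete becomes a tombstone (None). It assumes a JSON message per operation with `table`, `op_type` (`I`, `U`, `D`), `before` and `after` fields, loosely modelled on a JSON formatter; these field names and the helper `to_keyed_record` are assumptions, not GoldenGate's documented API, so check them against your handler configuration.

```python
import json
from typing import Optional, Tuple

# Assumed (hypothetical) message layout, one JSON object per operation:
# {"table": "HR.EMP", "op_type": "I"|"U"|"D", "before": {...}, "after": {...}}

def to_keyed_record(message: str, key_column: str) -> Tuple[str, Optional[str]]:
    """Map one change operation to (key, value); a value of None is a tombstone."""
    op = json.loads(message)
    op_type = op["op_type"]
    if op_type in ("I", "U"):
        # Inserts and updates carry the new row image in "after".
        row = op["after"]
        return f'{op["table"]}:{row[key_column]}', json.dumps(row, sort_keys=True)
    if op_type == "D":
        # Deletes only have the old row image; emit a tombstone for its key.
        row = op["before"]
        return f'{op["table"]}:{row[key_column]}', None
    raise ValueError(f"unsupported op_type: {op_type!r}")

if __name__ == "__main__":
    messages = [
        '{"table": "HR.EMP", "op_type": "I", "after": {"ID": 1, "NAME": "Ann"}}',
        '{"table": "HR.EMP", "op_type": "U", "after": {"ID": 1, "NAME": "Anna"}}',
        '{"table": "HR.EMP", "op_type": "D", "before": {"ID": 1, "NAME": "Anna"}}',
    ]
    for m in messages:
        print(to_keyed_record(m, "ID"))
```

This simplified version does not handle updates that change the primary key, which would need a tombstone for the old key as well.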


Continuent Tungsten Replicator

Tungsten connects to Oracle CDC (Change Data Capture), which retrieves changes from the redo logs and writes them to separate change tables.

This approach allows deferred processing but, as with trigger-based solutions, relies on intermediate tables. The synchronization with Hadoop can be done in several ways: Sqoop may be sufficient, or the changes can be exported to CSV files and imported on the Hadoop side.
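
Because changes accumulate in intermediate tables, a downstream job can pick them up later in batches. A minimal sketch of the CSV export path, assuming each change row carries an increasing commit SCN (`scn`), an operation code (`op`) and the row's columns; these names and the helper `export_changes` are hypothetical, and in practice the rows would come from a query against the change table rather than a Python list. Each run exports only rows above the last exported SCN and returns the new watermark so the next run can resume from there.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]

def export_changes(rows: Iterable[Row], last_scn: int,
                   columns: List[str]) -> Tuple[str, int]:
    """Write change rows with scn > last_scn as CSV; return (csv_text, new_watermark)."""
    # Only rows committed after the previous export, in commit order.
    pending = sorted((r for r in rows if r["scn"] > last_scn), key=lambda r: r["scn"])
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for r in pending:
        writer.writerow(r)
    # Advance the watermark only if something was exported.
    new_watermark = pending[-1]["scn"] if pending else last_scn
    return buf.getvalue(), new_watermark

if __name__ == "__main__":
    change_rows = [
        {"scn": 101, "op": "I", "id": 1, "name": "Ann"},
        {"scn": 105, "op": "U", "id": 1, "name": "Anna"},
        {"scn": 110, "op": "D", "id": 1, "name": "Anna"},
    ]
    text, watermark = export_changes(change_rows, 101, ["scn", "op", "id", "name"])
    print(text, watermark)
```

The resulting file could then be copied to HDFS (for example with `hdfs dfs -put`) and loaded into Hive. The sketch assumes all rows for a given SCN are visible before an export runs; otherwise late rows at the watermark SCN would be skipped.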

Tungsten Replicator includes a data replication service:

  • Compatible with several databases (Oracle, MySQL, …)
  • Can retain changes as time series
  • Certified by Hortonworks
  • No information available on the performance impact on the source database
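
Keeping every change record, rather than only the latest state, is what makes time series possible: the full change log can answer "what did this row look like at SCN n?", whereas a merged snapshot only gives the current state. A minimal sketch, assuming the same hypothetical `scn`/`op` change-row layout and a single-column primary key `id`; the helper `state_as_of` is illustrative and not Tungsten's own storage format.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

def state_as_of(changes: List[Row], as_of_scn: Optional[int] = None) -> Dict[object, Row]:
    """Replay I/U/D changes in SCN order up to as_of_scn (inclusive); None means latest."""
    state: Dict[object, Row] = {}
    for change in sorted(changes, key=lambda c: c["scn"]):
        if as_of_scn is not None and change["scn"] > as_of_scn:
            break  # changes are sorted, so everything after this is later
        key = change["id"]
        if change["op"] == "D":
            state.pop(key, None)
        else:
            # "I" or "U": keep the row image without the change metadata columns.
            state[key] = {k: v for k, v in change.items() if k not in ("scn", "op")}
    return state

if __name__ == "__main__":
    history = [
        {"scn": 101, "op": "I", "id": 1, "name": "Ann"},
        {"scn": 105, "op": "U", "id": 1, "name": "Anna"},
        {"scn": 110, "op": "D", "id": 1, "name": "Anna"},
    ]
    print(state_as_of(history, 101))
    print(state_as_of(history, 107))
    print(state_as_of(history))
```

The same history yields a different state at each SCN, which is exactly the information lost if only the merged snapshot is stored in Hadoop.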


DBVisit

Commercial solution offering:

  • Real-time replication
  • Support for several RDBMSs (Oracle, MySQL)
  • Optional bidirectional replication

SharePlex by Quest Software

