MariaDB integration with Hadoop
By David WORMS
Jul 31, 2017
During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB’s High Availability (HA) strategy.
Since the customer selected Cloudera’s CDH 5 distribution, the reasoning below is based on Cloudera’s official documentation. However, it applies to all Hadoop distributions including Hortonworks.
Cloudera lists the various databases supported in HA on its website:
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/admin_cm_ha_dbms.html
and, in the case of MariaDB, redirects the user to the replication documentation:
https://mariadb.com/kb/en/mariadb/setting-up-replication/
That page describes the older replication strategy. Since version 10.0, MariaDB also offers a replication strategy based on global transaction IDs (GTIDs):
https://mariadb.com/kb/en/global-transaction-id/
One hypothesis is that Cloudera’s documentation does not reflect the latest developments in MariaDB. However, Cloudera explicitly states that the GTID replication mode is not supported in the case of MySQL: “Cloudera Manager installation fails if GTID-based replication is enabled in MySQL”. For its part, the MariaDB documentation specifies that its implementation of GTID is not identical to that of MySQL: “Note that MariaDB and MySQL have different GTID implementations, and that these are not compatible with each other”.
It therefore remains unclear whether the components deployed by Cloudera are compatible with MariaDB configured with GTID.
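To illustrate how the two GTID implementations differ on the surface, here is a minimal Python sketch that tells the two textual formats apart. It assumes the documented shapes: MariaDB uses three integers (domain ID, server ID, sequence number) joined by hyphens, while MySQL uses the source server UUID followed by a colon and a transaction number or range. The function name `classify_gtid` is hypothetical and is not part of either product or of Cloudera's tooling.

```python
import re

# MariaDB GTID: domain_id-server_id-sequence_number, all unsigned integers.
MARIADB_GTID = re.compile(r"^\d+-\d+-\d+$")

# MySQL GTID: source server UUID, a colon, then a transaction number or range.
MYSQL_GTID = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:\d+(-\d+)?$"
)


def classify_gtid(value: str) -> str:
    """Return 'mariadb', 'mysql' or 'unknown' for a single GTID string.

    Hypothetical helper for illustration only: it inspects the textual
    format and says nothing about replication compatibility.
    """
    value = value.strip()
    if MARIADB_GTID.match(value):
        return "mariadb"
    if MYSQL_GTID.match(value):
        return "mysql"
    return "unknown"
```

The format difference is only the visible part of the incompatibility; it does not by itself tell us how Cloudera's components behave against MariaDB with GTID enabled.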
We could not find any additional information on Hortonworks’ community and documentation sites. Hortonworks confirms that it does not support GTID in MySQL, without mentioning MariaDB or providing more information:
https://community.hortonworks.com/questions/2172/hive-metastore-ha-mysql-replication-for-failover-p.html
Possible actions:
- Set up a test installation to validate the integration; even if the validation succeeds, a doubt will remain as to stability in operation;
- Raise the question with Cloudera support in order to obtain an official response.
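For the first action, a test installation would first need to confirm which replication mode the MariaDB replica actually uses. As a minimal sketch, the Python helper below inspects one row of `SHOW SLAVE STATUS` output, passed in as a dictionary, and reads its `Using_Gtid` column, which MariaDB reports as `No`, `Slave_Pos` or `Current_Pos`. The function name `replica_uses_gtid` and the way the row is obtained are assumptions; fetching the row with a database client is left out on purpose.

```python
def replica_uses_gtid(status_row: dict) -> bool:
    """Tell whether a MariaDB replica is configured for GTID replication.

    `status_row` is assumed to be one row of `SHOW SLAVE STATUS`, already
    converted to a dict by whatever client library is in use. A missing
    `Using_Gtid` column is treated as the older, non-GTID replication.
    """
    mode = str(status_row.get("Using_Gtid", "No")).strip().lower()
    return mode in {"slave_pos", "current_pos"}
```

Running such a check before and after installing the Cloudera components would at least document which mode was tested, so that the answer from support can be compared against a known configuration.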