Installation Guide to TDP, the 100% open source big data platform
Oct 18, 2023
The Trunk Data Platform (TDP) is a 100% open source big data distribution, based on Apache Hadoop and compatible with HDP 3.1. Initiated in 2021 by EDF, the DGFiP and Adaltas, the project is governed by TOSIT, an association under the French law of 1901 whose objective is to promote open source among major companies and institutions.
Version 1.1, whose release is expected during the fourth quarter of 2023, adds features necessary for managing a production cluster (see #308). Support and training offers are already available from consulting firms such as Adaltas, with its Alliage offer.
TDP is aimed at anyone wishing to:
- Create their data platform (Data Lake, Data Hub, Data Warehouse, Data Science Platform, etc.).
- Migrate their current solution to a 100% open source (and free) solution.
- Develop on big data services (HDFS, Hive, Spark, etc.).
- Explore Hadoop technologies.
TDP can be broken down into 2 main parts:
- A stack, based on Apache Hadoop and compatible with HDP 3.1.
- A cluster manager, based on Ansible, that allows deploying and managing a TDP cluster via a library, a REST API, or a graphical interface (see
The project is modular by design, in both the stack and the manager. It is therefore possible, for example, to add components or to run the manager without the UI.
Adaltas, through its Alliage offer, provides support and expertise on TDP. Its website hosts a guide for deploying a TDP cluster locally, using Vagrant and VirtualBox. The guide is meant to help you discover the platform's features.
This guide provides a development environment. It does not apply to production deployments, whose documentation is currently being written (see PR #88).
Build the data platform that suits you
Adaltas is a consulting company specializing in big data and open source technologies. We are partners with Cloudera, Dremio, and Databricks. Our clients trust our consultants to contribute to the development of TDP.
We can therefore assist you in setting up your data platform, from design to production. Do not hesitate to contact us for more information.