Articles published in 2023

Installation Guide to TDP, the 100% open source big data platform

Categories: Big Data, Infrastructure | Tags: Infrastructure, VirtualBox, Hadoop, Vagrant, TDP

The Trunk Data Platform (TDP) is a 100% open source big data distribution, based on Apache Hadoop and compatible with HDP 3.1. Initiated in 2021 by EDF, the DGFiP and Adaltas, the project is governed…

By Paul FARAULT

Oct 18, 2023

New TDP website launched

Categories: Big Data | Tags: Programming, Ansible, Hadoop, Python, TDP

The new TDP (Trunk Data Platform) website is online. We invite you to browse its pages to discover the platform, stay informed, and cultivate contact with the TDP community. TDP is a completely open…

By David WORMS

Oct 3, 2023

CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP

Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, EC2, Hive, Iceberg, Ranger, Spark, Amazon S3, Big Data, Cloud, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse

In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…

By Tobias CHAVARRIA

Jul 24, 2023

CDP part 5: user permissions management on CDP Public Cloud

Categories: Big Data, Cloud Computing, Data Governance | Tags: Ranger, Cloudera, CDP, Data Warehouse

When you create a user or a group in CDP, it requires permissions to access resources and use the Data Services. This article is the fifth in a series of six: CDP part 1: introduction to end-to-end…

By Tobias CHAVARRIA

Jul 18, 2023

CDP part 4: user management on CDP Public Cloud with Keycloak

Categories: Big Data, Cloud Computing, Data Governance | Tags: EC2, Big Data, CDP, Docker Compose, Keycloak, SSO

Previous articles of the series cover the deployment of a CDP Public Cloud environment. All the components are ready for use and it is time to make the environment available to other users to explore…

By Tobias CHAVARRIA

Jul 4, 2023

CDP part 3: Data Services activation on CDP Public Cloud environment

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloudera, CDP

One of the big selling points of Cloudera Data Platform (CDP) is its mature managed service offerings. These are easy to deploy on-premises, in the public cloud or as part of a hybrid solution. The…

By Albert KONRAD

Jun 27, 2023

CDP part 2: CDP Public Cloud deployment on AWS

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloud, Cloudera, CDP, Cloudera Manager

The Cloudera Data Platform (CDP) Public Cloud provides the foundation upon which full featured data lakes are created. In a previous article, we introduced the CDP platform. This article is the second…

By Albert KONRAD

Jun 19, 2023

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: CLI, Hue, Data Engineering, Hortonworks, Container Orchestration, EC2, Iceberg, AWS, Amazon S3, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Analytics, Data Warehouse, Deployment, Keycloak

Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big data…

By Stephan BAUM

Jun 8, 2023

Local development environments with Terraform + LXD

Categories: Containers Orchestration, DevOps & SRE | Tags: Automation, DevOps, KVM, LXD, Virtualization, VM, Terraform, Vagrant

As a Big Data Solutions Architect and InfraOps, I need development environments to install and test software. They have to be configurable, flexible, and performant. Working with distributed systems…

By Gauthier LEONARD

Jun 1, 2023

Data platform requirements and expectations

Categories: Big Data, Infrastructure | Tags: Data Engineering, Data Governance, Iceberg, AWS, Azure, Cloudera, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Science, Databricks, File Format, GCP

A big data platform is a complex and sophisticated system that enables organizations to store, process, and analyze large volumes of data from a variety of sources. It is composed of several…

By David WORMS

Mar 23, 2023

Keycloak deployment in EC2

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Security, SSH, EC2, Authentication, AWS, Cloudera, CDP, Docker, Keycloak, SSL/TLS, SSO

Why use Keycloak? Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provide…

By Stephan BAUM

Mar 14, 2023

Operating Kafka in Kubernetes with Strimzi

Categories: Big Data, Containers Orchestration, Infrastructure | Tags: Kafka, Big Data, Kubernetes, Open source, Streaming

Kubernetes is not the first platform that comes to mind to run Apache Kafka clusters. Indeed, Kafka’s strong dependency on storage might be a pain point regarding Kubernetes’ way of doing things when…

By Leo SCHOUKROUN

Mar 7, 2023

Kubernetes: debugging with ephemeral containers

Categories: Containers Orchestration, Tech Radar | Tags: cgroups, Debug, Infrastructure, Linux, Docker, Kubernetes, PostgreSQL

Anyone who has ever had to work with Kubernetes has been confronted with the resolution of pod errors. The methods provided for this purpose are efficient and make it possible to overcome the most…

By Pierre BERLAND

Feb 7, 2023

Dive into tdp-lib, the SDK in charge of TDP cluster management

Categories: Big Data, Infrastructure | Tags: Programming, Ansible, Hadoop, Python, TDP

All the deployments are automated and Ansible plays a central role. With the growing complexity of the code base, a new system was needed to overcome the Ansible limitations which will enable us to…

By Guillaume BOUTRY

Jan 24, 2023

Adaltas Summit 2022 Morzine

Categories: Big Data, Adaltas Summit 2022 | Tags: Data Engineering, Infrastructure, Iceberg, Container, Data lakehouse, Docker, Kubernetes

For its third edition, the whole Adaltas crew is gathering in Morzine for a whole week, with 2 days dedicated to technology: the 15th and the 16th of September 2022. The speakers choose one of the…

By David WORMS

Jan 13, 2023

How to build your OCI images using Buildpacks

Categories: Containers Orchestration, DevOps & SRE | Tags: CI/CD, CNCF, Docker, Kubernetes, OCI

Docker has become the new standard for building your application. In a Docker image we place our source code, its dependencies, some configurations and our application is almost ready to be deployed…

Introduction to OpenLineage

Categories: Big Data, Data Governance, Infrastructure | Tags: Data Engineering, Infrastructure, Atlas, Data Lake, Data lakehouse, Data Warehouse, Data lineage

OpenLineage is an open-source specification for data lineage. The specification is complemented by Marquez, its reference implementation. Since its launch in late 2020, OpenLineage has been a presence…

By Christophe PARREIRA

Dec 19, 2023

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.
