Infrastructure

Because infrastructure is vital to Big Data projects, we help you design and implement a data infrastructure that fits your needs and is compatible with your existing IT environment.

Our skills cover the key stages of design and architecture, such as networking, monitoring, diagnostics and reporting, as well as automated deployment, configuration and security. Our expertise spans a wide range of Big Data technologies and distributions.

We have repeatedly secured the Hortonworks, Cloudera and MapR distributions with Kerberos, and we have experience running workshops that bring together multiple stakeholders from your organization to integrate Big Data platforms with technologies such as SSL, Active Directory, FreeIPA, MIT Kerberos and OpenLDAP.

Cloud, On-Premise: execute workloads across heterogeneous operating systems, platforms and cloud providers.

Virtual, Container, Bare Metal: compatible with virtualization hypervisors and container schedulers.

Articles related to infrastructure

Introduction to OpenLineage

Categories: Big Data, Data Governance, Infrastructure | Tags: Data Engineering, Infrastructure, Atlas, Data Lake, Data lakehouse, Data Warehouse, Data lineage

OpenLineage is an open-source specification for data lineage. The specification is complemented by Marquez, its reference implementation. Since its launch in late 2020, OpenLineage has been a presence…

By Christophe PARREIRA

19 Dec. 2023

Installation Guide to TDP, the 100% open source big data platform

Categories: Big Data, Infrastructure | Tags: Infrastructure, VirtualBox, Hadoop, Vagrant, TDP

The Trunk Data Platform (TDP) is a 100% open source big data distribution, based on Apache Hadoop and compatible with HDP 3.1. Initiated in 2021 by EDF, the DGFiP and Adaltas, the project is governed…

By Paul FARAULT

18 Oct. 2023

CDP part 3: Data Services activation on CDP Public Cloud environment

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloudera, CDP

One of the big selling points of Cloudera Data Platform (CDP) is its mature managed service offerings. These are easy to deploy on-premises, in the public cloud or as part of a hybrid solution. The…

By Albert KONRAD

27 June 2023

CDP part 2: CDP Public Cloud deployment on AWS

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloud, Cloudera, CDP, Cloudera Manager

The Cloudera Data Platform (CDP) Public Cloud provides the foundation upon which full featured data lakes are created. In a previous article, we introduced the CDP platform. This article is the second…

By Albert KONRAD

19 June 2023

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Data Engineering, Hortonworks, Iceberg, AWS, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Warehouse

Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big data…

By Stephan BAUM

8 June 2023

Data platform requirements and expectations

Categories: Big Data, Infrastructure | Tags: Data Engineering, Data Governance, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Science

A big data platform is a complex and sophisticated system that enables organizations to store, process, and analyze large volumes of data from a variety of sources. It is composed of several…

By David WORMS

23 March 2023

Keycloak deployment in EC2

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Security, EC2, Authentication, AWS, Docker, Keycloak, SSL/TLS, SSO

Why use Keycloak? Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provide…

By Stephan BAUM

14 March 2023

Operating Kafka in Kubernetes with Strimzi

Categories: Big Data, Containers Orchestration, Infrastructure | Tags: Kafka, Big Data, Kubernetes, Open source, Streaming

Kubernetes is not the first platform that comes to mind to run Apache Kafka clusters. Indeed, Kafka’s strong dependency on storage might be a pain point regarding Kubernetes’ way of doing things when…

By Leo SCHOUKROUN

7 March 2023

Dive into tdp-lib, the SDK in charge of TDP cluster management

Categories: Big Data, Infrastructure | Tags: Programming, Ansible, Hadoop, Python, TDP

All the deployments are automated and Ansible plays a central role. With the growing complexity of the code base, a new system was needed to overcome the Ansible limitations which will enable us to…

By Guillaume BOUTRY

24 Jan. 2023

Big data infrastructure internship

Categories: Big Data, Data Engineering, DevOps & SRE, Infrastructure | Tags: Infrastructure, Hadoop, Big Data, Cluster, Internship, Kubernetes, TDP

Job description: Big Data and distributed computing are at the core of Adaltas. We accompany our partners in the deployment, maintenance, and optimization of some of the largest clusters in France…

By Stephan BAUM

2 Dec. 2022

Traefik, Docker and dnsmasq to simplify container networking

Categories: Containers Orchestration, Infrastructure, Tech Radar | Tags: DNS, Gatsby, JAMstack, Linux, Docker, Network

Good tech adventures start with some frustration, a need, or a requirement. This is the story of how I simplified the management and access of my local web applications with the help of Traefik and…

By David WORMS

17 Nov. 2022

WasmEdge: WebAssembly runtimes are coming for the edge

Categories: Containers Orchestration, Adaltas Summit 2021, Infrastructure, Tech Radar | Tags: JAMstack, Linux, Docker, Rust Lang, WebAssembly

With many security challenges solved by design in its core conception, lots of projects benefit from using WebAssembly. WasmEdge runtime is an efficient Virtual Machine optimized for edge computing…

By Guillaume BOUTRY

29 Sept. 2022

Ingresses and Load Balancers in Kubernetes with MetalLB and nginx-ingress

Categories: Containers Orchestration, Infrastructure, Tech Radar | Tags: Ingress, Kubeadm, Cluster, Deployment, Kubernetes

When it comes to exposing services from a Kubernetes cluster and making it accessible from outside the cluster, the recommended option is to use a load-balancer type service to redirect incoming…

By Kellian COTTART

8 Sept. 2022

Spark on Hadoop integration with Jupyter

Categories: Adaltas Summit 2021, Infrastructure, Tech Radar | Tags: Infrastructure, Jupyter, Spark, YARN, CDP, HDP, Notebook, TDP

For several years, Jupyter notebook has established itself as the notebook solution in the Python universe. Historically, Jupyter is the tool of choice for data scientists who mainly develop in Python…

By Aargan COINTEPAS

1 Sept. 2022

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: DevOps, Hortonworks, Ansible, Hadoop, HBase, Knox, Ranger, Spark, Cloudera, CDP, CDH, Open source, TDP

Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…

By Leo SCHOUKROUN

14 Apr. 2022

Blockchain 102: Cryptocurrencies, Wallets and DApps

Categories: Adaltas Summit 2021, Infrastructure | Tags: Cryptography, Infrastructure, Blockchain, Consensus

A lot of people own cryptocurrencies today. But holding some tokens on an exchange does not mean interacting with the blockchain. The assets you trade are only numbers stored inside the exchange’s…

By Gauthier LEONARD

12 Apr. 2022

Apache HBase: RegionServers co-location

Categories: Big Data, Adaltas Summit 2021, Infrastructure | Tags: Ambari, Database, Infrastructure, Tuning, Hadoop, HBase, Big Data, HDP, Storage

RegionServers are the processes that manage the storage and retrieval of data in Apache HBase, the non-relational column-oriented database in Apache Hadoop. It is through their daemons that any CRUD…

By Pierre BERLAND

22 Feb. 2022

Reliable and reproducible Linux installation with NixOS

Categories: Infrastructure, Learning | Tags: Linux, Packaging, VM, NixOS, TDP

When using an operating system, upgrading packages or installing new ones are common tasks that introduce the risk of affecting the stability of the system. NixOS is a Linux distribution that ensures…

By Florent MOUAFFO

8 Feb. 2022

Nix introduction, main concepts and commands

Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Linux, OS X, Packaging, Ubuntu, NixOS, TDP

Nix is a functional package manager for Linux and other Unix systems, making the management of packages more reliable and easy to reproduce. With a traditional package manager, when updating a package…

By Florent MOUAFFO

1 Feb. 2022

Blockchain 101: Blockchains and Consensus Mechanisms

Categories: Adaltas Summit 2021, Infrastructure, Learning | Tags: Cryptography, Infrastructure, Blockchain, Consensus

Cryptocurrencies are booming in 2021, with a market cap moving from 750 to more than 3,000 billion dollars. Let’s face it, this is mainly due to speculation. A lot of people involved do not have a…

By Gauthier LEONARD

18 Jan. 2022

Internship in Big Data infrastructure with TDP

Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP

Job description: Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…

By Daniel HARTY

25 Oct. 2021

Desacralizing the Linux overlay filesystem in Docker

Categories: Containers Orchestration, Infrastructure | Tags: DevOps, File system, Linux, Docker

Overlay filesystems (also called union filesystems) are a fundamental technology in Docker to create images and containers. They allow creating a union of directories to create a filesystem. Multiple…

By David WORMS

3 June 2021

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin

Categories: Big Data, Infrastructure | Tags: Maven, Hadoop, HBase, Hive, Spark, Git, Release and features, TDP, Unit tests

The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…

By Leo SCHOUKROUN

18 Dec. 2020

Rebuilding HDP Hive: patch, test and build

Categories: Big Data, Infrastructure | Tags: Maven, Java, Hive, Git, GitHub, Release and features, TDP, Unit tests

The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…

By Leo SCHOUKROUN

6 Oct. 2020

Installing Hadoop from source: build, patch and run

Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Hadoop, HDFS, Docker, TDP, Unit tests

Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsights…

By Leo SCHOUKROUN

4 Aug. 2020

Logstash pipelines remote configuration and self-indexing

Categories: Data Engineering, Infrastructure | Tags: Docker, Elasticsearch, Kibana, Logstash, Log4j

Logstash is a powerful data collection engine that integrates in the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash…

By Paul-Adrien CORDONNIER

13 Dec. 2019

Hadoop Ozone part 3: advanced replication strategy with Copyset

Categories: Infrastructure | Tags: HDFS, Ozone, Cluster, Kubernetes, Node

Hadoop Ozone provides a way of setting a ReplicationType for every write you make on the cluster. HDFS and Ratis are currently supported, but more advanced replication strategies can be achieved. In this…

By Paul-Adrien CORDONNIER

3 Dec. 2019

Hadoop Ozone part 2: tutorial and getting started of its features

Categories: Infrastructure | Tags: CLI, Learning and tutorial, HDFS, Ozone, Amazon S3, Cluster, REST

The releases of Hadoop Ozone come with a handy docker-compose file to try out Ozone. The instructions below provide details on how to use it. You can also use the Katacoda training sandbox which…

By Paul-Adrien CORDONNIER

3 Dec. 2019

Hadoop Ozone part 1: an introduction of the new filesystem

Categories: Infrastructure | Tags: HDFS, Ozone, Cluster, Kubernetes

Hadoop Ozone is an object store for Hadoop. It is designed to scale to billions of objects of varying sizes. It is currently in development. The roadmap is available on the project wiki. This article…

By Paul-Adrien CORDONNIER

3 Dec. 2019

Multihoming on Hadoop

Categories: Infrastructure | Tags: Hadoop, HDFS, Kerberos, Network

Multihoming, which means having multiple networks attached to one node, is one of the main components to manage the heterogeneous network usage of an Apache Hadoop cluster. This article is an…

By Joris RUMMENS

5 March 2019

Jumbo, the Hadoop cluster bootstrapper

Categories: Infrastructure | Tags: Ambari, Automation, Ansible, Cluster, Vagrant, HDP, REST

Introducing Jumbo, a Hadoop cluster bootstrapper for developers. Jumbo helps you deploy development environments for Big Data technologies. It takes a few minutes to get a custom virtualized Hadoop…

By Gauthier LEONARD

29 Nov. 2018

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Categories: Big Data, Infrastructure | Tags: Slider, Erasure Coding, Rolling Upgrade, HDFS, Spark, YARN, Docker

Hadoop 2 to Hadoop 3 migration is a hot subject. How to upgrade your clusters, which features present in the new release may solve current problems and bring new opportunities, how are your current…

By Lucas BAKALIAN

25 July 2018

A CoreOS development cluster with Vagrant and VirtualBox

Categories: Hack, Infrastructure | Tags: Arch Linux, CoreOS, Linux, VirtualBox, etcd, Vagrant

Following CoreOS’s instructions on how to set up a development environment in VirtualBox did not work out well for me. Here are the steps I followed to get Container Linux up and running with Vagrant…

By Arthur BUSSER

20 June 2018

Lightweight containerization with Tupperware

Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: Btrfs, LXD, Red Hat, Systemd, Zookeeper, Cloud, Consensus

In this article, I will present lightweight containerization set up by Facebook called Tupperware. What is Tupperware? Tupperware is a homemade framework written and used internally at Facebook…

By Lucas BAKALIAN

3 Nov. 2017

Nobody* puts Java in a Container

Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: cgroups, Java, JRE, JVM, Namespaces, Docker

This talk was about the issues of putting Java in a container and how, in its latest version, the JDK is now more aware of the container it is running in. The presentation is led by Joerg Schad…

By Paul-Adrien CORDONNIER

28 Oct. 2017

MariaDB integration with Hadoop

Categories: Infrastructure | Tags: Database, HA, MariaDB, Hadoop, Hive

During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB’s High Availability (HA) strategy. Since the customer selected Cloudera’s CDH 5 distribution, the…

By David WORMS

31 July 2017

Exposing Kafka on two different networks

Categories: Infrastructure | Tags: Cyber Security, VLAN, Kafka, Cloudera, CDH, Network

A Big Data setup usually requires multiple network interfaces. Let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing software platform system…

By César BEREZOWSKI

22 July 2017

MiNiFi: Data at Scales & the Values of Starting Small

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, C++, HDF, Cloudera, HDP, IOT

This conference briefly presented Apache NiFi and explained where MiNiFi came from: basically, it is a minimal NiFi agent to deploy on small devices to bring data to a cluster’s NiFi pipeline (e.g., IoT…

By César BEREZOWSKI

8 July 2017

Advanced multi-tenant Hadoop and Zookeeper protection

Categories: Big Data, Infrastructure | Tags: DoS, iptables, Operation, Scalability, Zookeeper, Clustering, Consensus

Zookeeper is a critical component of Hadoop’s high-availability operation. The latter protects itself by limiting the maximum number of connections (maxConns = 400). However, Zookeeper does not protect…

By Pierre SAUVAGE

5 July 2017

HDP cluster monitoring

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: Alert, Ambari, Metrics, Monitoring, HDP, REST

With the current growth of Big Data technologies, more and more companies are building their own clusters in the hope of getting some value from their data. One main concern while building these infrastructures…

By Joris RUMMENS

5 July 2017

Hadoop development cluster of virtual machines with static IP using VirtualBox

Categories: Infrastructure | Tags: Ambari, Hortonworks, Red Hat, VirtualBox, VM, VMware, Cloudera, Network

A few days ago, I explained how to set up a cluster of virtual machines with static IPs and Internet access suitable to host your Hadoop cluster locally for development. At the time I made use of VMWare…

By David WORMS

14 March 2013

Virtual machines with static IP for your Hadoop development cluster

Categories: Infrastructure | Tags: Ambari, Hortonworks, Red Hat, VirtualBox, VM, VMware, Cloudera, Network

As I am about to install and test Ambari, this article is an opportunity to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring…

By David WORMS

27 Feb. 2013

Canada - Morocco - France

We are a team passionate about Open Source, Big Data and related technologies such as Cloud, Data Engineering, Data Science, DevOps…

We provide our clients with recognized expertise in using technology to turn their use cases into projects running in production, in reducing costs, and in accelerating the delivery of new features.

If you appreciate the quality of our publications, we invite you to contact us with a view to working together.
