Training

Knowledge sharing at Adaltas takes the form of skills transfer to our clients, tailor-made training courses, our frequent article publications, our Open Source contributions, and teaching at several universities and training organizations.

Related articles

CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP

Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse

In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…

By Tobias CHAVARRIA

24 Jul 2023

Framework laptop with NixOS, a user feedback

Categories: Learning, Tech Radar | Tags: CLI, DevOps, Learning and tutorial, Linux, Packaging, NixOS, Open source

A new job comes with a new laptop. As such, I was given a Framework Laptop DIY Edition with the objective to install and configure it entirely with NixOS. I will share my first impressions after…

By Carlos JESUS CARO

22 Aug 2022

Ceph object storage within a Kubernetes cluster with Rook

Categories: Big Data, Data Governance, Learning | Tags: Amazon S3, Big Data, Ceph, Cluster, Data Lake, Kubernetes, Storage

Ceph is a distributed all-in-one storage system. Reliable and mature, its first stable version was released in 2012 and has since then been the reference for open source storage. Ceph’s main perk is…

By Luka BIGOT

4 Aug 2022

MinIO object storage within a Kubernetes cluster

Categories: Big Data, Data Governance, Learning | Tags: Amazon S3, Big Data, Cluster, Data Lake, Kubernetes, Storage

MinIO is a popular object storage solution. Often recommended for its simple setup and ease of use, it is not only a great way to get started with object storage: it also provides excellent…

By Luka BIGOT

9 Jul 2022

TDP workshop: Become a TDP power user from your terminal

Categories: Events, Learning | Tags: DevOps, Ansible, Hadoop, Open source, TDP

The TDP CLI is used to deploy and operate your TDP services. It relies on tdp-lib to provide control and flexibility at your fingertips. Some time ago, we announced the public release of TDP - Trunk…

By Paul FARAULT

17 Jun 2022

NixOS: Enabling LXD virtual machines using Flakes

Categories: Hack, Learning | Tags: Learning and tutorial, Linux, LXD, Packaging, VM, GitHub, NixOS, Open source

Nixpkgs is an ever-growing collection of software packages for Nix and NixOS. Even with more than 80,000 packages, you can easily run into a situation where there is a functionality that is not yet…

By Kellian COTTART

13 May 2022

Reliable and reproducible Linux installation with NixOS

Categories: Infrastructure, Learning | Tags: Linux, Packaging, VM, NixOS, TDP

When using an operating system, upgrading packages or installing new ones are common tasks that introduce the risk of affecting the stability of the system. NixOS is a Linux distribution that ensures…

By Florent MOUAFFO

8 Feb 2022

Nix introduction, main concepts and commands

Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Linux, OS X, Packaging, Ubuntu, NixOS, TDP

Nix is a functional package manager for Linux and other Unix systems, making the management of packages more reliable and easy to reproduce. With a traditional package manager, when updating a package…

By Florent MOUAFFO

1 Feb 2022

Blockchain 101: Blockchains and Consensus Mechanisms

Categories: Adaltas Summit 2021, Infrastructure, Learning | Tags: Cryptography, Infrastructure, Blockchain, Consensus

Cryptocurrencies are booming in 2021, with a market cap moving from 750 to more than 3,000 billion dollars. Let’s face it, this is mainly due to speculation. A lot of people involved do not have a…

By Gauthier LEONARD

18 Jan 2022

Spring 2022 internship - building a Data Lab

Categories: Data Science, Learning | Tags: MongoDB, Spark, Argo CD, Elasticsearch, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL

Job Description: Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved over a large offering of tools and libraries and the creation…

By David WORMS

24 Nov 2021

H2O in practice: a protocol combining AutoML with traditional modeling approaches

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python, XGBoost

H2O comes with a lot of functionalities. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…

By Petra KAFERLE DEVISSCHERE

12 Nov 2021

Internship in Big Data infrastructure with TDP

Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP

Job Description: Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…

By Daniel HARTY

25 Oct 2021

Internship in Data Engineering

Categories: Front End, Learning | Tags: Metrics, Monitoring, Hive, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming

Job Description: Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms and refines raw data into information that can be used by business analysts and data…

By David WORMS

25 Oct 2021

Internship in Web Technologies

Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2

Job Description: As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…

By David WORMS

14 Oct 2021

H2O in practice: a Data Scientist feedback

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python

Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…

By Petra KAFERLE DEVISSCHERE

29 Sep 2021

Adaltas Summit 2021, 2nd edition in Corsica

Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js

For its second edition, the whole Adaltas crew is gathering in Corsica for a whole week, with 2 days dedicated to technology on the 23rd and 24th of September 2021. After a year and a half of sanitary…

By David WORMS

21 Sep 2021

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow

Databricks offers self-paced training courses through its Academy program. The price is 2000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…

By Anna KNYAZEVA

26 May 2021

TensorFlow Extended (TFX): the components and their functionalities

Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow

Putting Machine Learning (ML) and Deep Learning (DL) models in production is certainly a difficult task. It has been recognized as more failure-prone and time-consuming than the modeling itself, yet…

By Petra KAFERLE DEVISSCHERE

5 Mar 2021

Faster model development with H2O AutoML and Flow

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python

Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…

By Petra KAFERLE DEVISSCHERE

10 Dec 2020

Experiment tracking with MLflow on Databricks Community Edition

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn

Introduction to Databricks Community Edition and MLflow: Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…

By Petra KAFERLE DEVISSCHERE

10 Sep 2020

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…

By Petra KAFERLE DEVISSCHERE

21 May 2020

Optimization of Spark applications in Hadoop YARN

Categories: Data Engineering, Learning | Tags: Tuning, Hadoop, Spark, Python

Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article…

By Ferdinand DE BAECQUE

30 Mar 2020

MLflow tutorial: an open source Machine Learning (ML) platform

Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn

Introduction and principles of MLflow: With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Science…

By Petra KAFERLE DEVISSCHERE

23 Mar 2020

TensorFlow installation on Docker

Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Jupyter, Linux, AI, Deep Learning, Docker, TensorFlow

TensorFlow is Open Source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations; edges represent N-dimensional data array…

By Pierre SAUVAGE

5 Aug 2019

Spark Streaming part 4: clustering with Spark MLlib

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Clustering, Machine Learning, Scala, Streaming

Spark MLlib is an Apache Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, the Spark framework can serve as a platform for…

By Oskar RYNKIEWICZ

27 Jun 2019

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop

Categories: Data Engineering, Learning | Tags: Spark, Apache Spark Streaming, Python, Streaming

Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…

By Oskar RYNKIEWICZ

28 May 2019

Spark Streaming part 1: build data pipelines with Spark Structured Streaming

Categories: Data Engineering, Learning | Tags: Kafka, Spark, Apache Spark Streaming, Big Data, Streaming

Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…

By Oskar RYNKIEWICZ

18 Apr 2019

First Class Functions in Python

Categories: Hack, Learning | Tags: Programming, Python

I recently watched a talk by Dave Cheney about first class functions in Go. Python supports first class functions too, so can we use them in the same ways? Absolutely. I have been using Python for a…

By Arthur BUSSER

15 Apr 2019

CodaLab – Data Science competitions

Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, Machine Learning, MySQL, Node.js, Python

CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…

By Robert Walid SOARES

17 Dec 2018

One week to discuss technology in a Moroccan riad

Categories: Adaltas Summit 2018, Learning | Tags: Flink, CDSW, Gatsby, React.js, Hadoop, Knox, Data Science, Deep Learning, Kubernetes, Node.js

This year, Adaltas is organizing its first conference, between the 22nd and 26th of October. On the agenda of these 5 days of conference: discussing technology in one of the most beautiful riads of Marrakech. Mix the…

By David WORMS

11 Oct 2018

Lando: Deep Learning used to summarize conversations

Categories: Data Science, Learning | Tags: Micro Services, Open API, Deep Learning, Internship, Kubernetes, Neural Network, Node.js

Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning techniques to summarize contents. It allows users to…

By Yliess HATI

18 Sep 2018

Notes after Katacoda Training on Kubernetes Container Orchestration

Categories: Containers Orchestration, Learning | Tags: Helm, Ingress, Kubeadm, CNI, Micro Services, Minikube, Kubernetes

A few weeks ago, I dedicated two days to following the tutorials available on Katacoda, the interactive learning platform for Kubernetes or any other container orchestration platform. I’m sharing my…

By David WORMS

14 Dec 2017

Scaling massive, real-time data pipelines with Go

Categories: Open Source Summit Europe 2017, Learning | Tags: Algorithm, Data structures, Go Lang, Pipeline, Protocols, Network

Last week at the Open Source Summit in Prague, Jean de Klerk held a talk called Scaling massive, real-time data pipelines with Go. This article goes over the main points of the talk, detailing the…

By Arthur BUSSER

21 Nov 2017

Apache Hive Essentials How-to by Darren Lee

Categories: Business Intelligence, Learning | Tags: UDF, Hadoop, Hive, File Format, SQL

Recently, I’ve been asked to review a new book on Apache Hive called “Apache Hive Essentials How-to” (edit: the second edition is now available) written by Darren Lee and published by Packt Publishing…

By David WORMS

23 Apr 2013

Hadoop and HBase installation on OSX in pseudo-distributed mode

Categories: Big Data, Learning | Tags: Hue, Infrastructure, Hadoop, HBase, Big Data, Deployment

The operating system chosen is OSX but the procedure is not so different for any Unix environment because most of the software is downloaded from the Internet, uncompressed and set manually. Only a…

By David WORMS

1 Dec 2010