Training
Knowledge sharing at Adaltas takes the form of skills transfer to our clients, the design of tailor-made training courses, our frequent article publications, our Open Source contributions, and teaching at several universities and training organizations.
Related articles
CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP
Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse
In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…
By Tobias CHAVARRIA
24 Jul 2023
Framework laptop with NixOS, a user feedback
Categories: Learning, Tech Radar | Tags: CLI, DevOps, Learning and tutorial, Linux, Packaging, NixOS, Open source
A new job comes with a new laptop. As such, I was given a Framework Laptop DIY Edition with the objective to install and configure it entirely with NixOS. I will share my first impressions after…
22 Aug 2022
Ceph object storage within a Kubernetes cluster with Rook
Categories: Big Data, Data Governance, Learning | Tags: Amazon S3, Big Data, Ceph, Cluster, Data Lake, Kubernetes, Storage
Ceph is a distributed all-in-one storage system. Reliable and mature, its first stable version was released in 2012 and has since then been the reference for open source storage. Ceph’s main perk is…
By Luka BIGOT
4 Aug 2022
MinIO object storage within a Kubernetes cluster
Categories: Big Data, Data Governance, Learning | Tags: Amazon S3, Big Data, Cluster, Data Lake, Kubernetes, Storage
MinIO is a popular object storage solution. Often recommended for its simple setup and ease of use, it is not only a great way to get started with object storage: it also provides excellent…
By Luka BIGOT
9 Jul 2022
TDP workshop: Become a TDP power user from your terminal
Categories: Events, Learning | Tags: DevOps, Ansible, Hadoop, Open source, TDP
The TDP CLI is used to deploy and operate your TDP services. It relies on tdp-lib to provide control and flexibility at your fingertips. Some time ago, we announced the public release of TDP - Trunk…
By Paul FARAULT
17 Jun 2022
NixOS: Enabling LXD virtual machines using Flakes
Categories: Hack, Learning | Tags: Learning and tutorial, Linux, LXD, Packaging, VM, GitHub, NixOS, Open source
Nixpkgs is an ever-increasing collection of software packages for Nix and NixOS. Even with more than 80,000 packages, you can easily run into a situation where there is a functionality that is not yet…
By Kellian COTTART
13 May 2022
Reliable and reproducible Linux installation with NixOS
Categories: Infrastructure, Learning | Tags: Linux, Packaging, VM, NixOS, TDP
When using an operating system, upgrading packages or installing new ones are common tasks that introduce the risk of affecting the stability of the system. NixOS is a Linux distribution that ensures…
By Florent MOUAFFO
8 Feb 2022
Nix introduction, main concepts and commands
Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Linux, OS X, Packaging, Ubuntu, NixOS, TDP
Nix is a functional package manager for Linux and other Unix systems, making the management of packages more reliable and easy to reproduce. With a traditional package manager, when updating a package…
By Florent MOUAFFO
1 Feb 2022
Blockchain 101: Blockchains and Consensus Mechanisms
Categories: Adaltas Summit 2021, Infrastructure, Learning | Tags: Cryptography, Infrastructure, Blockchain, Consensus
Cryptocurrencies are booming in 2021, with a market cap moving from 750 to more than 3,000 billion dollars. Let’s face it, this is mainly due to speculation. A lot of people involved do not have a…
By Gauthier LEONARD
18 Jan 2022
Spring 2022 internship - building a Data Lab
Categories: Data Science, Learning | Tags: MongoDB, Spark, Argo CD, Elasticsearch, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL
Job Description: Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved over a large offering of tools and libraries and the creation…
By David WORMS
24 Nov 2021
H2O in practice: a protocol combining AutoML with traditional modeling approaches
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python, XGBoost
H2O comes with a lot of functionalities. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…
12 Nov 2021
Internship in Big Data infrastructure with TDP
Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP
Job Description: Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…
By Daniel HARTY
25 Oct 2021
Internship in Data Engineering
Categories: Front End, Learning | Tags: Metrics, Monitoring, Hive, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming
Job Description: Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms and refines raw data into information that can be used by business analysts and data…
By David WORMS
25 Oct 2021
Internship in Web Technologies
Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2
Job Description: As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…
By David WORMS
14 Oct 2021
H2O in practice: a Data Scientist feedback
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…
29 Sep 2021
Adaltas Summit 2021, 2nd edition in Corsica
Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js
For its second edition, the whole Adaltas crew is gathering in Corsica for a whole week, with 2 days dedicated to technology on the 23rd and 24th of September 2021. After a year and a half of sanitary…
By David WORMS
21 Sep 2021
Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow
Self-paced training courses are offered by Databricks within their Academy program. The price is $2,000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…
By Anna KNYAZEVA
26 May 2021
TensorFlow Extended (TFX): the components and their functionalities
Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow
Putting Machine Learning (ML) and Deep Learning (DL) models in production certainly is a difficult task. It has been recognized as more failure-prone and time-consuming than the modeling itself, yet…
5 Mar 2021
Faster model development with H2O AutoML and Flow
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…
10 Dec 2020
Experiment tracking with MLflow on Databricks Community Edition
Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn
Introduction to Databricks Community Edition and MLflow: Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…
10 Sep 2020
Importing data to Databricks: external tables and Delta Lake
Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python
During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…
21 May 2020
Optimization of Spark applications in Hadoop YARN
Categories: Data Engineering, Learning | Tags: Tuning, Hadoop, Spark, Python
Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article…
30 Mar 2020
MLflow tutorial: an open source Machine Learning (ML) platform
Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn
Introduction and principles of MLflow: With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Science…
23 Mar 2020
TensorFlow installation on Docker
Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Jupyter, Linux, AI, Deep Learning, Docker, TensorFlow
TensorFlow is Open Source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations; edges represent N-dimensional data array…
By Pierre SAUVAGE
5 Aug 2019
Spark Streaming part 4: clustering with Spark MLlib
Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Clustering, Machine Learning, Scala, Streaming
Spark MLlib is an Apache Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, the Spark framework can serve as a platform for…
By Oskar RYNKIEWICZ
27 Jun 2019
Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop
Categories: Data Engineering, Learning | Tags: Spark, Apache Spark Streaming, Python, Streaming
Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…
By Oskar RYNKIEWICZ
28 May 2019
Spark Streaming part 1: build data pipelines with Spark Structured Streaming
Categories: Data Engineering, Learning | Tags: Kafka, Spark, Apache Spark Streaming, Big Data, Streaming
Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…
By Oskar RYNKIEWICZ
18 Apr 2019
First Class Functions in Python
Categories: Hack, Learning | Tags: Programming, Python
I recently watched a talk by Dave Cheney about first class functions in Go. Python supports first class functions too, so can we use them in the same ways? Absolutely. I have been using Python for a…
By Arthur BUSSER
15 Apr 2019
CodaLab – Data Science competitions
Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, Machine Learning, MySQL, Node.js, Python
CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…
17 Dec 2018
One week to discuss technology in a Moroccan riad
Categories: Adaltas Summit 2018, Learning | Tags: CDSW, Gatsby, React.js, Flink, Hadoop, Knox, Data Science, Deep Learning, Kubernetes, Node.js
This year, Adaltas is organizing its first conference, from the 22nd to the 26th of October. On the agenda for these 5 days: discussing technology in one of the most beautiful riads of Marrakech. Mix the…
By David WORMS
11 Oct 2018
Lando: Deep Learning used to summarize conversations
Categories: Data Science, Learning | Tags: Micro Services, Open API, Deep Learning, Internship, Kubernetes, Neural Network, Node.js
Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning techniques to summarize contents. It allows users to…
By Yliess HATI
18 Sep 2018
Notes after Katacoda Training on Kubernetes Container Orchestration
Categories: Containers Orchestration, Learning | Tags: Helm, Ingress, Kubeadm, CNI, Micro Services, Minikube, Kubernetes
A few weeks ago, I dedicated two days to following the tutorials available on Katacoda, the interactive learning platform for Kubernetes or any other container orchestration platform. I’m sharing my…
By David WORMS
14 Dec 2017
Scaling massive, real-time data pipelines with Go
Categories: Open Source Summit Europe 2017, Learning | Tags: Algorithm, Data structures, Go Lang, Pipeline, Protocols, Network
Last week at the Open Source Summit in Prague, Jean de Klerk held a talk called Scaling massive, real-time data pipelines with Go. This article goes over the main points of the talk, detailing the…
By Arthur BUSSER
21 Nov 2017
Apache Hive Essentials How-to by Darren Lee
Categories: Business Intelligence, Learning | Tags: UDF, Hadoop, Hive, File Format, SQL
Recently, I’ve been asked to review a new book on Apache Hive called “Apache Hive Essentials How-to” (edit: the second edition is now available) written by Darren Lee and published by Packt Publishing…
By David WORMS
23 Apr 2013
Hadoop and HBase installation on OSX in pseudo-distributed mode
Categories: Big Data, Learning | Tags: Hue, Infrastructure, Hadoop, HBase, Big Data, Deployment
The operating system chosen is OSX but the procedure is not so different for any Unix environment because most of the software is downloaded from the Internet, uncompressed and set manually. Only a…
By David WORMS
1 Dec 2010