Data Engineering
La donnée est l’énergie qui alimente la transformation digitale. Les développeurs la consomme dans leurs applicatifs. Les Data Analysts la fouille, la requête et la partage. Les Data Scientists alimentent leurs algorithmes avec. Les Data Engineers ont la responsabilité de mettre en place la chaîne de valeur qui inclue la collecte, le nettoyage, l’enrichissement et la mise à disposition des données.
Gérer le passage à l’échelle, garantir la sécurité et l’intégrité des données, être tolérant aux pannes, manipuler des données par lots ou en flux continu, valider les schémas, publier les APIs, sélectionner les formats, modèles et bases de données appropriés à leurs expositions sont autant de prérogatives à la charge du Data Engineer. De son travail découle la confiance et les succès de ceux qui consomme et exploitent la donnée.
Articles associés au Data Engineering
CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP
Catégories : Big Data, Data Engineering, Learning | Tags : NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse
In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…
Par Tobias CHAVARRIA
24 juil. 2023
CDP part 1: introduction to end-to-end data lakehouse architecture with CDP
Catégories : Cloud Computing, Data Engineering, Infrastructure | Tags : Data Engineering, Hortonworks, Iceberg, AWS, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Warehouse
Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big data…
Par Stephan BAUM
8 juin 2023
Keycloak deployment in EC2
Catégories : Cloud Computing, Data Engineering, Infrastructure | Tags : Security, EC2, Authentication, AWS, Docker, Keycloak, SSL/TLS, SSO
Why use Keycloak Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provide…
Par Stephan BAUM
14 mars 2023
Big data infrastructure internship
Catégories : Big Data, Data Engineering, DevOps & SRE, Infrastructure | Tags : Infrastructure, Hadoop, Big Data, Cluster, Internship, Kubernetes, TDP
Job description Big Data and distributed computing are at the core of Adaltas. We accompagny our partners in the deployment, maintenance, and optimization of some of the largest clusters in France…
Par Stephan BAUM
2 déc. 2022
Comparison of database architectures: data warehouse, data lake and data lakehouse
Catégories : Big Data, Data Engineering | Tags : Data Governance, Infrastructure, Iceberg, Parquet, Spark, Data Lake, Data lakehouse, Data Warehouse, File Format
Database architectures have experienced constant innovation, evolving with the appearence of new use cases, technical constraints, and requirements. From the three database structures we are comparing…
Par Gonzalo ETSE
17 mai 2022
Databricks logs collection with Azure Monitor at a Workspace Scale
Catégories : Cloud Computing, Data Engineering, Adaltas Summit 2021 | Tags : Metrics, Monitoring, Spark, Azure, Databricks, Log4j
Databricks is an optimized data analytics platform based on Apache Spark. Monitoring Databricks plateform is crucial to ensure data quality, job performance, and security issues by limiting access to…
Par Claire PLAYE
10 mai 2022
An overview of Cloudera Data Platform (CDP)
Catégories : Big Data, Cloud Computing, Data Engineering | Tags : SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Warehouse
Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…
19 juil. 2021
Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
Catégories : Data Engineering, Learning | Tags : Cloud, Data Lake, Databricks, Delta Lake, MLflow
Self-paced trainings are proposed by Databricks inside their Academy program. The price is $ 2000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…
Par Anna KNYAZEVA
26 mai 2021
Find your way into data related Microsoft Azure certifications
Catégories : Cloud Computing, Data Engineering | Tags : Data Governance, Azure, Data Science
Microsoft Azure has certification paths for many technical job roles such as developer, Data Engineer, Data Scientist and solution architect among others. Each of these certifications consists of…
Par Barthelemy NGOM
14 avr. 2021