Databricks
Databricks provides an Apache Spark based analytics platform on three of the largest cloud service providers: Microsoft Azure, Amazon AWS and Google GCP.
Founded by Spark developers, Databricks focuses on monetizing the open source Big Data system Apache Spark by providing a unified and simple user experience. It is used for building large Data Lakes, to implement real-time streaming use cases or to replace large ETL processes.
The Databricks platform propose a workspace to write data centric applications in Spark. The Databricks eco-system is enriched with Delta Lake to expose and manage the data stored in the data lake, MLFlow to develop and operate machine learning pipelines and Databricks SQL to build a multi-cloud data warehouse for business analytics.
Adaltas specialized in Big Data and is a Databricks partner. Our Databricks certified consultants provide support and training mainly in France and the Paris area.
- Learn more
- Official website
Related articles
Data platform requirements and expectations
Categories: Big Data, Infrastructure | Tags: Data Engineering, Data Governance, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Science
A big data platform is a complex and sophisticated system that enables organizations to store, process, and analyze large volumes of data from a variety of sources. It is composed of severalā¦
By David WORMS
Mar 23, 2023
Databricks logs collection with Azure Monitor at a Workspace Scale
Categories: Cloud Computing, Data Engineering, Adaltas Summit 2021 | Tags: Metrics, Monitoring, Spark, Azure, Databricks, Log4j
Databricks is an optimized data analytics platform based on Apache Spark. Monitoring Databricks plateform is crucial to ensure data quality, job performance, and security issues by limiting access toā¦
By Claire PLAYE
May 10, 2022
Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow
Self-paced trainings are proposed by Databricks inside their Academy program. The price is $ 2000 USD for unlimited access to the training courses for a period of 1 year, but also free for customersā¦
May 26, 2021
Data versioning and reproducible ML with DVC and MLflow
Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Databricks, Delta Lake, Git, Machine Learning, MLflow, Storage
Our talk on data versioning and reproducible Machine Learning proposed to the Data + AI Summit (formerly known as Spark+AI) is accepted. The summit will take place online the 17-19th Novemberā¦
Sep 30, 2020
Experiment tracking with MLflow on Databricks Community Edition
Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn
Introduction to Databricks Community Edition and MLflow Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and theā¦
Sep 10, 2020
Version your datasets with Data Version Control (DVC) and Git
Categories: Data Science, DevOps & SRE | Tags: DevOps, Infrastructure, Operation, Git, GitOps, SCM
Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldnāt we have a similar approach suchā¦
Sep 3, 2020
Importing data to Databricks: external tables and Delta Lake
Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python
During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a laterā¦
May 21, 2020
MLflow tutorial: an open source Machine Learning (ML) platform
Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn
Introduction and principles of MLflow With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Scienceā¦
Mar 23, 2020
Should you move your Big Data and Data Lake to the Cloud
Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Azure, Cloud, CDP, Databricks, GCP
Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New-York, a general focus was put on moving customerās Bigā¦
Dec 9, 2019