Databricks

Databricks provides an Apache Spark-based analytics platform on three of the largest cloud service providers: Microsoft Azure, Amazon AWS, and Google GCP.

Founded by the developers of Spark, Databricks focuses on monetizing the open source Big Data system Apache Spark by providing a unified and simple user experience. It is used to build large Data Lakes, implement real-time streaming use cases, or replace large ETL processes.

The Databricks platform offers a workspace to write data-centric applications in Spark. The Databricks ecosystem is enriched with Delta Lake to expose and manage the data stored in the data lake, MLflow to develop and operate machine learning pipelines, and Databricks SQL to build a multi-cloud data warehouse for business analytics.
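As a rough illustration of how these components fit together, here is a minimal sketch, assuming a Databricks notebook where the spark session is predefined and using a hypothetical DBFS path and run name: Spark transforms a small dataset, Delta Lake persists it, and MLflow records the run.

```python
# Minimal sketch (assumptions: Databricks notebook with a predefined `spark`
# session, MLflow available on the cluster, hypothetical DBFS path).
import mlflow

# Transform a small dataset with Spark.
df = spark.range(1000).withColumnRenamed("id", "event_id")

# Persist it in the data lake as a Delta table (hypothetical path).
df.write.format("delta").mode("overwrite").save("/tmp/demo/events")

# Record the run with MLflow for later comparison.
with mlflow.start_run(run_name="demo-ingestion"):
    mlflow.log_param("source", "spark.range")
    mlflow.log_metric("row_count", df.count())
```

Once registered in the metastore, the same Delta data can be queried from Databricks SQL for business analytics.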

Adaltas specializes in Big Data and is a Databricks partner. Our Databricks-certified consultants provide support and training, mainly in France and the Paris area.

Related articles

Data platform requirements and expectations

Categories: Big Data, Infrastructure | Tags: Data Engineering, Data Governance, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Science

A big data platform is a complex and sophisticated system that enables organizations to store, process, and analyze large volumes of data from a variety of sources. It is composed of several…

By David WORMS

Mar 23, 2023

Databricks logs collection with Azure Monitor at a Workspace Scale

Categories: Cloud Computing, Data Engineering, Adaltas Summit 2021 | Tags: Metrics, Monitoring, Spark, Azure, Databricks, Log4j

Databricks is an optimized data analytics platform based on Apache Spark. Monitoring the Databricks platform is crucial to ensure data quality and job performance, and to address security issues by limiting access to…

By Claire PLAYE

May 10, 2022

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow

Self-paced training courses are offered by Databricks within their Academy program. The price is $2,000 USD for unlimited access to the training courses for a period of 1 year; they are also free for customers…

By Anna KNYAZEVA

May 26, 2021

Data versioning and reproducible ML with DVC and MLflow

Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Databricks, Delta Lake, Git, Machine Learning, MLflow, Storage

Our talk on data versioning and reproducible Machine Learning submitted to the Data + AI Summit (formerly known as Spark+AI) has been accepted. The summit will take place online on November 17-19…

Experiment tracking with MLflow on Databricks Community Edition

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn

Introduction to Databricks Community Edition and MLflow. Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…

Version your datasets with Data Version Control (DVC) and Git

Categories: Data Science, DevOps & SRE | Tags: DevOps, Infrastructure, Operation, Git, GitOps, SCM

Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldn't we have a similar approach such…

By Grégor JOUET

Sep 3, 2020

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…

MLflow tutorial: an open source Machine Learning (ML) platform

Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn

Introduction and principles of MLflow. With increasingly cheaper computing power and storage, and at the same time increasing data collection in all walks of life, many companies integrated Data Science…

Should you move your Big Data and Data Lake to the Cloud

Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Azure, Cloud, CDP, Databricks, GCP

Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New York, a general focus was put on moving customer's Big…

By Joris RUMMENS

Dec 9, 2019

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, how to reduce their costs, and how to accelerate their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukraine