Operation

Related articles

Data versioning and reproducible ML with DVC and MLflow

Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Databricks, Delta Lake, Git, Machine Learning, MLflow, Storage

Our talk on data versioning and reproducible Machine Learning, submitted to the Data + AI Summit (formerly known as Spark+AI), has been accepted. The summit will take place online on November 17-19…

Version your datasets with Data Version Control (DVC) and Git

Categories: Data Science, DevOps & SRE | Tags: DevOps, Infrastructure, Operation, Git, GitOps, SCM

Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldn't we have a similar approach such… A minimal DVC sketch follows this entry.

By Grégor JOUET

Sep 3, 2020
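As a taste of the workflow described in the article above, here is a minimal, hypothetical sketch using DVC's Python API to read a Git-tagged version of a dataset. The repository URL, file path and revision are made-up placeholders, and the article itself may proceed differently (for instance through the dvc command line).

import dvc.api

# Datasets are tracked by DVC ("dvc add data/train.csv" writes a small
# data/train.csv.dvc pointer file that Git versions), while the data itself
# lives in a DVC remote. Any Git revision then pins an exact dataset version.
with dvc.api.open(
    "data/train.csv",                               # hypothetical path
    repo="https://github.com/example/dvc-demo",     # hypothetical repository
    rev="v1.0",                                     # Git tag, branch or commit
) as f:
    print(f.readline())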

Machine Learning model deployment

Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema

“Enterprise Machine Learning requires looking at the big picture […] from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…

By Oskar RYNKIEWICZ

Sep 30, 2019

Auto-scaling Druid with Kubernetes

Categories: Big Data, Business Intelligence, Containers Orchestration | Tags: Helm, Metrics, OLAP, Operation, Container Orchestration, EC2, Druid, Cloud, CNCF, Data Analytics, Kubernetes, Prometheus, Python

Apache Druid is an open-source analytics data store that could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk… A simplified autoscaling sketch follows this entry.

By Leo SCHOUKROUN

Jul 16, 2019
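For a flavour of what autoscaling a Druid workload on Kubernetes can look like, below is a simplified, hypothetical sketch that creates a CPU-based HorizontalPodAutoscaler with the official Kubernetes Python client. All names, namespaces and thresholds are invented, and the article's approach (notably its use of Prometheus metrics) is richer than this.

from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() inside a pod

# Scale a hypothetical Druid MiddleManager StatefulSet between 2 and 10
# replicas based on average CPU utilization (autoscaling/v1 HPA).
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="druid-middlemanager"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1",
            kind="StatefulSet",
            name="druid-middlemanager",
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="druid", body=hpa
)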

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Categories: Big Data, Infrastructure | Tags: Slider, Erasure Coding, Rolling Upgrade, HDFS, Spark, YARN, Docker

Migrating from Hadoop 2 to Hadoop 3 is a hot subject: how to upgrade your clusters, which features in the new release may solve current problems and bring new opportunities, how are your current…

By Lucas BAKALIAN

Jul 25, 2018

Data Lake ingestion best practices

Categories: Big Data, Data Engineering | Tags: NiFi, Data Governance, HDF, Operation, Avro, Hive, ORC, Spark, Data Lake, File Format, Protocol Buffers, Registry, Schema

Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion, both for batch and stream architectures, that we recommend and implement with our customers… A small schema-on-write sketch follows this entry.

By David WORMS

Jun 18, 2018
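One practice commonly associated with this topic (and with the Avro and Schema tags above) is writing ingested data with an explicit schema. The sketch below, with an invented schema and using the fastavro library, only illustrates that idea and is not taken from the article.

from fastavro import parse_schema, writer

# Hypothetical event schema: the schema is declared up front and embedded in
# the resulting Avro file, so downstream consumers never have to guess types.
schema = parse_schema({
    "type": "record",
    "name": "Event",
    "namespace": "com.example.datalake",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "payload", "type": ["null", "string"], "default": None},
    ],
})

records = [
    {"id": "evt-1", "ts": 1718000000000, "payload": "hello"},
    {"id": "evt-2", "ts": 1718000001000, "payload": None},
]

with open("events.avro", "wb") as out:
    writer(out, schema, records)   # schema travels with the data file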

Running Enterprise Workloads in the Cloud with Cloudbreak

Categories: Big Data, Cloud Computing, DataWorks Summit 2018 | Tags: Cloudbreak, Operation, Hadoop, AWS, Azure, GCP, HDP, OpenStack

This article is based on Peter Darvasi and Richard Doktorics' talk Running Enterprise Workloads in the Cloud at the DataWorks Summit 2018 in Berlin. It presents Hortonworks' automated deployment tool…

By Joris RUMMENS

May 28, 2018

Ambari - How to blueprint

Categories: Big Data, DevOps & SRE | Tags: Ambari, Automation, DevOps, Operation, Ranger, REST

As infrastructure engineers at Adaltas, we deploy Hadoop clusters. A lot of them. Let's see how to automate this process with REST requests. While really handy for deploying one or two clusters, the… A short REST sketch follows this entry.

By Joris RUMMENS

Jan 17, 2018
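To give an idea of the REST automation the article refers to, here is a hedged Python sketch using the two documented Ambari Blueprints calls: register a blueprint, then instantiate a cluster from it. The host, credentials and file names are placeholders, and the article's own workflow may differ.

import requests

AMBARI = "http://ambari.example.com:8080/api/v1"   # hypothetical Ambari server
AUTH = ("admin", "admin")                          # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}             # header required by the Ambari API

# 1. Register the blueprint (stack version, host groups, service components, configs).
with open("blueprint.json") as f:
    resp = requests.post(f"{AMBARI}/blueprints/my-blueprint",
                         auth=AUTH, headers=HEADERS, data=f.read())
resp.raise_for_status()

# 2. Create a cluster from the blueprint (maps host groups to actual hosts).
with open("cluster-template.json") as f:
    resp = requests.post(f"{AMBARI}/clusters/my-cluster",
                         auth=AUTH, headers=HEADERS, data=f.read())
resp.raise_for_status()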

Advanced multi-tenant Hadoop and Zookeeper protection

Categories: Big Data, Infrastructure | Tags: DoS, iptables, Operation, Scalability, Zookeeper, Clustering, Consensus

Zookeeper is a critical component of Hadoop's high availability operation. It protects itself by limiting the maximum number of connections (maxConns = 400). However, Zookeeper does not protect… A small monitoring sketch follows this entry.

By Pierre SAUVAGE

Jul 5, 2017
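As a companion to the protection techniques discussed above, here is a small, hypothetical monitoring sketch: it sends ZooKeeper's four-letter "cons" command over a plain socket and counts open connections per client IP, which helps spot a tenant approaching the connection limit. Host and port are placeholders, and on ZooKeeper 3.5+ the command must be allowed through 4lw.commands.whitelist.

import socket

def connections_per_ip(host="zk1.example.com", port=2181):
    """Return a {client_ip: connection_count} map from ZooKeeper's 'cons' output."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"cons")
        data = b""
        while chunk := sock.recv(4096):
            data += chunk
    counts = {}
    for line in data.decode(errors="replace").splitlines():
        line = line.strip()
        if line.startswith("/"):            # e.g. /10.0.0.12:53416[1](queued=0,...)
            ip = line[1:].split(":", 1)[0]
            counts[ip] = counts.get(ip, 0) + 1
    return counts

print(connections_per_ip())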

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, how to reduce their costs and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukraine