Adaltas Logo

Adaltas Talented Open Source consultants
collaborating with your teams.

Cloud and Data Lake
  • UI
  • Front-end
  • Data Science
  • Data Engineering
  • Micro Services
  • RDBMS
  • Containers
  • NoSQL
  • Big Data
  • DevOps
  • Cloud
  • On-premise

Adaltas is a team of consultants with a focus on Open Source, Big Data and distributed systems based in France, Canada and Morocco.

  • Architecture, audit and digital transformation
  • Cloud and on-premise operation
  • Complex application and ingestion pipelines
  • Efficient and reliable solutions delivery

Latest articles

Logstash pipelines remote configuration and self-indexing

Logstash pipelines remote configuration and self-indexing

Categories: Data Engineering, Infrastructure | Tags: Docker, Elasticsearch, Kibana, Logstash, Log4j

Logstash is a powerful data collection engine that integrates in the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash…

By Paul-Adrien CORDONNIER

Dec 13, 2019

Should you move your Big Data and Data Lake to the Cloud

Should you move your Big Data and Data Lake to the Cloud

Categories: Big Data, Cloud Computing | Tags: AWS, Cloud, DevOps, GCP, Azure, CDP, Databricks

Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New-York, a general focus was put on moving customer’s Big…

By Joris RUMMENS

Dec 9, 2019

Hadoop Ozone part 3: advanced replication strategy with Copyset

Hadoop Ozone part 3: advanced replication strategy with Copyset

Categories: Infrastructure | Tags: HDFS, Kubernetes, Ozone

Hadoop Ozone provide a way of setting a ReplicationType for every write you make on the cluster. Right now is supported HDFS and Ratis but more advanced replication strategies can be achieved. In this…

Hadoop Ozone part 2: tutorial and getting started of its features

Hadoop Ozone part 2: tutorial and getting started of its features

Categories: Infrastructure | Tags: HDFS, CLI, Learning and tutorial, REST, Ozone, Amazon S3

The releases of Hadoop Ozone come with a handy docker-compose file to try out Ozone. The below instructions provide details on how to use it. You can also use the Katacoda training sandbox which…

Hadoop Ozone part 1: an introduction of the new filesystem

Hadoop Ozone part 1: an introduction of the new filesystem

Categories: Infrastructure | Tags: HDFS, Kubernetes, Ozone

Hadoop Ozone is an object store for Hadoop. It is designed to scale to billions of objects of varying sizes. It is currently in development. The roadmap is available on the project wiki. This article…

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Categories: Data Engineering, Data Science | Tags: Flink, Kafka, Spark, DevOps, Kubernetes, Hadoop, HBase, Python

Context The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitates…

By David WORMS

Nov 26, 2019

InfraOps & DevOps Internship - build a Big Data & Kubernetes PaaS

InfraOps & DevOps Internship - build a Big Data & Kubernetes PaaS

Categories: Big Data, Containers Orchestration | Tags: Kafka, Spark, DevOps, Kubernetes, LXD, NoSQL, Hadoop, Ceph

Context The acquisition of a high-capacity cluster is in line with Adaltas’ desire to build a PAAS-type offering to use and to provide Big Data and container orchestration platforms. The platforms are…

By David WORMS

Nov 26, 2019

Insert rows in BigQuery tables with complex columns

Insert rows in BigQuery tables with complex columns

Categories: Cloud Computing, Data Engineering | Tags: GCP, Schema, BigQuery, SQL

Google’s BigQuery is a cloud data warehousing system designed to process enormous volumes of data with several features available. Out of all those features, let’s talk about the support of Struct…

By César BEREZOWSKI

Nov 22, 2019

Avoid Bottlenecks in distributed Deep Learning pipelines with Horovod

Avoid Bottlenecks in distributed Deep Learning pipelines with Horovod

Categories: Data Science | Tags: Deep Learning, GPU, Keras, TensorFlow, Horovod

The Deep Learning training process can be greatly speed up using a cluster of GPUs. When dealing with huge amounts of data, distributed computing quickly becomes a challenge. A common obstacle which…

By Grégor JOUET

Nov 15, 2019

Kerberos and Spnego authentication on Windows with Firefox

Kerberos and Spnego authentication on Windows with Firefox

Categories: Cyber Security | Tags: Firefox, FreeIPA, HTTP, Kerberos

In Greek mythology, Kerberos, also called Cerberus, guards the gates of the Underworld to prevent the dead from leaving. He is commonly described as a three-headed dog, a serpent’s tail, mane of…

By David WORMS

Nov 4, 2019

Notes on the Cloudera Open Source licensing model

Notes on the Cloudera Open Source licensing model

Categories: Big Data | Tags: CDSW, License, Open source, Cloudera Manager

Following the publication of its Open Source licensing strategy on July 10, 2019 in an article called “our Commitment to Open Source Software”, Cloudera broadcasted a webinar yesterday October 2…

By David WORMS

Oct 25, 2019

Innovation, project vs product culture in Data Science

Innovation, project vs product culture in Data Science

Categories: Data Science, Data Governance | Tags: DevOps, Agile, Scrum

Data Science carries the jobs of tomorrow. It is closely linked to the understanding of the business usecases, the behaviors and the insights that will be extracted from existing data. The stakes are…

By David WORMS

Oct 8, 2019

Machine Learning model deployment

Machine Learning model deployment

Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: AI, Cloud, DevOps, Machine Learning, On-premise, Operation, Schema

“Enterprise Machine Learning requires looking at the big picture … from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…

By Oskar RYNKIEWICZ

Sep 30, 2019

Rook with Ceph doesn't provision my Persistent Volume Claims!

Rook with Ceph doesn't provision my Persistent Volume Claims!

Categories: DevOps & SRE | Tags: Kubernetes, PVC, Linux, Rook, Ubuntu, Ceph

Ceph installation inside Kubernetes can be provisionned using Rook. Currently doing an internship at Adaltas, I was in charge of participating in the setup of a Kubernetes (k8s) cluster. To avoid…

By Eyal CHOJNOWSKI

Sep 9, 2019

Users and RBAC authorizations in Kubernetes

Users and RBAC authorizations in Kubernetes

Categories: Containers Orchestration, Data Governance | Tags: Authentication, Authorization, Cyber Security, Kubernetes, RBAC, SSL/TLS

Having your Kubernetes cluster up and running is just the start of your journey and you now need to operate. To secure its access, user identities must be declared along with authentication and…

By Robert Walid SOARES

Aug 7, 2019

TensorFlow installation on Docker

TensorFlow installation on Docker

Categories: Containers Orchestration, Data Science, Learning | Tags: AI, CPU, Deep Learning, Docker, Jupyter, Linux, TensorFlow

TensorFlow is an Open Source software from Google for numerical computation using a graph representation: Vertex (nodes) represent mathematical operations Edges represent N-dimensional data array…

By Pierre SAUVAGE

Aug 5, 2019

Running Apache Hive 3, new features and tips and tricks

Running Apache Hive 3, new features and tips and tricks

Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, Kafka, JDBC, LLAP, Release and features, Hadoop

Apache Hive 3 brings a bunch of new and nice features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It is available since…

By Gauthier LEONARD

Jul 25, 2019

Auto-scaling Druid with Kubernetes

Auto-scaling Druid with Kubernetes

Categories: Big Data, Business Intelligence, Containers Orchestration | Tags: EC2, Druid, Cloud, CNCF, Container Orchestration, Data Analytics, Helm, Kubernetes, Metrics, OLAP, Operation, Prometheus, Python

Apache Druid is an open-source analytics data store which could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk…

By Leo SCHOUKROUN

Jul 16, 2019

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.