Published articles
Insert rows in BigQuery tables with complex columns
Categories: Cloud Computing, Data Engineering | Tags: GCP, BigQuery, Schema, SQL
Google’s BigQuery is a cloud data warehousing system designed to process enormous volumes of data with several features available. Out of all those features, let’s talk about the support of Struct…
Nov 22, 2019
Mount Aladdin eToken in Firefox on Archlinux
Categories: Hack | Tags: Arch Linux, Cyber Security, Firefox, Security, Smart card, 2FA
Given you’re on Archlinux and have an Aladdin eToken, let’s see how we can mount it in Firefox for web authentication. An Aladdin eToken is a cryptographic device (token, smart card) that stores…
Jul 12, 2019
Apache Flink: past, present and future
Categories: Data Engineering | Tags: Pipeline, Flink, Kubernetes, Machine Learning, SQL, Streaming
Apache Flink is a little gem which deserves a lot more attention. Let’s dive into Flink’s past, its current state and the future it is heading to by following the keynotes and presentations at Flink…
Nov 5, 2018
What's new in Apache Spark 2.3?
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, PySpark, Tuning, ORC, Spark, Spark MLlib, Data Science, Docker, Kubernetes, pandas, Streaming
Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…
May 23, 2018
Execute Python in an Oozie workflow
Categories: Data Engineering | Tags: Oozie, Elasticsearch, Python, REST
Oozie workflows allow you to use multiple actions to execute code, however doing so with Python can be a bit tricky, let’s see how to do that. I’ve recently designed a workflow that would interact…
Mar 6, 2018
From Dockerfile to Ansible Containers
Categories: Containers Orchestration, DevOps & SRE, Open Source Summit Europe 2017 | Tags: pip, Shell, Ansible, Docker, Docker Compose, YAML
This talk was an introduction to the Dockerfile format and to Ansible container’s tool and then a comparison of both. It was hold by Tomas Tomecek from Red Hat’s containerization team. The Dockerfile…
Oct 25, 2017
Cloudera Sessions Paris 2017
Categories: Big Data, Events | Tags: Altus, CDSW, SDX, EC2, Azure, Cloudera, CDH, Data Science, PaaS
Adaltas was at the Cloudera Sessions on October 5, where Cloudera showcased their new products and offerings. Below you’ll find a summary of what we witnessed. Note: the information were aggregated in…
Oct 16, 2017
Exposing Kafka on two different networks
Categories: Infrastructure | Tags: Cyber Security, VLAN, Kafka, Cloudera, CDH, Network
A Big Data setup usually requires you to have multiple networking interface, let’s see how to set up Kafka on more than one of them. Kafka is a open-source stream processing software platform system…
Jul 22, 2017
Change Ambari's topbar color
Categories: Big Data, Hack | Tags: Ambari, Front-end
We recently had a client that has multiple environments (Production, Integration, Testing, …) running on HDP and managed using one Ambari instance per cluster. One of the questions that came up was…
Jul 9, 2017
MiNiFi: Data at Scales & the Values of Starting Small
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, C++, HDF, Cloudera, HDP, IOT
This conference presented rapidly Apache NiFi and explained where MiNiFi came from: basically it’s a NiFi minimal agent to deploy on small devices to bring data to a cluster’s NiFi pipeline (ex: IoT…
Jul 8, 2017
Apache Apex: next gen Big Data analytics
Categories: Data Science, Events, Tech Radar | Tags: Apex, Storm, Tools, Flink, Hadoop, Kafka, Data Science, Machine Learning
Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction Apache Apex is an in-memory distributed parallel…
Jul 17, 2016
Get in control of your workflows with Apache Airflow
Categories: Big Data, Tech Radar | Tags: DevOps, Airflow, Cloud, Python
Below is a compilation of my notes taken during the presentation of Apache Airflow by Christian Trebing from BlueYonder. Introduction Use case: how to handle data coming in regularly from customers…
Jul 17, 2016