Articles published in 2016

Network Namespace without Docker
Categories: Hack | Tags: DNS, Linux, Namespaces, VLAN, Docker, Network
Let’s imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I’m going to use when I launch apps. My app doesn’t allow me to choose a…
Jul 6, 2016

Get in control of your workflows with Apache Airflow
Categories: Big Data, Tech Radar | Tags: DevOps, Airflow, Cloud, PostgreSQL, Python
Below is a compilation of my notes taken during the presentation of Apache Airflow by Christian Trebing from BlueYonder. Introduction Use case: how to handle data coming in regularly from customers…
Jul 17, 2016

Apache Apex: next gen Big Data analytics
Categories: Data Science, Events, Tech Radar | Tags: Apex, Storm, MongoDB, Tools, Flink, Hadoop, Kafka, Data Science, Machine Learning, Redis
Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction Apache Apex is an in-memory distributed parallel…
Jul 17, 2016

Apache Apex with Apache SAMOA
Categories: Data Science, Events, Tech Radar | Tags: Apex, Samoa, Storm, Tools, Flink, Hadoop, Machine Learning
Traditional Machine Learning Batch Oriented Supervised - most common Training and Scoring One time model building Data set Training: Model building Holdout: Parameter tuning Test: Accuracy Online…
Jul 17, 2016

EclairJS - Putting a Spark in Web Apps
Categories: Data Engineering, Front End | Tags: Spark, JavaScript, Jupyter
Presentation by David Fallside from IBM, images extracted from the presentation. Introduction Web Apps development has moved from Java to Node.js and JavaScript. It provides a simple and rich…
By David WORMS
Jul 17, 2016

Hive Metastore HA with DBTokenStore: Failed to initialize master key
Categories: Big Data, DevOps & SRE | Tags: Infrastructure, Hive, Bug
This article describes my little adventure around a startup error with the Hive Metastore. It should be reproducible with any secure installation, meaning with Kerberos, with high availability enabled…
By David WORMS
Jul 21, 2016

Hive, Calcite and Druid
Categories: Big Data | Tags: Business intelligence, Database, Druid, Hadoop, Hive, Storage
BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional…
By David WORMS
Jul 14, 2016