Articles published in 2016
Hive Metastore HA with DBTokenStore: Failed to initialize master key
Categories: Big Data, DevOps & SRE | Tags: Infrastructure, Hive, Bug
This article describes my little adventure around a startup error with the Hive Metastore. It should be reproducible with any secure installation, meaning with Kerberos, with high availability enabled…
By David WORMS
Jul 21, 2016
EclairJS - Putting a Spark in Web Apps
Categories: Data Engineering, Front End | Tags: Jupyter, Spark, JavaScript
Presentation by David Fallside from IBM, images extracted from the presentation. Introduction: Web app development has moved from Java to Node.js and JavaScript. It provides a simple and rich…
By David WORMS
Jul 17, 2016
Apache Apex with Apache SAMOA
Categories: Data Science, Events, Tech Radar | Tags: Apex, Samoa, Storm, Tools, Flink, Hadoop, Machine Learning
Traditional Machine Learning Batch Oriented Supervised - most common Training and Scoring One time model building Data set Training: Model building Holdout: Parameter tuning Test: Accuracy Online…
Jul 17, 2016
Apache Apex: next gen Big Data analytics
Categories: Data Science, Events, Tech Radar | Tags: Apex, Storm, MongoDB, Tools, Flink, Hadoop, Kafka, Data Science, Machine Learning, Redis
Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction Apache Apex is an in-memory distributed parallel…
Jul 17, 2016
Get in control of your workflows with Apache Airflow
Categories: Big Data, Tech Radar | Tags: DevOps, Airflow, Cloud, PostgreSQL, Python
Below is a compilation of my notes taken during the presentation of Apache Airflow by Christian Trebing from BlueYonder. Introduction Use case: how to handle data coming in regularly from customers…
Jul 17, 2016
Hive, Calcite and Druid
Categories: Big Data | Tags: Business intelligence, Database, Druid, Hadoop, Hive, Storage
BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional…
By David WORMS
Jul 14, 2016
Network Namespace without Docker
Categories: Hack | Tags: DNS, Linux, Namespaces, VLAN, Docker, Network
Let’s imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I’m gonna use when I launch apps. My app doesn’t allow me to choose a…
Jul 6, 2016