Articles published in 2016

Hive Metastore HA with DBTokenStore: Failed to initialize master key

Categories: Big Data, DevOps & SRE | Tags: Infrastructure, Hive, Bug

This article describes my little adventure around a startup error with the Hive Metastore. It should be reproducible with any secure installation, meaning with Kerberos, with high availability enabled…

By David WORMS

Jul 21, 2016

EclairJS - Putting a Spark in Web Apps

Categories: Data Engineering, Front End | Tags: Jupyter, Spark, JavaScript

Presentation by David Fallside from IBM, images extracted from the presentation. Introduction: Web app development has moved from Java to Node.js and JavaScript. It provides a simple and rich…

By David WORMS

Jul 17, 2016

Apache Apex with Apache SAMOA

Categories: Data Science, Events, Tech Radar | Tags: Apex, Samoa, Storm, Tools, Flink, Hadoop, Machine Learning

Traditional Machine Learning Batch Oriented Supervised - most common Training and Scoring One time model building Data set Training: Model building Holdout: Parameter tuning Test: Accuracy Online…

By Pierre SAUVAGE

Jul 17, 2016

Apache Apex: next gen Big Data analytics

Categories: Data Science, Events, Tech Radar | Tags: Apex, Storm, MongoDB, Tools, Flink, Hadoop, Kafka, Data Science, Machine Learning, Redis

Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction Apache Apex is an in-memory distributed parallel…

By César BEREZOWSKI

Jul 17, 2016

Get in control of your workflows with Apache Airflow

Categories: Big Data, Tech Radar | Tags: DevOps, Airflow, Cloud, PostgreSQL, Python

Below is a compilation of my notes taken during the presentation of Apache Airflow by Christian Trebing from BlueYonder. Introduction Use case: how to handle data coming in regularly from customers…

By César BEREZOWSKI

Jul 17, 2016

Hive, Calcite and Druid

Categories: Big Data | Tags: Business intelligence, Database, Druid, Hadoop, Hive, Storage

BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional…

By David WORMS

Jul 14, 2016

Network Namespace without Docker

Categories: Hack | Tags: DNS, Linux, Namespaces, VLAN, Docker, Network

Let’s imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I’m gonna use when I launch apps. My app doesn’t allow me to choose a…

By Pierre SAUVAGE

Jul 6, 2016
