César is a Big Data & Hadoop Solution Architect and Data Engineer with 4 years of hands-on experience in Hadoop and distributed systems. He designs, develops and maintains data processing workflows and real-time services, and gives clients a consistent view of data management and workflows across their different data sources and business requirements.

He works at all levels of data platforms, from planning, design and architecture to cluster deployment, administration and maintenance, as well as prototyping and application development, in collaboration with business users, analysts, data scientists, and engineering and operational teams. He enjoys exploring and experimenting with new technologies alongside his day-to-day work.

He also has solid experience as an educator in knowledge transfer and training.

Published articles

Insert rows in BigQuery tables with complex columns

Categories: Cloud Computing, Data Engineering | Tags: GCP, Schema, BigQuery, SQL

Google’s BigQuery is a cloud data warehousing system designed to process enormous volumes of data with several features available. Out of all those features, let’s talk about the support of Struct…

By César BEREZOWSKI

Nov 22, 2019

Mount Aladdin eToken in Firefox on Archlinux

Categories: Hack | Tags: 2FA, Arch Linux, Cyber Security, Firefox, Security, Smart card

Given you’re on Archlinux and have an Aladdin eToken, let’s see how we can mount it in Firefox for web authentication. An Aladdin eToken is a cryptographic device (token, smart card) that stores…

By César BEREZOWSKI

Jul 12, 2019

Apache Flink: past, present and future

Categories: Data Engineering | Tags: Flink, Kubernetes, Machine Learning, Pipeline, Streaming, SQL

Apache Flink is a little gem which deserves a lot more attention. Let’s dive into Flink’s past, its current state and the future it is heading to by following the keynotes and presentations at Flink…

By César BEREZOWSKI

Nov 5, 2018

What's new in Apache Spark 2.3?

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, ORC, Spark, Spark MLlib, PySpark, Docker, Kubernetes, Streaming, Tuning, pandas

Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…

By César BEREZOWSKI

May 23, 2018

Execute Python in an Oozie workflow

Categories: Data Engineering | Tags: Oozie, Elasticsearch, REST, Python

Oozie workflows allow you to use multiple actions to execute code. However, doing so with Python can be a bit tricky; let’s see how to do that. I’ve recently designed a workflow that would interact…

By César BEREZOWSKI

Mar 6, 2018

From Dockerfile to Ansible Containers

Categories: Containers Orchestration, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, Docker, Docker Compose, pip, Shell, YAML

This talk was an introduction to the Dockerfile format and to the Ansible Container tool, followed by a comparison of both. It was given by Tomas Tomecek from Red Hat’s containerization team. The Dockerfile…

By César BEREZOWSKI

Oct 25, 2017

Cloudera Sessions Paris 2017

Categories: Big Data, Events | Tags: EC2, Cloudera, Altus, CDSW, SDX, Azure, PaaS, CDH

Adaltas was at the Cloudera Sessions on October 5, where Cloudera showcased their new products and offerings. Below you’ll find a summary of what we witnessed. Note: the information was aggregated in…

By César BEREZOWSKI

Oct 16, 2017

Exposing Kafka on two different networks

Categories: Infrastructure | Tags: Kafka, Cloudera, Cyber Security, Network, VLAN, CDH

A Big Data setup usually requires you to have multiple network interfaces; let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing software platform system…

By César BEREZOWSKI

Jul 22, 2017

Change Ambari's topbar color

Categories: Big Data, Hack | Tags: Ambari, Front-end

We recently had a client that has multiple environments (Production, Integration, Testing, …) running on HDP and managed using one Ambari instance per cluster. One of the questions that came up was…

By César BEREZOWSKI

Jul 9, 2017

MiNiFi: Data at Scales & the Values of Starting Small

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, Cloudera, C++, HDP, HDF, IOT

This conference briefly presented Apache NiFi and explained where MiNiFi came from: basically, it’s a minimal NiFi agent to deploy on small devices to bring data to a cluster’s NiFi pipeline (e.g. IoT…

By César BEREZOWSKI

Jul 8, 2017

Apache Apex: next gen Big Data analytics

Categories: Data Science, Events, Tech Radar | Tags: Apex, Flink, Kafka, Storm, Data Science, Machine Learning, Tools, Hadoop

Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction. Apache Apex is an in-memory distributed parallel…

By César BEREZOWSKI

Jul 17, 2016

Get in control of your workflows with Apache Airflow

Categories: Big Data, Tech Radar | Tags: Airflow, Cloud, DevOps, Python

Below is a compilation of my notes taken during the presentation of Apache Airflow by Christian Trebing from BlueYonder. Introduction. Use case: how to handle data coming in regularly from customers…

By César BEREZOWSKI

Jul 17, 2016

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, and how to reduce their costs and shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.