Streaming

Related articles

Spark Streaming part 4: clustering with Spark MLlib

Spark Streaming part 4: clustering with Spark MLlib

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Clustering, Machine Learning, Scala, Streaming

Spark MLlib is an Apache’s Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, Spark framework can serve as a platform for…

By Oskar RYNKIEWICZ

Jul 11, 2019

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop

Categories: Data Engineering, Learning | Tags: Spark, Apache Spark Streaming, Big Data, File Format, Data Governance, Python, Streaming, Hadoop

Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…

By Oskar RYNKIEWICZ

May 28, 2019

Spark Streaming part 1: build data pipelines with Spark Structured Streaming

Spark Streaming part 1: build data pipelines with Spark Structured Streaming

Categories: Data Engineering, Learning | Tags: Kafka, Spark, Apache Spark Streaming, Big Data, Streaming

Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…

By Oskar RYNKIEWICZ

Apr 18, 2019

Apache Flink: past, present and future

Apache Flink: past, present and future

Categories: Data Engineering | Tags: Flink, Batch processing, Consistency, Kubernetes, Ledger, Machine Learning, Micro Services, Pipeline, Streaming, SQL

Apache Flink is a little gem which deserves a lot more attention. Let’s dive into Flink’s past, its current state and the future it is heading to by following the keynotes and presentations at Flink…

By César BEREZOWSKI

Nov 5, 2018

Apache Beam: a unified programming model for data processing pipelines

Apache Beam: a unified programming model for data processing pipelines

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Flink, Spark, Batch processing, Java, Pipeline, Python, Streaming

In this article, we will review the concepts, the history and the future of Apache Beam, that may well become the new standard for data processing pipelines definition. At Dataworks Summit 2018 in…

By Gauthier LEONARD

May 24, 2018

What's new in Apache Spark 2.3?

What's new in Apache Spark 2.3?

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, ORC, Spark, Spark MLlib, PySpark, Docker, Kubernetes, Streaming, Tuning, pandas

Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…

By César BEREZOWSKI

May 23, 2018

Node CSV version 0.2.1

Node CSV version 0.2.1

Categories: Node.js | Tags: CoffeeScript, CSV, Release and features, Streaming

After the announcement of the version 0.2.0 of the Node.js CSV parser at the begining of october, we are releasing today a new version 0.2.1. This is mostly a bug fix release with enhanced…

By David WORMS

Jul 24, 2012

Node CSV version 0.1 and future developments

Node CSV version 0.1 and future developments

Categories: Node.js | Tags: CoffeeScript, CSV, Markdown, Release and features, Streaming

The Node CSV parser has just reach version 0.1 which close the 0.0.x releases. Started almost 2 years ago, the project has received a tremendous amount of participation in the form of bug reports…

By David WORMS

Jul 21, 2012

Node CSV version 0.2 with streaming API

Node CSV version 0.2 with streaming API

Categories: Node.js | Tags: CSV, Data Engineering, Markdown, Node.js, Streaming

The Node CSV parser in its version 0.2 has just been released. This version is a major enhancement as it aligned the parser with the best Node.js practice in respect of streams. The CSV parser behave…

By David WORMS

Jul 2, 2012

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.