Streaming

Streaming is the continuous transmission of data over a network. A data stream is a continuous flow of data records whose end usually cannot be foreseen in advance. Records are processed continuously as soon as they are received. The number of records per unit of time (the data rate) can vary and may grow so large that the receiver's limited resources no longer suffice, forcing it to react accordingly (for example by discarding records). In contrast to other data sources, and in particular to data structures with random access (such as arrays), a data stream can usually only be processed sequentially, record by record.
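
To make this concrete, here is a minimal Python sketch of record-by-record processing over an unbounded stream. The source, record layout, and buffer size are purely illustrative assumptions, not taken from any specific system; the bounded buffer shows one possible reaction to overload, discarding the oldest records when the data rate outpaces the consumer.

```python
import random
import time
from collections import deque
from itertools import islice

def sensor_stream():
    """Hypothetical unbounded source: yields one record at a time, the end is not known in advance."""
    while True:
        yield {"ts": time.time(), "value": random.random()}

# Bounded buffer: if records arrive faster than they can be handled,
# the oldest ones are silently discarded (one possible reaction to overload).
buffer = deque(maxlen=100)

count, total = 0, 0.0
# Only sequential access is possible: records are consumed one by one as they arrive.
for record in islice(sensor_stream(), 1000):   # islice merely bounds this demo run
    buffer.append(record)                      # deque(maxlen=...) drops the oldest entry on overflow
    count += 1
    total += record["value"]

print(f"processed {count} records, running average: {total / count:.3f}")
```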

Data streams are often used for interprocess communication (communication between processes on a single computer) and for transmitting data over networks, notably in IoT and streaming media. They fit naturally into the Pipes and Filters programming paradigm; pipes are a common feature of Unix shells. Examples of data streams include weather data, system metrics, information from factory devices, as well as audio and video streams (streaming media).
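
As an illustration of the Pipes and Filters style, the sketch below chains small generator-based filters in Python, much like a Unix shell pipeline such as `grep ERROR | tr a-z A-Z`. The filter names and the `ERROR` pattern are illustrative assumptions, not taken from any of the articles listed below.

```python
import sys

def read_lines(stream):
    """Source filter: emit lines from a text stream (e.g. sys.stdin) one by one."""
    for line in stream:
        yield line.rstrip("\n")

def grep(pattern, lines):
    """Filter: keep only the lines containing the pattern (akin to `grep` in a shell pipeline)."""
    return (line for line in lines if pattern in line)

def to_upper(lines):
    """Filter: transform each line to upper case (akin to `tr a-z A-Z`)."""
    return (line.upper() for line in lines)

# Compose the pipeline, roughly equivalent to: cat app.log | grep ERROR | tr a-z A-Z
if __name__ == "__main__":
    for line in to_upper(grep("ERROR", read_lines(sys.stdin))):
        print(line)
```

Each filter consumes and produces records lazily, so the pipeline can operate on arbitrarily long streams without loading them into memory.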

Related articles

Operating Kafka in Kubernetes with Strimzi

Categories: Big Data, Containers Orchestration, Infrastructure | Tags: Kafka, Big Data, Kubernetes, Open source, Streaming

Kubernetes is not the first platform that comes to mind to run Apache Kafka clusters. Indeed, Kafka's strong dependency on storage might be a pain point regarding Kubernetes' way of doing things when…

By Leo SCHOUKROUN

Mar 7, 2023

Databricks logs collection with Azure Monitor at a Workspace Scale

Categories: Cloud Computing, Data Engineering, Adaltas Summit 2021 | Tags: Metrics, Monitoring, Spark, Azure, Databricks, Log4j

Databricks is an optimized data analytics platform based on Apache Spark. Monitoring the Databricks platform is crucial to ensure data quality, job performance, and security by limiting access to…

By Claire PLAYE

May 10, 2022

Internship in Data Engineering

Categories: Front End, Learning | Tags: Metrics, Monitoring, Hive, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming

Job Description: Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms, and refines raw data into information that can be used by business analysts and data…

By David WORMS

Oct 25, 2021

Spark Streaming part 4: clustering with Spark MLlib

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Clustering, Machine Learning, Scala, Streaming

Spark MLlib is Apache Spark's library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, the Spark framework can serve as a platform for…

By Oskar RYNKIEWICZ

Jun 27, 2019

Spark Streaming part 3: DevOps, tools and tests for Spark applications

Categories: Big Data, Data Engineering, DevOps & SRE | Tags: DevOps, Learning and tutorial, Spark, Apache Spark Streaming

Whenever services are unavailable, businesses experience large financial losses. Spark Streaming applications can break, like any other software application. A streaming application operates on data…

By Oskar RYNKIEWICZ

May 31, 2019

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop

Categories: Data Engineering, Learning | Tags: Spark, Apache Spark Streaming, Python, Streaming

Spark can process streaming data on a multi-node Hadoop cluster, relying on HDFS for storage and YARN for job scheduling. Thus, Spark Structured Streaming integrates well with Big Data…

By Oskar RYNKIEWICZ

May 28, 2019

Spark Streaming part 1: build data pipelines with Spark Structured Streaming

Categories: Data Engineering, Learning | Tags: Kafka, Spark, Apache Spark Streaming, Big Data, Streaming

Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…

By Oskar RYNKIEWICZ

Apr 18, 2019

Apache Flink: past, present and future

Categories: Data Engineering | Tags: Pipeline, Flink, Kubernetes, Machine Learning, SQL, Streaming

Apache Flink is a little gem which deserves a lot more attention. Let's dive into Flink's past, its current state and the future it is heading to by following the keynotes and presentations at Flink…

By César BEREZOWSKI

Nov 5, 2018

Apache Beam: a unified programming model for data processing pipelines

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Pipeline, Flink, Spark

In this article, we will review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. At DataWorks Summit 2018 in…

By Gauthier LEONARD

May 24, 2018

What's new in Apache Spark 2.3?

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, PySpark, Tuning, ORC, Spark, Spark MLlib, Data Science, Docker, Kubernetes, pandas, Streaming

Let's dive into the new features offered by the Apache Spark 2.3 release. This article is a compilation of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…

By César BEREZOWSKI

May 23, 2018

Node CSV version 0.2.1

Categories: Node.js | Tags: CoffeeScript, CSV, Release and features, Streaming

After the announcement of version 0.2.0 of the Node.js CSV parser at the beginning of October, we are releasing today a new version, 0.2.1. This is mostly a bug fix release with enhanced…

By David WORMS

Jul 24, 2012

Node CSV version 0.1 and future developments

Categories: Node.js | Tags: CoffeeScript, CSV, Markdown, Release and features, Streaming

The Node CSV parser has just reached version 0.1, which closes the 0.0.x releases. Started almost 2 years ago, the project has received a tremendous amount of participation in the form of bug reports…

By David WORMS

Jul 21, 2012

Node CSV version 0.2 with streaming API

Categories: Node.js | Tags: Data Engineering, CSV, Markdown, Node.js, Streaming

The Node CSV parser in its version 0.2 has just been released. This version is a major enhancement as it aligns the parser with Node.js best practices with respect to streams. The CSV parser behaves…

By David WORMS

Jul 2, 2012

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into production projects, how to reduce their costs, and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukraine