Batch processing

Batch processing is a data processing term describing the non-interactive, automatic, sequential, and complete processing of one or more input files. Common examples of batch processing include:

  • Accounting: Book incoming payments of a working day, leading to new account balances
  • Data migration: Convert a number of files from one format into another
  • Retail: Generate aggregate statistics from all sales in the current month

The results of batch processing are often batches themselves, for example lists of receipts, reports or changed data sets.
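To make the definition concrete, here is a minimal Python sketch of a batch job for the accounting example above: it reads every payment file of a working day, aggregates the amounts per account, and writes the new balances in a single, non-interactive pass. The file layout and column names are assumptions made for the illustration, not part of the original text.

```python
import csv
import glob
from collections import defaultdict

def run_batch(input_glob: str, output_path: str) -> None:
    """Non-interactive batch run: read the complete set of input files,
    aggregate them, then write the result in one go."""
    balances = defaultdict(float)

    # Sequentially process every matching input file in full.
    for path in sorted(glob.glob(input_glob)):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):  # assumed columns: account, amount
                balances[row["account"]] += float(row["amount"])

    # The result is itself a batch: one output file per run.
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["account", "balance"])
        for account, balance in sorted(balances.items()):
            writer.writerow([account, f"{balance:.2f}"])

if __name__ == "__main__":
    # Hypothetical layout: one CSV of incoming payments per working day.
    run_batch("payments/2021-03-*.csv", "balances_2021-03.csv")
```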

Learn more
Wikipedia

Related articles

Storage size and generation time in popular file formats

Categories: Data Engineering, Data Science | Tags: Avro, HDFS, Hive, ORC, Parquet, Big Data, Data Lake, File Format, JavaScript Object Notation (JSON)

Choosing an appropriate file format is essential, whether your data travels over the wire or is stored at rest. Each file format comes with its own advantages and disadvantages. We covered them in a…

By Barthelemy NGOM

Mar 22, 2021

Comparison of different file formats in Big Data

Categories: Big Data, Data Engineering | Tags: Business intelligence, Data structures, Avro, HDFS, ORC, Parquet, Batch processing, Big Data, CSV, JavaScript Object Notation (JSON), Kubernetes, Protocol Buffers

In data processing, there are different types of file formats to store your data sets. Each format has its own pros and cons depending on the use cases and exists to serve one or several purposes…

By Aida NGOM

Jul 23, 2020

Apache Flink: past, present and future

Categories: Data Engineering | Tags: Flink, Pipeline, Kubernetes, Machine Learning, SQL, Streaming

Apache Flink is a little gem which deserves a lot more attention. Let's dive into Flink's past, its current state and the future it is heading toward by following the keynotes and presentations at Flink…

By César BEREZOWSKI

Nov 5, 2018

Apache Beam: a unified programming model for data processing pipelines

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Flink, Pipeline, Spark

In this article, we will review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. At DataWorks Summit 2018 in…

By Gauthier LEONARD

May 24, 2018
