Protocol Buffers
Protocol Buffers is a serialization format used for data exchange and data storage. Use cases include batch and streaming processing and communication between microservices in a platform-neutral way. Protocol Buffers focuses on serializing and deserializing data as quickly as possible and on keeping the serialized data as small as possible to reduce the required bandwidth. Furthermore, Protocol Buffers, like AVRO, supports schema evolution. The schema is defined in a `.proto` text file, and the data is serialized into a compact binary format. On the other hand, Protocol Buffers files cannot be split for parallel processing the way CSV files can, and they do not support data compression (unlike ORC, Parquet and AVRO).
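As an illustration, here is a minimal sketch in Python. The schema and the `person_pb2` module are hypothetical examples, not taken from the original text: in practice, the `protoc` compiler turns a `.proto` file into language-specific classes that handle the binary encoding.

```python
# person.proto (hypothetical schema, compiled with: protoc --python_out=. person.proto):
#   syntax = "proto3";
#   message Person {
#     string name  = 1;
#     int32  id    = 2;
#     string email = 3;
#   }
import person_pb2  # module generated by protoc from the schema above

# Build a message and serialize it to a compact binary representation
person = person_pb2.Person(name="Ada", id=42, email="ada@example.com")
data = person.SerializeToString()  # small byte string, ready to store or send

# Deserialize the bytes back into a typed message
decoded = person_pb2.Person()
decoded.ParseFromString(data)
assert decoded.name == "Ada" and decoded.id == 42
```

Because each field is identified by its tag number rather than its name, new fields can later be added to the schema without breaking readers compiled against the old definition, which is the basis of the schema evolution mentioned above.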
Protocol Buffers, often abbreviated ProtoBuf, was open-sourced by Google in 2008. It is the default serialization format used by gRPC. Protocol Buffers initially supported only three languages: C++, Java and Python. Today, it also supports languages such as Go, Ruby, JavaScript, PHP, C# and Objective-C.
Learn more
- Wikipedia
Related articles
Comparison of different file formats in Big Data
Categories: Big Data, Data Engineering | Tags: Business intelligence, Data structures, Avro, HDFS, ORC, Parquet, Batch processing, Big Data, CSV, JavaScript Object Notation (JSON), Kubernetes, Protocol Buffers
In data processing, there are different types of file formats to store your data sets. Each format has its own pros and cons depending upon the use cases and exists to serve one or several purposes…
By Aida NGOM
Jul 23, 2020
Data Lake ingestion best practices
Categories: Big Data, Data Engineering | Tags: NiFi, Data Governance, HDF, Operation, Avro, Hive, ORC, Spark, Data Lake, File Format, Protocol Buffers, Registry, Schema
Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion both for batch and stream architectures that we recommend and implement with our customers…
By David WORMS
Jun 18, 2018