Apache Flink

Apache Flink is a open-source data processing framework and distributed processing engine for batch and stream dataflow. Originally named Stratosphere, the project evolved into Apache Flink in 2015 under the Apache Foundation.

Flink allows you to manage batch and stream processing through it's two main APIs:

  • the Dataset API
  • the DataStream API.

With his built-in fault-tolerant mechanism, Flink ensures high availability, delivering high throughput and low latency. Flink integrates with a wide range of storage systems and has built-in connectors for common data sources and sinks like Kafka as a source and a sink or [Elasticsearch] (https://www.adaltas.com/en/tag/elk-elasticsearch/) as a sink.

Also, Flink provides flexibility, operating either as a standalone cluster or integrating with common resource managers such as Hadoop YARN and Kubernetes

Related articles

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Categories: Data Engineering, Data Science | Tags: DevOps, Flink, Hadoop, HBase, Kafka, Spark, Internship, Kubernetes, Python

Context The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitatesā€¦

David WORMS

By David WORMS

Nov 26, 2019

Apache Flink: past, present and future

Apache Flink: past, present and future

Categories: Data Engineering | Tags: Pipeline, Flink, Kubernetes, Machine Learning, SQL, Streaming

Apache Flink is a little gem which deserves a lot more attention. Letā€™s dive into Flinkā€™s past, its current state and the future it is heading to by following the keynotes and presentations at Flinkā€¦

CĆ©sar BEREZOWSKI

By CĆ©sar BEREZOWSKI

Nov 5, 2018

One week to discuss technology in a Moroccan riad

One week to discuss technology in a Moroccan riad

Categories: Adaltas Summit 2018, Learning | Tags: CDSW, Gatsby, React.js, Flink, Hadoop, Knox, Data Science, Deep Learning, Kubernetes, Node.js

Adaltas organise the year its first conference between the 22 and 26 of October. On the agenda of these 5 days of conference: discuss technology in one of the most beautiful riad of Marrakech. Mix theā€¦

David WORMS

By David WORMS

Oct 11, 2018

Deploying a secured Flink cluster on Kubernetes

Deploying a secured Flink cluster on Kubernetes

Categories: Big Data | Tags: Encryption, Flink, HDFS, Kafka, Elasticsearch, Kerberos, SSL/TLS

When deploying secured Flink applications inside Kubernetes, you are faced with two choices. Assuming your Kubernetes is secure, you may rely on the underlying platform or rely on Flink nativeā€¦

David WORMS

By David WORMS

Oct 8, 2018

Apache Beam: a unified programming model for data processing pipelines

Apache Beam: a unified programming model for data processing pipelines

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Pipeline, Flink, Spark

In this article, we will review the concepts, the history and the future of Apache Beam, that may well become the new standard for data processing pipelines definition. At Dataworks Summit 2018 inā€¦

Gauthier LEONARD

By Gauthier LEONARD

May 24, 2018

Apache Apex with Apache SAMOA

Apache Apex with Apache SAMOA

Categories: Data Science, Events, Tech Radar | Tags: Apex, Samoa, Storm, Tools, Flink, Hadoop, Machine Learning

Traditional Machine Learning Batch Oriented Supervised - most common Training and Scoring One time model building Data set Training: Model building Holdout: Paremeter tuning Test: Accuracy Onlineā€¦

Pierre SAUVAGE

By Pierre SAUVAGE

Jul 17, 2016

Apache Apex: next gen Big Data analytics

Apache Apex: next gen Big Data analytics

Categories: Data Science, Events, Tech Radar | Tags: Apex, Storm, Tools, Flink, Hadoop, Kafka, Data Science, Machine Learning

Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction Apache Apex is an in-memory distributed parallelā€¦

CĆ©sar BEREZOWSKI

By CĆ©sar BEREZOWSKI

Jul 17, 2016

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Scienceā€¦

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukrain