Apache Airflow
Apache Airflow is an open-source workflow management platform created by Airbnb engineers in 2014. Users can programmatically create, schedule, and monitor complex workflows, while a rich user interface provides powerful visualization tools.
Workflows are authored as directed acyclic graphs (DAGs) in Python scripts, implementing the "configuration as code" principle. This approach enables rapid iteration on data pipelines and a high degree of scalability.
Airflow was accepted as an Apache Incubator project in March 2016 and has been an Apache top-level project since January 2019. It has established itself as the de facto standard in workflow management and is used by data engineers around the globe.
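To illustrate the "configuration as code" principle described above, here is a minimal sketch of a DAG definition, assuming Airflow 2.x and its `BashOperator`; the DAG id, schedule, and task commands are hypothetical placeholders, not part of the original article.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A hypothetical three-step pipeline: extract -> transform -> load.
with DAG(
    dag_id="example_etl",              # placeholder name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",        # run once per day
    catchup=False,                     # do not backfill past runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    # The >> operator declares task dependencies, forming the acyclic graph.
    extract >> transform >> load
```

Because the pipeline is plain Python, it can be versioned, reviewed, and tested like any other code, which is what enables the rapid iteration the article mentions.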
Related articles
Automate a Spark routine workflow from GitLab to GCP
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Learning and tutorial, Airflow, Spark, CI/CD, GitLab, GitOps, GCP, Terraform
A workflow consists of automating a succession of tasks to be carried out without human intervention. It is an important and widespread concept that applies particularly to operational environments…
Jun 16, 2020
Introducing Apache Airflow on AWS
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: PySpark, Learning and tutorial, Airflow, Oozie, Spark, AWS, Docker, Python
Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…
May 5, 2020
Get in control of your workflows with Apache Airflow
Categories: Big Data, Tech Radar | Tags: DevOps, Airflow, Cloud, Python
Below is a compilation of my notes taken during the presentation of Apache Airflow by Christian Trebing from BlueYonder. Introduction Use case: how to handle data coming in regularly from customers…
Jul 17, 2016