Data lineage
Data lineage is the process of documenting and monitoring the origin, transformations and movements of data throughout its life cycle. It provides a comprehensive and transparent view of how data is collected, manipulated, transformed and used across the different systems, processes and applications within an organization.
The visibility it offers helps ensure data quality, security and compliance. It allows us to answer questions such as: Where does this data come from? How has it been modified or processed? Where is it stored? Who has access to it? This detailed understanding of the data journey is essential for making informed decisions, enforcing data governance, facilitating audits and complying with regulations such as the GDPR (General Data Protection Regulation) in the European Union, or other data privacy and security standards.
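As an illustration only, the first two questions above can be sketched with a toy in-memory lineage graph in Python. The `LineageGraph` class, its methods and the dataset names are hypothetical and do not come from any particular lineage tool; a real deployment would typically rely on a dedicated metadata store rather than a dictionary.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store (hypothetical): datasets are nodes, transformations are edges."""

    def __init__(self):
        # Maps each dataset to the (source, transformation) pairs that produced it.
        self._parents = defaultdict(list)

    def record(self, source, target, transformation):
        """Register that `target` was derived from `source` by `transformation`."""
        self._parents[target].append((source, transformation))

    def history(self, dataset):
        """Answer "how has it been modified?": the direct steps that produced `dataset`."""
        return list(self._parents.get(dataset, []))

    def upstream(self, dataset):
        """Answer "where does this data come from?": all direct and indirect sources."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for source, _ in self._parents.get(current, []):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Hypothetical dataset names, for illustration only.
graph = LineageGraph()
graph.record("crm.customers_raw", "staging.customers", "deduplicate")
graph.record("staging.customers", "warehouse.customer_report", "aggregate by country")
graph.record("web.orders_raw", "warehouse.customer_report", "join on customer_id")

sources = graph.upstream("warehouse.customer_report")
steps = graph.history("warehouse.customer_report")
```

Here `sources` contains the three datasets the report depends on, and `steps` lists the two transformations that wrote to it directly. Questions about storage location and access rights would need extra metadata attached to each node, which this sketch leaves out.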
Related articles
Introduction to OpenLineage
Categories: Big Data, Data Governance, Infrastructure | Tags: Data Engineering, Infrastructure, Atlas, Data Lake, Data lakehouse, Data Warehouse, Data lineage
OpenLineage is an open-source specification for data lineage. The specification is complemented by Marquez, its reference implementation. Since its launch in late 2020, OpenLineage has been a presence…
Dec 19, 2023