Overview
What Is Data Lineage?
In the era of big data, data volumes grow explosively and many types of data are generated rapidly. This massive, complex data is converged, transformed, and transferred, producing new data that aggregates into an ocean of data. The relationships formed among data during this process are called data lineage. Data lineage has the following characteristics:
- Belongingness: Specific data belongs to a specific organization or individual.
- Multi-source: One piece of data can have multiple sources. It may be generated by processing several other pieces of data, and more than one such processing step may be involved.
- Traceability: The data lineage is traceable. It reflects the data lifecycle, that is, the entire process from data generation to data disappearance.
- Hierarchy: The data lineage is hierarchical. Classifying and summarizing data produces new data, and describing data at different levels results in data layers.
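The four characteristics above can be illustrated with a small in-memory graph. The following is a minimal sketch only, not how DataArts Studio stores lineage. The class `LineageGraph`, its methods, and the dataset and owner names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage model (hypothetical, for illustration only)."""

    def __init__(self):
        self.owner = {}                    # dataset -> owning organization (belongingness)
        self.upstream = defaultdict(set)   # dataset -> direct sources (multi-source)

    def add_dataset(self, name, owner):
        self.owner[name] = owner

    def add_edge(self, source, target):
        # Record that `target` is derived from `source`.
        self.upstream[target].add(source)

    def trace(self, name):
        """Return every direct and indirect source of `name` (traceability)."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for src in self.upstream.get(current, ()):
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

    def level(self, name):
        """Return the layer of `name`: 0 for raw data, +1 per derivation step (hierarchy).

        Assumes the graph has no cycles.
        """
        sources = self.upstream.get(name, ())
        if not sources:
            return 0
        return 1 + max(self.level(src) for src in sources)


# Hypothetical datasets: two raw tables feed an enriched table, which feeds a summary.
graph = LineageGraph()
graph.add_dataset("raw_orders", owner="sales")
graph.add_dataset("raw_users", owner="crm")
graph.add_dataset("orders_enriched", owner="analytics")
graph.add_dataset("daily_summary", owner="analytics")
graph.add_edge("raw_orders", "orders_enriched")
graph.add_edge("raw_users", "orders_enriched")
graph.add_edge("orders_enriched", "daily_summary")
```

In this sketch, `graph.trace("daily_summary")` walks back to both raw tables, and `graph.level("daily_summary")` places the summary two layers above the raw data.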
How DataArts Studio Data Lineage Is Implemented
- Generation of data lineages:
On the DataArts Studio platform, data lineages are generated by configuring data processing and migration nodes in the DataArts Factory module. Currently, the system collects the lineages generated by static node configuration and the lineages on some node instances. For details, see Automatic Lineage Analysis.
In addition, DataArts Studio allows you to manually configure lineages. If you do so, automatic lineage analysis does not take effect. For details, see Manually Configuring a Lineage.
- Display of data lineages:
If you have configured data lineages and started job scheduling in the DataArts Factory module, you can start a metadata collection task in the DataArts Catalog module to view the data lineages.
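The precedence between manual and automatic lineages described above can be sketched as a simple rule. This is an illustration under stated assumptions, not the platform's implementation: the function `effective_lineage` and the node and table names are hypothetical, and the sketch assumes that the override applies per node, which the text above does not specify.

```python
def effective_lineage(auto_edges, manual_edges):
    """Pick the lineage that takes effect for each node (hypothetical model).

    Both arguments map a node name to a set of (source, target) pairs.
    Assumption: a node with a manual lineage ignores its automatic lineage.
    """
    result = {}
    for node in set(auto_edges) | set(manual_edges):
        if node in manual_edges:
            # Manual configuration wins; automatic analysis does not take effect.
            result[node] = set(manual_edges[node])
        else:
            result[node] = set(auto_edges[node])
    return result


# Hypothetical nodes: one relies on automatic analysis, one is configured manually.
auto = {
    "sql_node": {("ods.orders", "dwd.orders")},
    "script_node": {("ods.users", "dwd.users")},
}
manual = {
    "script_node": {("ods.users", "dws.user_profile")},
}
lineage = effective_lineage(auto, manual)
```

Here `sql_node` keeps its automatically analyzed lineage, while `script_node` uses only the manually configured one.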