Configuring Data Lineages
On the DataArts Studio platform, data lineages are generated by configuring data processing and migration nodes in the DataArts Factory module. Currently, the system collects the lineages generated by static node configuration and the lineages on some node instances. For details, see Automatic Lineage Analysis.
In addition, DataArts Studio allows you to manually configure lineages. If you do so, automatic lineage analysis does not take effect. For details, see Manually Configuring a Lineage.
Automatic Lineage Analysis
Data lineages can be parsed automatically if the job contains the following nodes:
- SQL nodes
  DataArts Studio supports lineage parsing of DLI SQL, DWS SQL and MRS Hive SQL nodes. It supports multi-SQL parsing and column-level lineage parsing.
  - DLI SQL
    - Lineages generated by data insertion between DLI tables
    - Lineages between OBS files generated by table creation statements and DLI tables
  - DWS SQL
    - Lineages between DWS tables generated by DDL operations such as "Create table like/as"
    - Lineages between DWS tables generated by DML operations such as "Insert into"
  - MRS Hive SQL
    - Lineages between MRS tables generated by DDL operations such as "Create table like/as"
    - Lineages between MRS tables generated by DML operations such as "Insert into/overwrite"
- Data integration nodes
  Lineages of the CDM Job, ETL Job, and OBS Manager nodes can be parsed.
  - CDM Job
    Lineages generated during table file migration between MRS Hive, DLI, RDS, CSS, DWS, and OBS
  - ETL Job
    Data lineages generated by ETL tasks between DLI, OBS, MySQL, and DWS
  - OBS Manager
    Lineages generated by directory or file replication and migration between OBS buckets
A single SQL statement cannot contain semicolons (;).
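To make the idea of table-level lineage concrete, the following minimal sketch derives source-to-target table pairs from the statement types listed above ("Create table like/as" and "Insert into/overwrite"). It is an illustration only and not how DataArts Studio parses SQL: the function name `table_lineage`, the regular expressions, and the sample table names are all assumptions made for this example. The sketch splits a script on semicolons, so a semicolon inside a statement would break it, which mirrors the restriction noted above.

```python
import re

# Hypothetical, simplified patterns; a real SQL parser handles far more syntax.
_TABLE = r"([\w.]+)"
_INSERT = re.compile(rf"^\s*insert\s+(?:into|overwrite)\s+(?:table\s+)?{_TABLE}", re.I)
_CREATE = re.compile(rf"^\s*create\s+table\s+(?:if\s+not\s+exists\s+)?{_TABLE}", re.I)
_LIKE = re.compile(rf"\blike\s+{_TABLE}", re.I)
_SOURCES = re.compile(rf"\b(?:from|join)\s+{_TABLE}", re.I)


def table_lineage(script):
    """Return (source_table, target_table) pairs found in a SQL script."""
    edges = []
    # Naive split: a semicolon inside a statement would break this step.
    for stmt in script.split(";"):
        if not stmt.strip():
            continue
        target = _INSERT.match(stmt) or _CREATE.match(stmt)
        if target is None:
            continue  # statement does not write to a table
        sources = _SOURCES.findall(stmt) + _LIKE.findall(stmt)
        edges.extend((src, target.group(1)) for src in sources)
    return edges


# Sample script with made-up table names.
script = """
CREATE TABLE sales_copy LIKE sales;
CREATE TABLE sales_2024 AS SELECT * FROM sales WHERE year = 2024;
INSERT INTO report SELECT s.id, c.name FROM sales_2024 s JOIN customers c ON s.cid = c.id;
"""
edges = table_lineage(script)
for source, target in edges:
    print(f"{source} -> {target}")
```

Each printed pair is one lineage edge: the table on the left feeds the table on the right. Column-level lineage would additionally require mapping each selected expression to its target column, which this sketch does not attempt.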
Manually Configuring a Lineage
In the DataArts Factory module of DataArts Studio, you can define the input and output lineage relationships of nodes. When you manually configure a lineage, automatic lineage analysis does not take effect. Manual lineage configuration does not affect how the job runs.
Currently, DLI, DWS, Hive, CSS, OBS, and CUSTOM are supported as input and output data source types during manual lineage configuration. CUSTOM indicates a custom type: when manually configuring a lineage, you can add a data source that is not otherwise supported by declaring it as a custom type.
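As a rough mental model, a manually configured lineage can be thought of as a node with a list of typed inputs and a list of typed outputs. The sketch below models that idea in Python. It is not the DataArts Studio API or its storage format: the class names (`SourceType`, `DataEndpoint`, `NodeLineage`), the node name, and the endpoint identifiers are hypothetical, and treating every input as feeding every output is an assumption of this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    """Data source types named in the text above."""
    DLI = "DLI"
    DWS = "DWS"
    HIVE = "Hive"
    CSS = "CSS"
    OBS = "OBS"
    CUSTOM = "CUSTOM"  # for data sources outside the built-in types


@dataclass(frozen=True)
class DataEndpoint:
    source_type: SourceType
    name: str  # hypothetical identifier, e.g. a table name or an OBS path


@dataclass
class NodeLineage:
    node_name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def edges(self):
        """Assumption: every input feeds every output of the node."""
        return [(i, o) for i in self.inputs for o in self.outputs]


# Made-up node with one OBS input, one custom input, and one DWS output.
lineage = NodeLineage(
    node_name="load_orders",
    inputs=[
        DataEndpoint(SourceType.OBS, "obs://demo-bucket/orders/"),
        DataEndpoint(SourceType.CUSTOM, "kafka:orders_topic"),
    ],
    outputs=[DataEndpoint(SourceType.DWS, "public.orders")],
)
for src, dst in lineage.edges():
    print(f"{src.source_type.value}:{src.name} -> {dst.source_type.value}:{dst.name}")
```

The CUSTOM entry shows how a source outside the listed types could still appear in the lineage graph.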
The following nodes support manual lineage configuration: