Updated on 2022-09-23 GMT+08:00

Configuring Data Lineages

On the DataArts Studio platform, data lineages are generated when you configure data processing and migration nodes in the DataArts Factory module. Currently, the system collects lineages from static node configurations and from some node instances. For details, see Automatic Lineage Analysis.

In addition, DataArts Studio allows you to manually configure lineages. If you do so, automatic lineage analysis does not take effect. For details, see Manually Configuring a Lineage.

Automatic Lineage Analysis

Data lineages can be parsed automatically if the job contains the following nodes:

  • SQL nodes

    DataArts Studio supports lineage parsing for DLI SQL, DWS SQL, and MRS Hive SQL nodes, including nodes that contain multiple SQL statements. Column-level lineage parsing is also supported.

    • DLI SQL
      • Lineages generated by data insertion between DLI tables
      • Lineages between OBS files and DLI tables generated by table creation statements
    • DWS SQL
      • Lineages between DWS tables generated by DDL operations such as "Create table like/as"
      • Lineages between DWS tables generated by DML operations such as "Insert into"
    • MRS Hive SQL
      • Lineages between MRS tables generated by DDL operations such as "Create table like/as"
      • Lineages between MRS tables generated by DML operations such as "Insert into/overwrite"
  • Data integration nodes

    Lineages of the CDM Job, ETL Job, and OBS Manager nodes can be parsed.

    • CDM Job

      Lineages generated by table and file migration between MRS Hive, DLI, RDS, CSS, DWS, and OBS

    • ETL Job

      Lineages generated by ETL tasks between DLI, OBS, MySQL, and DWS

    • OBS Manager

      Lineages generated by directory or file replication and migration between OBS buckets
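To make table-level lineage more concrete, the following Python sketch extracts a (target, sources) pair from a single INSERT INTO/OVERWRITE or CREATE TABLE LIKE/AS statement. It is a toy, regex-based illustration and not DataArts Studio's parser: the function `table_lineage` and its patterns are hypothetical, and the sketch ignores comments, quoted identifiers, subqueries, CTEs, column-level lineage, and OBS paths.

```python
import re
from typing import Optional, Set, Tuple

# Hypothetical, deliberately simplified patterns for illustration only.
# Target of INSERT INTO/OVERWRITE [TABLE] t or CREATE TABLE [IF NOT EXISTS] t
_TARGET = re.compile(
    r"^\s*(?:INSERT\s+(?:INTO|OVERWRITE)(?:\s+TABLE)?"
    r"|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?)\s+([\w.]+)",
    re.IGNORECASE,
)
# Template table of CREATE TABLE t LIKE s
_LIKE_SOURCE = re.compile(
    r"^\s*CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?\s+[\w.]+\s+LIKE\s+([\w.]+)",
    re.IGNORECASE,
)
# Tables read through FROM or JOIN clauses
_READ_SOURCES = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(statement: str) -> Optional[Tuple[str, Set[str]]]:
    """Return (target_table, source_tables) for one statement, or None
    if the statement does not write to a table."""
    target = _TARGET.search(statement)
    if target is None:
        return None
    sources = set(_READ_SOURCES.findall(statement))
    sources.update(_LIKE_SOURCE.findall(statement))
    return target.group(1), sources


print(table_lineage(
    "INSERT OVERWRITE TABLE dw.sales_summary "
    "SELECT region, SUM(amount) FROM ods.sales GROUP BY region"
))
```

Each returned pair reads as "data flows from every source table into the target table", which is the table-level relationship the SQL items above describe.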

A single SQL statement cannot contain semicolons (;).
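A semicolon normally marks the end of a statement in a multi-SQL script. The sketch below assumes a naive splitter (the hypothetical `split_statements`, not DataArts Studio's implementation) that cuts a script at every semicolon. It shows how a semicolon inside one statement, here in a string literal, produces fragments that are no longer valid SQL.

```python
from typing import List


def split_statements(script: str) -> List[str]:
    """Hypothetical naive splitter: cut a script at every semicolon and
    drop empty pieces. It does not understand string literals or comments."""
    return [part.strip() for part in script.split(";") if part.strip()]


ok_script = "INSERT INTO t2 SELECT * FROM t1;\nINSERT INTO t3 SELECT * FROM t2;"
bad_script = "INSERT INTO t2 SELECT * FROM t1 WHERE note = 'a;b';"

print(split_statements(ok_script))   # two complete statements
print(split_statements(bad_script))  # one statement broken into two fragments
```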

Manually Configuring a Lineage

In the DataArts Factory module of DataArts Studio, you can define the input and output lineages of nodes. When you manually configure a lineage, automatic lineage analysis does not take effect. Manual lineage configuration does not affect how jobs run.

Currently, DLI, DWS, Hive, CSS, OBS, and CUSTOM can be used as input and output data sources in a manually configured lineage. CUSTOM indicates a custom type: you can add a data source that is not otherwise supported as a CUSTOM data source.
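The following Python sketch models a manually configured lineage as lists of typed input and output data sources. All names (`LineageEndpoint`, `NodeLineage`, `make_endpoint`) are hypothetical and do not correspond to a DataArts Studio API; the sketch only illustrates how a source outside the supported list can be recorded as CUSTOM.

```python
from dataclasses import dataclass, field
from typing import List

# Data source types listed for manual lineage configuration (upper-cased here).
SUPPORTED_TYPES = {"DLI", "DWS", "HIVE", "CSS", "OBS", "CUSTOM"}


@dataclass
class LineageEndpoint:
    """Hypothetical model of one input or output of a node."""
    source_type: str  # one of SUPPORTED_TYPES
    name: str         # e.g. a table name or OBS path; free-form for CUSTOM


@dataclass
class NodeLineage:
    """Hypothetical model of the lineage manually configured for one node."""
    node_name: str
    inputs: List[LineageEndpoint] = field(default_factory=list)
    outputs: List[LineageEndpoint] = field(default_factory=list)


def make_endpoint(source_type: str, name: str) -> LineageEndpoint:
    """Build an endpoint, recording unsupported source types as CUSTOM."""
    kind = source_type.upper()
    if kind in SUPPORTED_TYPES:
        return LineageEndpoint(kind, name)
    # Not in the supported list: keep the original type name in the label.
    return LineageEndpoint("CUSTOM", f"{source_type}:{name}")


lineage = NodeLineage(
    node_name="load_orders",
    inputs=[make_endpoint("Kafka", "orders_topic")],
    outputs=[make_endpoint("Hive", "dw.orders")],
)
for src in lineage.inputs:
    for dst in lineage.outputs:
        print(f"{src.source_type}/{src.name} -> {dst.source_type}/{dst.name}")
```

Keeping the original type name inside the CUSTOM label is only one possible convention; it preserves what the source actually was while staying within the listed types.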

The following nodes support manual lineage configuration: