Configuring Data Lineages
- Automatic lineage parsing: Lineages are automatically generated after the system parses the data processing and data migration nodes in data development jobs. No manual configuration is required. For details about the node types and scenarios that support automatic lineage parsing, see Automatic Lineage Parsing.
- Manual lineage configuration: Customize the input and output tables of lineages in data development job nodes. If you configure a lineage manually for a node, automatic lineage parsing no longer takes effect for that node. For details about the node types that support manual lineage configuration, see Manually Configuring a Lineage.
Constraints
Currently, field-level lineage parsing is not supported.
Automatic Lineage Parsing
Automatic lineage parsing does not require manual configuration. When a data development job contains the nodes and scenarios listed in Table 1, the system can automatically parse lineages.
Lineage parsing for an SQL node supports multiple SQL statements as well as column-level lineage. An individual SQL statement cannot contain a semicolon (;).
| Job Node | Supported Scenario |
|---|---|
|  |  |
|  | Lineages between DWS tables generated by DML operations such as "Insert into" |
|  | Lineages between MRS tables generated by DML operations such as "Insert into/overwrite" |
|  | Lineages between MRS tables generated by DML operations such as "Insert into/overwrite" |
|  | Lineages generated during table file migration between MRS Hive, DLI, RDS, CSS, DWS, and OBS |
|  | Data lineages generated by ETL tasks between DLI, OBS, MySQL, and DWS |
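
To make the table-level scenarios above more concrete, the following Python sketch derives source-to-target table pairs from "Insert into/overwrite" statements. It is only an illustration and not the parser that DataArts Studio uses: the regular expressions, the function names (split_statements, table_lineage), and the sample table names are all assumptions. The sketch also assumes that a script is separated into statements by splitting on semicolons, which is one plausible reading of why a single statement cannot contain a semicolon.

```python
import re

# Hypothetical pattern: "INSERT INTO <target>" or "INSERT OVERWRITE [TABLE] <target>".
_TARGET = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
# Hypothetical pattern: table names that follow FROM or JOIN.
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def split_statements(script):
    """Split a script on semicolons and drop empty fragments."""
    return [part.strip() for part in script.split(";") if part.strip()]


def table_lineage(script):
    """Return (source_table, target_table) pairs, one per source of each INSERT."""
    edges = []
    for statement in split_statements(script):
        target = _TARGET.search(statement)
        if target is None:
            continue  # not a DML statement this sketch understands
        for source in _SOURCE.findall(statement):
            edges.append((source, target.group(1)))
    return edges


# Sample script with made-up table names.
script = """
INSERT INTO dw.orders_clean SELECT id, amount FROM ods.orders;
INSERT OVERWRITE TABLE dw.daily_sales
SELECT o.day, SUM(o.amount) FROM dw.orders_clean o
JOIN dim.calendar c ON o.day = c.day GROUP BY o.day
"""
for source, target in table_lineage(script):
    print(f"{source} -> {target}")
```

Under the splitting assumption, a statement such as `INSERT INTO t SELECT 'x;y' FROM s` is cut in two at the semicolon inside the string literal, and the sketch finds no lineage for it.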
Manually Configuring a Lineage
In a DataArts Studio data development job, you can customize the input and output tables of lineages on the nodes of the job. If you configure a lineage manually for a node, automatic lineage parsing no longer takes effect for that node.
To configure a lineage manually, set its input and output tables on the Lineage tab page of the node. The data source of an input or output table can be DLI, DWS, Hive, CSS, OBS, or CUSTOM. CUSTOM is a custom type that lets you add data sources not covered by the other types.
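
The override rule and the data source types described above can be modeled with a short Python sketch. This is a hypothetical model for illustration, not the DataArts Studio data model or API: the class names (SourceType, LineageTable, NodeLineage), the effective method, and the table names are assumptions. It also assumes that a single manually configured table is enough to disable the parsed lineage for the node.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    """Data source types available for manually configured lineage tables."""
    DLI = "DLI"
    DWS = "DWS"
    HIVE = "Hive"
    CSS = "CSS"
    OBS = "OBS"
    CUSTOM = "CUSTOM"  # for data sources without a dedicated type


@dataclass(frozen=True)
class LineageTable:
    source_type: SourceType
    name: str


@dataclass
class NodeLineage:
    """Hypothetical container for the lineage of one job node."""
    auto_inputs: list = field(default_factory=list)
    auto_outputs: list = field(default_factory=list)
    manual_inputs: list = field(default_factory=list)
    manual_outputs: list = field(default_factory=list)

    def effective(self):
        """Return (inputs, outputs); manual configuration replaces parsed lineage."""
        if self.manual_inputs or self.manual_outputs:
            return self.manual_inputs, self.manual_outputs
        return self.auto_inputs, self.auto_outputs


# Made-up tables: parsed lineage applies until a manual entry is added.
node = NodeLineage(
    auto_inputs=[LineageTable(SourceType.HIVE, "src_table")],
    auto_outputs=[LineageTable(SourceType.HIVE, "dst_table")],
)
node.manual_inputs.append(LineageTable(SourceType.CUSTOM, "external_feed"))
inputs, outputs = node.effective()
print([t.name for t in inputs], [t.name for t in outputs])
```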
For example, an MRS Spark node in a pipeline data development job does not support automatic lineage parsing, so its lineage must be configured manually. The procedure is as follows:
- Log in to the DataArts Studio console by following the instructions in Accessing the DataArts Studio Instance Console.
- On the DataArts Studio console, locate a workspace and click DataArts Factory.
- On the DataArts Factory console, choose Data Development > Develop Job. Double-click the name of the job for which you want to configure a lineage to open the job canvas.
- Click the MRS Spark node on the job canvas, and then open the lineageInfo page.
Figure 2 lineageInfo page
- Configure the lineage input table. For example, you can configure input table hive, as shown in Figure 3.
- Click OK and configure the lineage output table. For example, you can configure output table a, as shown in Figure 4.
- Click OK. The lineage for the MRS Spark node has been configured. If you want to view the lineage later, collect metadata by referring to Viewing Data Lineages and schedule the job. Then, you can view the manually configured lineage of the MRS Spark node in DataArts Catalog.