Developing a Pipeline Job

This section describes how to develop and configure a job. A job cannot be larger than 1 MB.

For details about how to develop a batch processing job or real-time processing job in pipeline mode, see Compiling Job Nodes, Configuring Basic Job Information, Configuring Job Parameters, and Testing and Saving the Job.

Prerequisites

A job has been created. For details, see Creating a Job.
You have locked the job. If the job is not locked by you, click Lock before you develop it. A job that you create or import is locked by you by default. For details, see the lock function.

Compiling Job Nodes

This part applies to batch processing jobs and real-time processing jobs in pipeline mode.

Log in to the DataArts Studio console by following the instructions in Accessing the DataArts Studio Instance Console.
On the DataArts Studio console, locate a workspace and click DataArts Factory.
In the left navigation pane of DataArts Factory, choose Development > Develop Job.
In the job directory, double-click the name of a batch processing job or real-time processing job in pipeline mode to access the job development page.
Drag the desired node to the canvas. To connect it to another node, hover over the node, select the icon that appears, and drag it to the target node.

It is recommended that each job contain a maximum of 200 nodes.

Figure 1 Compiling a job

Configure node functions. Right-click a node icon on the canvas and select a function as needed. Table 1 lists the available functions.

**Table 1** Node functions
Function	Description
Configure	Goes to the Node Property page of the node.
Delete	Deletes one or more nodes. Deleting one node: Right-click the node icon on the canvas and choose Delete, or press the Delete key. Deleting multiple nodes: Hold down Ctrl and click the icons of the nodes to be deleted on the canvas. Then right-click them and choose Delete, or press the Delete key.
Copy	Copies one or more nodes to any job. Single-node copy: Right-click the node icon on the canvas, choose Copy, and paste the node to the target location. Alternatively, click the node icon on the canvas and press Ctrl+C, and then press Ctrl+V at the target location. The copied node carries the configuration information of the original node. Multi-node copy: Hold down Ctrl and click the icons of the nodes to be copied on the canvas. Then right-click them, choose Copy, and paste the nodes to the target location, or press Ctrl+C and then Ctrl+V at the target location. The copied nodes carry the configuration information of the original nodes but not the connections between them.
Test Run	Runs the node for a test. NOTE: You can view the test run logs of the job node by clicking View Log.
Test from Current Node	This option is available only for batch processing jobs. It tests the current and subsequent nodes.
Add/Delete Connection	Adds or deletes a connection between two nodes.
Edit CDM Job	This option is available only for CDM jobs. After selecting a CDM cluster and a job, you can go to the CDM job editing page to modify the job.
View Job Log	This option is available only for CDM jobs. While a CDM job is running, you can right-click the CDM job node and choose View Job Log from the shortcut menu to go to the job monitoring page and view logs, which help you locate the cause of job exceptions.
Edit Script	This option is available only for a node associated with a script. Goes to the script editing page, where you can edit the associated script.
Add Note	Adds a note to the node. Each node can have multiple notes. Creating, displaying, or hiding a note on a job node takes effect only for this node. Creating, displaying, or hiding a note on the top of the canvas takes effect for the entire job.
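
As a rough illustration of what "tests the current and subsequent nodes" means for Test from Current Node, the following Python sketch walks a job graph from a chosen node. The `nodes_from` function, the `edges` dictionary layout, and the node names are hypothetical and are not part of DataArts Factory; the sketch only assumes that "subsequent nodes" are all nodes reachable through connections.

```python
from collections import deque


def nodes_from(current, edges):
    """Return `current` plus every node reachable from it, breadth first.

    `edges` maps a node name to the list of nodes it connects to
    (a hypothetical in-memory model of the canvas connections).
    """
    seen = [current]
    queue = deque([current])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


# Hypothetical job: extract -> clean -> (load, report)
job_edges = {"extract": ["clean"], "clean": ["load", "report"]}
selected = nodes_from("clean", job_edges)
print(selected)
```

In this model, testing from `clean` covers `clean`, `load`, and `report`, while the upstream `extract` node is left out.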

(Optional) Configure line functions. Right-click the line connecting two nodes on the canvas. Delete and Set Condition are displayed. You can select them as needed.
- Delete: Deletes the line connecting the nodes.
- Set Condition: In the displayed dialog box, enter a ternary expression using the EL expression syntax. If the result of the ternary expression is true, the subsequent nodes are executed. Otherwise, the subsequent nodes are skipped.
  The following is a typical ternary expression. If the execution result of the DQM node is true, the subsequent nodes are executed. If the execution result is false and Failure Policy is set to Skip all subsequent nodes, the next node (node A) and all nodes following node A are skipped.
```
#{(Job.getNodeStatus("DQM")) == "success" ? "true" : "false"}
```
  Figure 2 Set Condition
  
  For details about the EL expression syntax, see Expression Overview. For details about how to use IF conditions, see IF Condition Judgment.
Configure node properties. Click a node on the canvas. On the displayed Node Properties page, configure node properties. For details, see Node Overview.
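
The Set Condition behavior described in this procedure can be modeled outside the console to check which nodes a condition would skip. The following Python sketch is illustrative only: `evaluate_condition`, `nodes_to_skip`, the status string `"fail"`, and the node names are assumptions, and the sketch covers only the case where Failure Policy is Skip all subsequent nodes.

```python
def evaluate_condition(node_status, node_name, expected="success"):
    """Mimic #{(Job.getNodeStatus("X")) == "success" ? "true" : "false"}.

    `node_status` is a hypothetical dict of node name -> status string.
    """
    return "true" if node_status.get(node_name) == expected else "false"


def nodes_to_skip(result, next_node, edges):
    """Return the nodes skipped when the condition result is "false".

    Assumes the failure policy skips `next_node` and everything after it.
    """
    if result == "true":
        return []
    skipped, stack = [], [next_node]
    while stack:
        node = stack.pop()
        if node not in skipped:
            skipped.append(node)
            # Reverse so that successors are visited in their listed order.
            stack.extend(reversed(edges.get(node, [])))
    return skipped


# Hypothetical job: DQM -> A -> B -> C, with the condition set on the DQM -> A line.
edges = {"A": ["B"], "B": ["C"]}
result = evaluate_condition({"DQM": "fail"}, "DQM")
print(result, nodes_to_skip(result, "A", edges))
```

With a status other than `"success"` for the DQM node, the sketch reports that node A and every node after it are skipped.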

Configuring Basic Job Information

After you configure the owner and priority for a job, you can search for the job by the owner and priority. The procedure is as follows:

Click the Basic Info tab on the right of the canvas to expand the configuration page, and configure the parameters listed in Table 2.

**Table 2** Basic job information
Parameter	Description
Owner	An owner configured during job creation is automatically matched. This parameter value can be modified.
Job Agency	This parameter is available when Scheduling Identities is set to Yes. After an agency is configured, the job interacts with other services as an agency during job execution.
Priority	Priority configured during job creation is automatically matched. This parameter value can be modified.
Execution Timeout	Timeout of the job instance. If this parameter is set to 0 or is not set, this parameter does not take effect. If the notification function is enabled for the job and the execution time of the job instance exceeds the preset value, the system sends a specified notification, and the job keeps running.
Exclude Waiting Time from Instance Timeout Duration	Whether to exclude the wait time from the instance execution timeout duration. If you select this option, the time an instance waits before it starts running is excluded from the timeout duration. If you do not select this option, the wait time is included in the timeout duration. You can modify the default setting in Default Configuration > Exclude Waiting Time from Instance Timeout Duration.
Custom Parameter	Set the name and value of the parameter.
Job Tag	Configure job tags to manage jobs by category. Click Add to add a tag to the job. You can also select a tag configured in Managing Job Tags.
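
To make the interaction between Execution Timeout and Exclude Waiting Time from Instance Timeout Duration concrete, the following Python sketch computes whether an instance has exceeded its timeout. The function name, its arguments, and the use of `timedelta` for the timeout are assumptions made for illustration; the source does not state the unit of the timeout or how the service measures it internally.

```python
from datetime import datetime, timedelta


def is_timed_out(scheduled_at, started_at, now, timeout, exclude_waiting):
    """Return True if the instance has run longer than `timeout`.

    timeout:         a timedelta; zero or None means the timeout has no effect.
    exclude_waiting: if True, time spent waiting before the instance
                     started running does not count toward the timeout.
    """
    if not timeout:  # timedelta(0) and None are both falsy
        return False
    reference = started_at if exclude_waiting else scheduled_at
    return now - reference > timeout


# Hypothetical instance: scheduled at 10:00, waited 20 minutes, now 10:45.
t0 = datetime(2024, 1, 1, 10, 0)
started = t0 + timedelta(minutes=20)
now = t0 + timedelta(minutes=45)
limit = timedelta(minutes=30)

print(is_timed_out(t0, started, now, limit, exclude_waiting=True))
print(is_timed_out(t0, started, now, limit, exclude_waiting=False))
```

In this example the instance has been running for 25 minutes after a 20-minute wait, so it exceeds a 30-minute timeout only when the wait time is counted. As the table notes, exceeding the timeout triggers a notification (if enabled) and the job keeps running.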

Configuring Job Parameters

Job parameters can be globally used in any node in jobs. The procedure is as follows:

For batch and real-time processing jobs in pipeline mode: Click the blank area in the canvas and then the Parameter Setup tab on the right, and configure the parameters listed in Table 3.

**Table 3** Job parameter setup
Function	Description
Variables
Add	Click Add and enter the variable parameter name and parameter value in the text boxes. Parameter Name: Only letters, numbers, periods (.), hyphens (-), and underscores (_) are allowed. Parameter Value: A string value is a character string, for example, str1. A numeric value is a number or an operation expression. After the parameter is configured, it is referenced in the job in the format ${Parameter name}. NOTE: A parameter value that contains more than 1,000,000 characters is truncated. For example, if the first node of a job is a Rest Client node that returns a body and the second node uses the returned data, any data beyond 1,000,000 characters is truncated. When configuring job parameters, ensure that each parameter value contains no more than 1,000,000 characters.
Edit Parameter Expression	Click the icon next to the parameter value text box. In the displayed dialog box, edit the parameter expression. For more expressions, see Expression Overview.
Modify	Change the parameter name or value in the corresponding text boxes.
Mask	If the parameter value is a key, click the icon to mask the value for security purposes.
Delete	Click the icon next to the parameter name and value text boxes to delete the job parameter.
Constant Parameter
Add	Click Add and enter the constant parameter name and parameter value in the text boxes. Parameter Name: Only letters, numbers, hyphens (-), and underscores (_) are allowed. Parameter Value: A string value is a character string, for example, str1. A numeric value is a number or an operation expression. After the parameter is configured, it is referenced in the job in the format ${Parameter name}.
Edit Parameter Expression	Click the icon next to the parameter value text box. In the displayed dialog box, edit the parameter expression. For more expressions, see Expression Overview.
Modify	Change the parameter name or value in the corresponding text boxes and save the modifications.
Delete	Click the icon next to the parameter name and value text boxes to delete the job parameter.
Workspace Environment Variables
View the variables and constants that have been configured in the workspace.
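
The naming rules and the ${Parameter name} reference format in Table 3 can be checked before you type values into the console. The following Python sketch is a minimal local approximation, not the service's own validation: the regular expressions assume that "letters" means ASCII letters, and `is_valid_name`, `value_fits`, and `resolve` are hypothetical helper names.

```python
import re

VARIABLE_NAME = re.compile(r"[A-Za-z0-9._-]+")  # variables: periods allowed
CONSTANT_NAME = re.compile(r"[A-Za-z0-9_-]+")   # constants: periods not listed
REFERENCE = re.compile(r"\$\{([A-Za-z0-9._-]+)\}")
MAX_VALUE_CHARS = 1_000_000


def is_valid_name(name, constant=False):
    """Check a parameter name against the character rules in Table 3."""
    pattern = CONSTANT_NAME if constant else VARIABLE_NAME
    return pattern.fullmatch(name) is not None


def value_fits(value):
    """Check that a value stays within the 1,000,000-character limit."""
    return len(value) <= MAX_VALUE_CHARS


def resolve(text, params):
    """Replace each ${name} with its value; unknown names are left as-is."""
    def repl(match):
        return str(params.get(match.group(1), match.group(0)))
    return REFERENCE.sub(repl, text)


params = {"run.date": "2024-01-01"}  # hypothetical job variable
print(resolve("SELECT * FROM t WHERE dt = '${run.date}'", params))
```

Leaving unknown references untouched is a design choice of this sketch so that typos stay visible in the output; how the service itself treats an undefined parameter is not described here.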

Click the Parameter Preview tab and configure the parameters listed in Table 4.

The script parameters of the following types of operators can be previewed: MRS Flink Job, DLI Flink Job, DLI SQL, DWS SQL, MRS HetuEngine, MRS ClickHouse SQL, MRS Hive SQL, MRS Impala SQL, MRS Presto SQL, RDS SQL, DORIS SQL, and MRS Spark SQL.

**Table 4** Job parameter preview
Function	Description
Current Time	This parameter is displayed only when Scheduling Type is set to Run once. The default value is the current time.
Event Triggering Time	This parameter is displayed only when Scheduling Type is set to Event-based. The default value is the time when an event is triggered.
Scheduling Period	This parameter is displayed only when Scheduling Type is set to Run periodically. The default value is the scheduling period.
Start Time	This parameter is displayed only when Scheduling Type is set to Run periodically. The value is the configured job execution time.
Start Time	This parameter is displayed only when Scheduling Type is set to Run periodically. The value is the time when the periodic job scheduling starts.
Subsequent Instances	Number of job instances scheduled. When Scheduling Type is set to Run once or Event-based, the default value is 1. When Scheduling Type is set to Run periodically, a maximum of 10 instances can be displayed. If the number of instances exceeds 10, the system displays the message "A maximum of 10 instances are supported."

In Parameter Preview, if a job parameter has a syntax error, the system displays a message.

If a parameter depends on the data generated during job execution, such data cannot be simulated and displayed in Parameter Preview.
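
The 10-instance cap on Subsequent Instances for periodic jobs can be pictured with a short Python sketch. The `preview_instances` function and the assumption that the first previewed instance falls exactly on the start time are illustrative only; the actual preview is produced by the console.

```python
from datetime import datetime, timedelta

MAX_PREVIEW = 10  # the preview shows at most 10 instances


def preview_instances(start, period, requested):
    """Return the run times of up to MAX_PREVIEW consecutive instances."""
    count = min(requested, MAX_PREVIEW)
    return [start + i * period for i in range(count)]


# Hypothetical hourly job starting at midnight, with 25 instances requested.
runs = preview_instances(datetime(2024, 1, 1), timedelta(hours=1), 25)
print(len(runs), runs[0], runs[-1])
```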

Testing and Saving the Job

After a job is configured, complete the following operations:

Batch processing job

Click Test above the canvas. In the displayed dialog box, the job variables are displayed. Click OK to test the job. If the test fails, view the logs of the job node and locate and rectify the fault.
- You can view the test run logs of the job by clicking View Log.
- If you test the job before submitting a version, the version of the generated job instance is 0 on the Job Monitoring page.
- You can control access to the test run logs. For example, after user A performs a test, user A can view the test run logs on the Monitor Instance page, but user B cannot.
When the test is successful, click Save to save the job configuration.

After the job is saved, a version is automatically generated and displayed in Versions. The version can be rolled back. If you save a job multiple times within a minute, only one version is recorded. If the intermediate data is important, you can click Save new version to save and add a version.

Real-time processing job

Click Save to save the job configuration.

After the job is saved, a version is automatically generated and displayed in Versions. The version can be rolled back. If you save a job multiple times within a minute, only one version is recorded. If the intermediate data is important, you can click Save new version to save and add a version.
After submitting the job version, click Start above the canvas to run the job. After the job is executed, go to the Job Monitoring page to view the job execution result.
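
The versioning rule for saves (only one version recorded for multiple saves within a minute, unless Save new version is used) can be sketched as follows. This Python model is an assumption-laden illustration: `VersionLog` is a hypothetical class, and it interprets "within a minute" as less than 60 seconds after the last recorded version, which the source does not specify.

```python
from datetime import datetime, timedelta


class VersionLog:
    """Toy model of how saves turn into recorded versions."""

    def __init__(self):
        self.versions = []  # timestamps of recorded versions

    def save(self, when, force_new=False):
        """Record a version and return True, or return False if the save
        falls within a minute of the last recorded version.

        force_new=True models clicking Save new version.
        """
        recent = self.versions and when - self.versions[-1] < timedelta(minutes=1)
        if force_new or not recent:
            self.versions.append(when)
            return True
        return False


log = VersionLog()
t = datetime(2024, 1, 1, 12, 0, 0)
print(log.save(t))                                          # first save
print(log.save(t + timedelta(seconds=30)))                  # within a minute
print(log.save(t + timedelta(seconds=40), force_new=True))  # Save new version
print(len(log.versions))
```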