Updated on 2024-07-11 GMT+08:00

For Each

Functions

The For Each node specifies a subjob to be executed cyclically and assigns values to the variables in the subjob using a dataset.

For details about how to use the For Each node, see Introduction to the For Each Operator.

Each time a For Each node is executed, the specified subjob can be executed cyclically a maximum of 1,000 times.

If DLI SQL is used as the frontend node, the For Each node supports a maximum of 100 subjobs.

Parameters

Table 1 describes the parameters of the For Each node.

Table 1 Parameters of the For Each node

Parameter

Mandatory

Description

Node Name

Yes

Name of a node. The name must contain 1 to 128 characters, including only letters, numbers, underscores (_), hyphens (-), slashes (/), less-than signs (<), and greater-than signs (>).

Subjob in a Loop

Yes

Name of the subjob to be executed cyclically.

Subjob Parameter Name

No

This parameter is available only when you set job parameters for the cyclic subjob. The parameter name is the variable defined in the subjob. Set the parameter value based on the following rules:
  • If variables in the cyclic subjob are to be read and replaced based on the variables of the parent job, set this parameter to an EL expression, for example, #{Loop.current[0]} or #{Loop.current[1]}, which obtains the first or second value in the current row of the traversed two-dimensional dataset array. For details, see Loop Embedded Objects. After a job parameter name is configured for the cyclic subjob, the parameter value can be left empty.
  • If the cyclic subjob uses its own parameter variables, leave this parameter blank and set the values for those parameters in the subjob itself.
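The substitution behavior described above can be sketched as follows. This is an illustrative Python sketch of how a #{Loop.current[i]} expression resolves against the current dataset row, not the actual service implementation; the function name and dataset values are hypothetical.

```python
import re

def resolve_loop_expressions(param_value, current_row):
    """Replace #{Loop.current[i]} placeholders with values from the
    current row of the traversed dataset (illustration only)."""
    def substitute(match):
        index = int(match.group(1))
        return str(current_row[index])
    return re.sub(r"#\{Loop\.current\[(\d+)\]\}", substitute, param_value)

# Each row of the dataset produces one subjob instance with its own values.
dataset = [["001", "insert"], ["002", "update"]]
for row in dataset:
    print(resolve_loop_expressions("table_#{Loop.current[0]}", row))
# table_001
# table_002
```

Here #{Loop.current[0]} picks the first column of each row, matching the rule that a row of the dataset corresponds to one subjob instance.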

Dataset

Yes

The For Each node requires a dataset: a two-dimensional array whose values cyclically replace the variables in the subjob. Each row of the dataset corresponds to one subjob instance. The dataset can come from the following sources:
  • Output of an upstream node, such as the select statement of a Hive SQL, DLI SQL, or Spark SQL node, or the echo output of a Shell node. Use the EL expression #{Job.getNodeOutput('preNodeName')}, which obtains the output of the previous node.
  • A specified two-dimensional array, for example, [['001'],['002'],['003']]
    NOTE:
    • To transfer 00 and 01 as numbers, set this parameter to [["00"],["01"]], [[00],[01]], or [['00'],['01']].
    • To transfer 00 and 01 as character strings, add escape characters, for example, [["\"00\""],["\"01\""]] or [['\'00\''],['\'01\'']].
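The row-to-instance mapping described above can be sketched as follows. This is an illustrative Python sketch under the stated rule that each dataset row spawns one subjob instance; it is not service code, and the dataset values are hypothetical.

```python
# Each row of the two-dimensional dataset produces one subjob instance.
dataset = [["001"], ["002"], ["003"]]

instances = []
for index, row in enumerate(dataset):
    # The real service substitutes row values into the subjob through
    # #{Loop.current[i]}; here we only record the row-to-instance mapping.
    instances.append({"instance": index, "parameters": row})

for inst in instances:
    print(inst)
```

With three rows, three subjob instances are generated, each receiving one row of parameter values.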

Concurrent Subjobs

Yes

Subjobs generated cyclically can be executed concurrently. You can set the number of concurrent subjobs.

NOTE:

If a subjob contains a CDM Job node, set this parameter to 1.

Subjob Instance Name Suffix

No

Name of a subjob instance generated by the For Each node: For Each node name + underscore (_) + suffix.

The suffix is configurable. If no suffix is configured, an incrementing number is used as the suffix.
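The naming rule above can be sketched as follows. This is a hypothetical Python sketch; whether the fallback numbering starts at 0 or 1 is an assumption, as the document only states that the number increases in ascending order.

```python
def subjob_instance_name(node_name, index, suffix=None):
    """Sketch of the naming rule: For Each node name, an underscore,
    then the configured suffix; if no suffix is configured, an
    incrementing number is assumed as the fallback (assumption)."""
    return "{}_{}".format(node_name, suffix if suffix is not None else index)

print(subjob_instance_name("foreach_node", 0))            # foreach_node_0
print(subjob_instance_name("foreach_node", 0, "batchA"))  # foreach_node_batchA
```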

Table 2 Advanced parameters

Parameter

Mandatory

Description

Max. Node Execution Duration

Yes

Execution timeout interval for the node. If retry is configured and the execution is not complete within the timeout interval, the node will be executed again.

Retry upon Failure

Yes

Whether to re-execute a node if it fails to be executed. Possible values:

  • Yes: The node will be re-executed, and the following parameters must be configured:
    • Retry upon Timeout
    • Maximum Retries
    • Retry Interval (seconds)
  • No: The node will not be re-executed. This is the default setting.
    NOTE:

    If both retry and a timeout duration are configured for a job node, the node can be retried when its execution times out.

    If a node is not re-executed when it fails due to timeout, you can go to the Default Configuration page to modify this policy.

    Retry upon Timeout is displayed only when Retry upon Failure is set to Yes.

Policy for Handling Subsequent Nodes If the Current Node Fails

Yes

Operation that will be performed if the node fails to be executed. Possible values:

  • Suspend execution plans of the subsequent nodes: stops running subsequent nodes. The job instance status is Failed.
  • End the current job execution plan: stops running the current job. The job instance status is Failed.
  • Go to the next node: ignores the execution failure of the current node. The job instance status is Failure ignored.
  • Suspend the current job execution plan: If the current job instance is in an abnormal state, the subsequent nodes of this node and the subsequent job instances that depend on the current job enter the waiting state.

Enable Dry Run

No

If you select this option, the node will not be executed, and a success message will be returned.

Task Groups

No

Select a task group. A task group lets you control the maximum number of concurrent nodes in the group in a fine-grained manner, for example, when a job contains multiple nodes, a data patching task is ongoing, or a job is being rerun.