Updated on 2024-11-12 GMT+08:00

Scheduling Jobs Across Workspaces

Scenario

If you have assigned permissions based on workspaces, users in different workspaces can only perform operations on jobs in their own workspaces. However, if jobs in different workspaces depend on each other, you can schedule the jobs across workspaces by following the instructions in this section.

Solution

The DataArts Factory module of DataArts Studio supports event-based job scheduling. You can therefore use DIS or MRS Kafka messages as the job dependency to implement cross-workspace job scheduling.

As shown in the following figure, after job1 in workspace A is complete, a DIS Client or Kafka Client node sends a message to trigger job_agent in workspace B, for which event-based scheduling has been configured. When triggered, job_agent checks whether the message meets the expectation. If it does, job2 is triggered. Otherwise, job2 is not triggered.

Figure 1 Scheduling solution
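
The flow in Figure 1 can be modeled in a few lines of Python. This is only an illustrative sketch: the in-memory queue stands in for the DIS stream or Kafka topic, and the names run_job1, run_job2, job_agent, and expected_day are hypothetical and are not DataArts Studio APIs.

```python
import queue
from datetime import datetime

# Stand-in for the DIS stream or Kafka topic shared by both workspaces.
channel = queue.Queue()
executed = []  # records which jobs ran, for illustration only


def run_job1(start_time):
    """Workspace A: job1 completes and sends 'job1,<day of month>'."""
    executed.append("job1")
    channel.put(f"job1,{start_time.day}")


def run_job2():
    """Workspace B: the job that depends on job1."""
    executed.append("job2")


def job_agent(expected_day):
    """Workspace B: triggered by a message; runs job2 only if it matches."""
    message = channel.get_nowait()
    fields = message.split(",")
    if len(fields) > 1 and fields[1] == expected_day:
        run_job2()
        return True
    return False


run_job1(datetime(2024, 2, 21))
print(job_agent("21"), executed)  # True ['job1', 'job2']
```

In the real service, the trigger and the check are configured on the job canvas rather than written as code. The sketch only shows the order of events.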

Prerequisites

Either of the following conditions must be met:
  • A DIS stream is available.
  • The MRS Kafka component is available, and MRS Kafka connections have been created in the Management Center of workspaces A and B.

Configuration Method (DIS Client)

  1. Log in to the DataArts Studio console, locate the target DataArts Studio instance, and click Access on the instance card.
  2. Locate the row that contains workspace A and click DataArts Factory in the Quick Entry column. On the displayed page, create a job named job1. Drag a Dummy node and a DIS Client node to the canvas, and click and hold to connect the nodes, as shown in Figure 2.

    • The Dummy node does not perform any operation. In this example, the Dummy node is only used for demonstration. You can replace it with another node.
    • The DIS Client node is used to send messages. Select the DIS region and stream, and set Sent Content to the EL expression job1,#{DateUtil.getDay(Job.startTime)}. When the job is executed, the DIS Client node sends a string message that consists of job1, a comma, and the day of the month on which the job started. For example, if job1 is executed on February 15, the message is job1,15.
    • Retain the default values of other job parameters.
    Figure 2 DIS Client node configuration for job1
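
To make the message format concrete, the following Python sketch mirrors what the Sent Content expression produces. The function build_message is hypothetical, and the sketch assumes that the day is written without zero padding (5 rather than 05).

```python
from datetime import datetime


def build_message(job_name, start_time):
    """Return '<job_name>,<day of month>', mirroring the EL expression
    job1,#{DateUtil.getDay(Job.startTime)}.

    Assumption: the day is not zero-padded.
    """
    return f"{job_name},{start_time.day}"


print(build_message("job1", datetime(2024, 2, 15)))  # job1,15
```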

  3. In workspace B, create a job named job_agent. Drag a Dummy node and a Subjob node and drop them on the canvas, and click and hold to connect the nodes, as shown in Figure 3.

    Figure 3 Scheduling settings for job_agent
    • The Dummy node does not perform any operation. In this example, the Dummy node is used to set the IF condition for the connection between the Dummy node and the Subjob node.
    • The Subjob node is used to reference and execute job2 as a subjob. In practice, you can reference an existing job or use another job node to replace the Subjob node.
    • Set Scheduling Type to Event-based, and set DIS Stream to the DIS stream selected for the DIS Client node of job1 in workspace A. The stream is used to trigger job execution through DIS messages.
    • Set the IF condition to check whether the message sent by the DIS Client node meets the expectation. If yes, the Subjob node will be executed. Otherwise, the Subjob node will be skipped.
      Right-click the connection line and select Set Condition. In the Edit Parameter Expression dialog box, enter the IF condition in the text box and retain the default failure policy. The IF condition is a ternary expression based on the EL expression syntax. The node following the connection line will be executed only if the result of the ternary expression is true. Otherwise, subsequent nodes will be skipped.
      #{StringUtil.equals(StringUtil.split(Job.eventData,',')[1],'21')}

      This IF condition indicates that subsequent job nodes are executed only if the text that follows the comma in the message obtained from the DIS stream is 21, that is, only if job1 was executed on the 21st day of a month.

      If you want to match multiple messages, you can add multiple Dummy nodes, set the IF condition for the connection between each Dummy node and the Subjob node, and set Multi-IF Policy to OR in the configuration of DataArts Factory.

      Figure 4 Edit Parameter Expression
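
The IF condition above can be read as the following Python analogue. This is a sketch for understanding only: if_condition is a hypothetical name, and the length check is an addition of this sketch, because how the EL expression behaves for a message without a comma is not described here.

```python
def if_condition(event_data, expected_day="21"):
    """Python analogue of
    #{StringUtil.equals(StringUtil.split(Job.eventData,',')[1],'21')}."""
    fields = event_data.split(",")
    # Guard added in this sketch for messages that contain no comma.
    return len(fields) > 1 and fields[1] == expected_day


print(if_condition("job1,21"))  # True
print(if_condition("job1,15"))  # False
```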

  4. Run the job_agent job when job1 in workspace A is not running. Then go to the Monitor Instance page and check whether the execution result meets the expectation.

    Because job1 is not running, no message is sent, and the Subjob node in the job_agent job is skipped, indicating that the IF condition takes effect.
    Figure 5 Subjob node skipped

  5. Start scheduling the job_agent job. Then run job1 in workspace A. After job1 is successfully executed, go to the Monitor Instance page of workspace B to check whether the job execution result meets the expectation.

    • The job_agent job is triggered.
    • If the day of the month on which job1 was executed matches the day in the IF condition (21 in this example), the Subjob node in the job_agent job is successfully executed, and job2 is also successfully executed. Otherwise, the Subjob node is skipped.
      Figure 6 Subjob node executed successfully

Configuration Method (Kafka Client)

  1. Log in to the DataArts Studio console, locate the target DataArts Studio instance, and click Access on the instance card.
  2. Locate the row that contains workspace A and click DataArts Factory in the Quick Entry column. On the displayed page, create a job named job1. Drag a Dummy node and a Kafka Client node to the canvas, and click and hold to connect the nodes, as shown in Figure 7.

    • The Dummy node does not perform any operation. In this example, the Dummy node is only used for demonstration. You can replace it with another node.
    • The Kafka Client node is used to send messages. Select a Kafka connection and a topic, and set Sent Content to the EL expression job1,#{DateUtil.getDay(Job.startTime)}. When the job is executed, the Kafka Client node sends a string message that consists of job1, a comma, and the day of the month on which the job started. For example, if job1 is executed on February 15, the message is job1,15.
    • Retain the default values of other job parameters.
    Figure 7 Kafka Client node configuration for job1

  3. In workspace B, create a job named job_agent. Drag a Dummy node and a Subjob node and drop them on the canvas, and click and hold to connect the nodes, as shown in Figure 8.

    Figure 8 Scheduling settings for job_agent
    • The Dummy node does not perform any operation. In this example, the Dummy node is used to set the IF condition for the connection between the Dummy node and the Subjob node.
    • The Subjob node is used to reference and execute job2 as a subjob. In practice, you can reference an existing job or use another job node to replace the Subjob node.
    • Set Scheduling Type to Event-based, and set Connection Name and Topic to the Kafka connection and topic in workspace B, which correspond to the Kafka connection and topic selected for the Kafka Client node of job1 in workspace A. The connection and topic are used to trigger job execution through Kafka messages.
    • Set the IF condition to check whether the message sent by the Kafka Client node meets the expectation. If yes, the Subjob node will be executed. Otherwise, the Subjob node will be skipped.
      Right-click the connection line and select Set Condition. In the Edit Parameter Expression dialog box, enter the IF condition in the text box and retain the default failure policy. The IF condition is a ternary expression based on the EL expression syntax. The node following the connection line will be executed only if the result of the ternary expression is true. Otherwise, subsequent nodes will be skipped.
      #{StringUtil.equals(StringUtil.split(Job.eventData,',')[1],'21')}

      This IF condition indicates that subsequent job nodes are executed only if the text that follows the comma in the message obtained from the Kafka topic is 21, that is, only if job1 was executed on the 21st day of a month.

      If you want to match multiple messages, you can add multiple Dummy nodes, set the IF condition for the connection between each Dummy node and the Subjob node, and set Multi-IF Policy to OR in the configuration of DataArts Factory.

      Figure 9 Edit Parameter Expression
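
When several Dummy nodes each carry their own IF condition and Multi-IF Policy is set to OR, the Subjob node runs if any one condition is true. The following Python sketch illustrates that logic. The helper names second_field_equals and should_run_subjob and the example days 1 and 15 are hypothetical.

```python
def second_field_equals(expected):
    """Build one IF condition: is the second comma-separated field `expected`?"""
    def condition(event_data):
        fields = event_data.split(",")
        return len(fields) > 1 and fields[1] == expected
    return condition


def should_run_subjob(event_data, conditions):
    """OR policy: run the Subjob node if any incoming condition is true."""
    return any(cond(event_data) for cond in conditions)


# One condition per Dummy node, for example the 1st and the 15th of a month.
conditions = [second_field_equals("1"), second_field_equals("15")]
print(should_run_subjob("job1,15", conditions))  # True
print(should_run_subjob("job1,21", conditions))  # False
```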

  4. Run the job_agent job when job1 in workspace A is not running. Then go to the Monitor Instance page and check whether the execution result meets the expectation.

    Because job1 is not running, no message is sent, and the Subjob node in the job_agent job is skipped, indicating that the IF condition takes effect.
    Figure 10 Subjob node skipped

  5. Start scheduling the job_agent job. Then run job1 in workspace A. After job1 is successfully executed, go to the Monitor Instance page of workspace B to check whether the job execution result meets the expectation.

    • The job_agent job is triggered.
    • If the day of the month on which job1 was executed matches the day in the IF condition (21 in this example), the Subjob node in the job_agent job is successfully executed, and job2 is also successfully executed. Otherwise, the Subjob node is skipped.
      Figure 11 Subjob node executed successfully