Setting Up Scheduling for a Job
This section describes how to set up scheduling for an orchestrated job.
- If the processing mode of a job is batch processing, configure scheduling types for jobs. Three scheduling types are supported: run once, run periodically, and event-based. For details, see Setting Up Scheduling for a Job Using the Batch Processing Mode.
- If the processing mode of a job is real-time processing, configure scheduling types for nodes. Three scheduling types are supported: run once, run periodically, and event-based. For details, see Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode.
Prerequisites
- You have performed the operations in Developing a Pipeline Job or Developing a Batch Processing Single-Task SQL Job.
- The job has been locked by you. Otherwise, click Lock to lock the job before you can develop it. A job you create or import is locked by you by default. For details, see the lock function.
Constraints
- Set an appropriate recurrence for a periodically scheduled job. A maximum of five instances can be concurrently executed for a job. If the start time of a job instance is later than the configured job execution time, the job instances in subsequent batches are queued, and the job takes longer to execute than expected. For CDM and ETL jobs, the recurrence must be at least 5 minutes. In addition, the recurrence should be adjusted based on the data volume of the job table and the update frequency of the source table.
- If you use DataArts Studio DataArts Factory to schedule a CDM migration job and configure a scheduled task for the job in DataArts Migration, both configurations take effect. To ensure unified service logic and avoid scheduling conflicts, enable job scheduling in DataArts Factory and do not configure a scheduled task for the job in DataArts Migration.
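The queuing behavior described in the first constraint can be sketched numerically. The helper below is a hypothetical illustration, not a product API; it assumes the five-instance cap applies per job and that one instance starts per recurrence interval.

```python
# Illustrative sketch only (not DataArts Studio code): estimate how many
# job instances end up queued when each run takes longer than the
# recurrence allows, given the five-instance concurrency cap above.

MAX_CONCURRENT = 5  # per-job concurrency cap stated in the constraints

def queued_instances(recurrence_min: int, run_min: int) -> int:
    """Rough steady-state count of instances waiting at any moment."""
    # One instance starts every recurrence interval and occupies a slot
    # for run_min minutes, so about ceil(run_min / recurrence_min) overlap.
    overlapping = -(-run_min // recurrence_min)  # ceiling division
    return max(0, overlapping - MAX_CONCURRENT)

# A CDM job scheduled every 5 minutes whose runs take 40 minutes:
print(queued_instances(5, 40))  # 8 overlapping runs, 5 slots -> 3 queued
```

This is why lengthening the recurrence (or shortening each run) keeps instances from piling up in the waiting queue.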
Setting Up Scheduling for a Job Using the Batch Processing Mode
Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:
Click the Scheduling Setup tab on the right of the canvas to expand the configuration page and configure the scheduling parameters listed in Table 1.
Table 1 Job scheduling parameters

| Parameter | Description |
| --- | --- |
| Scheduling Type | Scheduling type of the job. Available options include Run once, Run periodically, and Event-based. |
| Enable Dry Run | If you select this option, the job will not be executed, and a success message will be returned. |
| Task Groups | Select a configured task group. For details, see Configuring Task Groups. Do not select is selected by default. After a task group is configured, you can control the number of concurrent nodes in the current workspace in a fine-grained manner, for example, when a job contains multiple nodes or patches data. Example 1: The maximum number of concurrent tasks in the task group is set to 2, and a job has five nodes. When the job runs, only two nodes are running and the other nodes wait to run. Example 2: The maximum number of concurrent tasks in the task group is set to 2, and the number of concurrent periods for a PatchData job is set to 5. When the PatchData job runs, two PatchData job instances are running and the other instances wait to run. The waiting instances are delivered normally after a period of time. |
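Example 1 in the Task Groups row can be illustrated locally. This is a sketch only, not DataArts Studio code: the task group's concurrency limit is modeled as a `BoundedSemaphore(2)`, and the five job nodes as threads.

```python
import threading
import time

# Sketch of Example 1 above: a task group capped at 2 concurrent tasks
# running a job with 5 nodes. Local illustration only.

task_group = threading.BoundedSemaphore(2)  # max concurrent tasks = 2
peak = 0
running = 0
lock = threading.Lock()

def run_node(name):
    global peak, running
    with task_group:            # a node waits here if 2 are already running
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)        # simulated node work
        with lock:
            running -= 1

threads = [threading.Thread(target=run_node, args=(f"node{i}",))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # peak concurrency never exceeds the task group limit of 2
```

All five nodes finish, but at most two run at once; the rest wait to acquire a slot, mirroring the task group's behavior.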
Table 2 Parameters for jobs that are run periodically

| Parameter | Description |
| --- | --- |
| From and to | The period during which the scheduling task takes effect. You can set it to today or tomorrow by clicking the time box and then Today or Tomorrow. |
| Recurrence | The frequency at which the scheduling task is executed. Set an appropriate value for this parameter. A maximum of five instances can be concurrently executed for a job. If the start time of a job instance is later than the configured job execution time, the job instances in subsequent batches are queued, and the job takes longer to execute than expected. For CDM and ETL jobs, the recurrence must be at least 5 minutes. In addition, the recurrence should be adjusted based on the data volume of the job table and the update frequency of the source table. You can modify the scheduling period of a running job. |
| Scheduling Calendar | Select a scheduling calendar. The default value is Do not use. For details about how to configure a scheduling calendar, see Configuring a Scheduling Calendar. |
| OBS Listening | If you enable this function, the system automatically listens to the OBS path for new job files. If you disable it, the system no longer listens to the OBS path. Configure the following parameters: |
| Dependency job | You can select jobs that are executed periodically in different workspaces as dependency jobs. The current job starts only after its dependency jobs are executed. You can click Parse Dependency to automatically identify job dependencies. NOTE: For details about job dependency rules across workspaces, see Job Dependency Rule. Currently, DataArts Factory supports two job dependency policies: dependency between jobs whose scheduling periods are traditional periods, and dependency between jobs whose scheduling periods are natural periods. You can select either of them. New DataArts Studio instances use natural periods. (Figure 1 Dependency between jobs whose scheduling periods are traditional periods; Figure 2 Dependency between jobs whose scheduling periods are natural periods.) For details about the conditions for setting dependency jobs and how jobs run after dependency jobs are set, see Dependency Policies for Periodic Scheduling. |
| Policy for Current job If Dependency job Fails | Policy for processing the current job when one or more instances of its dependency job fail to be executed within its period. For example, the recurrence of the current job is 1 hour and that of its dependency jobs is 5 minutes. |
| Run After Dependency job Ends | If a job depends on other jobs, the job is executed only after its dependency job instances are executed within the specified time range. If the dependency job instances are not successfully executed, the current job stays in the waiting state. If you select this option, the system checks whether all job instances in the previous cycle have been executed before executing the current job. When configuring job dependencies, you can filter dependent jobs by whether they are being scheduled, which prevents downstream job failures caused by upstream dependency jobs that are not scheduled. |
| Cross-Cycle Dependency | Dependency between job instances. |
| Clear Waiting Instances | |
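The gate applied by Run After Dependency job Ends can be sketched as a simple check. The statuses and helper below are illustrative only, not the product API: the current instance starts only when every dependency instance in the window has succeeded, and otherwise stays waiting.

```python
# Sketch of "Run After Dependency job Ends": before launching the current
# job instance, verify every dependency job instance in the time window
# finished successfully; otherwise the current instance keeps waiting.
# Status strings and this helper are hypothetical, not DataArts APIs.

def can_run(dependency_statuses):
    """True only if all dependency job instances succeeded."""
    return all(status == "succeeded" for status in dependency_statuses)

print(can_run(["succeeded", "succeeded"]))  # True  -> current job starts
print(can_run(["succeeded", "running"]))    # False -> current job waits
print(can_run(["failed", "succeeded"]))     # False -> current job waits
```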
Table 3 Parameters for event-based jobs

| Parameter | Description |
| --- | --- |
| Event Type | Type of the event that triggers job running, for example, DIS or KAFKA. |
| Parameters for DIS event-triggered jobs | |
| DIS Stream | Name of the DIS stream. When a new message is sent to the specified DIS stream, DataArts Factory transfers the new message to the job to trigger the job running. |
| Concurrent Events | Number of jobs that can be concurrently processed. The maximum number of concurrent events is 128. |
| Event Detection Interval | Interval at which the system detects the DIS stream for new messages. The unit of the interval can be Seconds or Minutes. |
| Access Policy | Select the location where data is to be accessed. |
| Failure Policy | Select a policy to be performed after scheduling fails. |
| Parameters for KAFKA event-triggered jobs | |
| Connection Name | Before selecting a data connection, ensure that a Kafka data connection has been created in Management Center. |
| Topic | Topic of the message to be sent to Kafka. |
| Concurrent Events | Number of jobs that can be concurrently processed. The maximum number of concurrent events is 128. |
| Event Detection Interval | Interval at which the system detects the stream for new messages. The unit of the interval can be Seconds or Minutes. |
| Access Policy | Select the location where data is to be accessed. |
| Failure Policy | Select a policy to be performed after scheduling fails. |
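Event-based triggering combines the Event Detection Interval and Concurrent Events parameters: the stream is polled at a fixed interval, and each detection cycle starts at most the configured number of job instances. The loop below is a local sketch of that interaction; the stream here is a plain `deque`, not a real DIS or Kafka client.

```python
import time
from collections import deque

# Illustrative sketch of event-based triggering: poll a stream for new
# messages at a fixed interval and start a job run per message, capped by
# the Concurrent Events limit. The deque stands in for a DIS/Kafka stream.

CONCURRENT_EVENTS = 2       # hypothetical Concurrent Events setting
DETECTION_INTERVAL = 0.01   # hypothetical Event Detection Interval (s)

def trigger_jobs(stream, triggered):
    """Consume up to CONCURRENT_EVENTS messages; 'run' a job for each."""
    batch = 0
    while stream and batch < CONCURRENT_EVENTS:
        msg = stream.popleft()
        triggered.append(msg)   # stand-in for starting a job instance
        batch += 1

stream = deque(["m1", "m2", "m3"])
runs = []
while stream:
    trigger_jobs(stream, runs)          # one detection cycle
    time.sleep(DETECTION_INTERVAL)      # wait for the next cycle
print(runs)  # ['m1', 'm2', 'm3'] -- m3 waits for the second cycle
```

With three pending messages and a limit of 2, the first cycle triggers two job instances and the third message is picked up on the next detection cycle, mirroring how excess events wait rather than being dropped.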
Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode
Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:
Select a node. On the node development page, click the Scheduling Parameter Setup tab. On the displayed page, configure the parameters listed in Table 4.