Setting Up Scheduling for a Job
This section describes how to set up scheduling for an orchestrated job.
- If the processing mode of a job is batch processing, configure scheduling types for jobs. Three scheduling types are supported: run once, run periodically, and event-based. For details, see Setting Up Scheduling for a Job Using the Batch Processing Mode.
- If the processing mode of a job is real-time processing, configure scheduling types for nodes. Three scheduling types are supported: run once, run periodically, and event-based. For details, see Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode.
Prerequisites
- You have performed the operations in Developing a Pipeline Job or Developing a Batch Processing Single-Task SQL Job.
- You have locked the job. If you have not, you must click Lock before you can develop the job. A job that you create or import is locked by you by default. For details, see the lock function.
Constraints
- Set an appropriate recurrence for the job. A maximum of five instances of a job can run concurrently. If the start time of a job instance is later than the configured job execution time, the job instances in the subsequent batches are queued, so the job takes longer to execute than expected (see the illustrative calculation after this list). For CDM and ETL jobs, the recurrence must be at least 5 minutes and should also be adjusted based on the data volume of the job table and the update frequency of the source table.
- If you use DataArts Factory of DataArts Studio to schedule a CDM migration job and also configure a scheduled task for that job in DataArts Migration, both configurations take effect. To keep the service logic consistent and avoid scheduling conflicts, enable job scheduling in DataArts Factory only and do not configure a scheduled task for the job in DataArts Migration.
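As a purely illustrative check (the numbers below are made up and are not values from this guide), you can estimate whether a recurrence fits the actual run time of a job: if one instance takes longer than the recurrence, several instances overlap, and once the overlap exceeds the five-instance limit, new instances queue and the job falls behind schedule.

```python
# Illustrative calculation only (not DataArts Studio code): does the recurrence
# fit the job's run time, given the five-concurrent-instance limit?
import math

RECURRENCE_MIN = 5      # job is scheduled every 5 minutes (example value)
RUN_TIME_MIN = 30       # one instance actually takes 30 minutes (example value)
MAX_CONCURRENT = 5      # at most five instances of a job run concurrently

needed = math.ceil(RUN_TIME_MIN / RECURRENCE_MIN)   # instances overlapping at steady state
print(f"Steady state needs {needed} concurrent instances (limit: {MAX_CONCURRENT}).")
if needed > MAX_CONCURRENT:
    print("Instances will queue and execution will fall behind schedule; "
          "increase the recurrence or shorten the job.")
```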
Setting Up Scheduling for a Job Using the Batch Processing Mode
Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:
Click the Scheduling Setup tab on the right of the canvas to expand the configuration page and configure the scheduling parameters listed in Table 1.
Table 1 Scheduling parameters

- Scheduling Type: Scheduling type of the job. Available options are Run once, Run periodically, and Event-based.
- Enable Dry Run: If you select this option, the job will not be executed, and a success message will be returned.
- Task Groups: Select a configured task group. For details, see Configuring Task Groups. By default, Do not select is selected. If you select a task group, you can control the maximum number of concurrent nodes in the task group in a fine-grained manner in scenarios where a job contains multiple nodes, a data patching task is ongoing, or a job is rerunning. For example (see the sketch following Table 1):
  - Example 1: The maximum number of concurrent tasks in the task group is 2 and the job has five nodes. When the job runs, only two nodes run at a time and the other nodes wait.
  - Example 2: The maximum number of concurrent tasks in the task group is 2 and the number of concurrent periods for a PatchData job is 5. When the PatchData job runs, two PatchData job instances run and the other instances wait. The waiting instances are delivered normally after a period of time.
  - Example 3: If the same task group is configured for multiple jobs and the maximum number of concurrent tasks in the task group is 2, only two jobs run at a time and the other jobs wait. Similarly, if the same task group is configured for multiple job nodes and there are five such nodes in total, only two nodes run at a time and the other nodes wait.

  NOTE: For a pipeline job, you can configure a task group for each node or for the job. A task group configured for the job takes precedence over one configured for a node.
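The task group examples above can be pictured with a small, hypothetical sketch. It is not DataArts Studio code: a semaphore with two slots stands in for a task group whose maximum number of concurrent tasks is 2, and the five threads stand in for the five job nodes.

```python
# Minimal sketch (not DataArts Studio code): a task group capped at 2 concurrent
# tasks lets only two of a job's five nodes run at a time; the rest wait.
import threading
import time

MAX_CONCURRENT_TASKS = 2                    # cap configured on the task group
task_group = threading.Semaphore(MAX_CONCURRENT_TASKS)

def run_node(name: str) -> None:
    with task_group:                        # a node runs only after it gets a slot
        print(f"{name} running")
        time.sleep(1)                       # simulated node work
    print(f"{name} finished, slot released")

nodes = [threading.Thread(target=run_node, args=(f"node_{i}",)) for i in range(5)]
for t in nodes:
    t.start()                               # all five nodes are submitted at once
for t in nodes:
    t.join()
```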
The following parameters are displayed when Scheduling Type is set to Run periodically.

- From and to: The period during which the scheduling task takes effect. You can set it to today or tomorrow by clicking the time box and then clicking Today or Tomorrow.
- Recurrence: The frequency at which the scheduling task is executed. Set an appropriate value for this parameter. A maximum of five instances of a job can run concurrently. If the start time of a job instance is later than the configured job execution time, the job instances in the subsequent batches are queued, so the job takes longer to execute than expected. For CDM and ETL jobs, the recurrence must be at least 5 minutes and should be adjusted based on the data volume of the job table and the update frequency of the source table. You can modify the scheduling period of a running job.

  NOTE: DataArts Studio does not support concurrent running of PatchData instances and periodic job instances of underlying services (such as CDM and DLI). To prevent PatchData instances from affecting periodic job instances and to avoid exceptions, ensure that they do not run at the same time.
- Scheduling Calendar: Select a scheduling calendar. The default value is Do not use. For details about how to configure a scheduling calendar, see Configuring a Scheduling Calendar.
- OBS Listening: If you enable this function, the system automatically listens to the OBS path for new job files; if you disable it, the system no longer listens to the OBS path. If you enable this function, configure the related listening parameters as well (see the upload example following this parameter list).
- Dependency job: You can select jobs that are executed periodically in different workspaces as dependency jobs. The current job starts only after its dependency jobs have been executed. You can click Parse Dependency to automatically identify job dependencies.

  NOTE: For details about job dependency rules across workspaces, see Job Dependency Rule. Currently, DataArts Factory supports two job dependency policies: dependency between jobs scheduled by traditional periods and dependency between jobs scheduled by natural periods. You can select either policy. New DataArts Studio instances use natural scheduling periods.

  Figure 1 Dependency between jobs whose scheduling periods are traditional periods
  Figure 2 Dependency between jobs whose scheduling periods are natural periods

  For details about the conditions for setting dependency jobs and how jobs run after dependency jobs are set, see Dependency Policies for Periodic Scheduling.
- Policy for Current job If Dependency job Fails: Policy for processing the current job when one or more instances of its dependency jobs fail to be executed within its period. For example, the recurrence of the current job is 1 hour and that of its dependency jobs is 5 minutes.
- Run After Dependency job Ends: If a job depends on other jobs, it is executed only after its dependency job instances have been executed within the specified time range. If the dependency job instances are not executed successfully, the current job stays in the waiting state. If you select this option, the system checks whether all job instances in the previous cycle have been executed before executing the current job.
- Dependency Job: When configuring job dependencies, you can filter dependent jobs based on whether they are being scheduled. This prevents downstream job failures caused by upstream dependent jobs not being scheduled.
- Dependency Cycle
- Cross-Cycle Dependency: Dependency between job instances.
- Clear Waiting Instances
- Enable Dry Run: If you select this option, the job will not be executed, and a success message will be returned.
- Task Groups: Select a configured task group. For details, see Configuring Task Groups. By default, Do not select is selected. If you select a task group, you can control the maximum number of concurrent nodes in the task group in a fine-grained manner in scenarios where a job contains multiple nodes, a data patching task is ongoing, or a job is rerunning.

  NOTE: For a pipeline job, you can configure a task group for each node or for the job. A task group configured for the job takes precedence over one configured for a node.
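When OBS Listening is enabled, a new file appearing in the monitored OBS path is what starts the job. The following is a minimal sketch of delivering such a file, assuming the esdk-obs-python SDK; the endpoint, bucket, object path, and credentials are placeholders, not values from this guide.

```python
# Minimal sketch: upload a new job file to the OBS path that OBS Listening monitors.
# Assumes the esdk-obs-python SDK; all names and credentials below are placeholders.
from obs import ObsClient

client = ObsClient(
    access_key_id="YOUR_AK",
    secret_access_key="YOUR_SK",
    server="https://obs.example-region.myhuaweicloud.com",  # OBS endpoint of your region
)
try:
    # Put the file into the listened path; the scheduler detects it as a new job file.
    resp = client.putFile("example-bucket", "jobs/incoming/data_20240101.csv",
                          "./data_20240101.csv")
    if resp.status < 300:
        print("Upload succeeded")
    else:
        print("Upload failed:", resp.errorCode, resp.errorMessage)
finally:
    client.close()
```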
The following parameters are displayed when Scheduling Type is set to Event-based.

- Event Type: Type of the event that triggers job running.

Parameters for DIS event-triggered jobs:

- DIS Stream: Name of the DIS stream. When a new message is sent to the specified DIS stream, DataArts Factory transfers the new message to the job to trigger the job running.
- Concurrent Events: Number of jobs that can be concurrently processed. The maximum number of concurrent events is 128.
- Event Detection Interval: Interval at which the system detects the DIS stream for new messages. The unit of the interval can be Seconds or Minutes.
- Access Policy: Select the location where data is to be accessed.
- Failure Policy: Select a policy to be performed after scheduling fails.
Parameters for KAFKA event-triggered jobs:

- Connection Name: Before selecting a data connection, ensure that a Kafka data connection has been created in the Management Center.
- Topic: Topic of the message to be sent to Kafka (see the producer example following this parameter list).
- Concurrent Events: Number of jobs that can be concurrently processed. The maximum number of concurrent events is 128.
- Event Detection Interval: Interval at which the system detects the stream for new messages. The unit of the interval can be Seconds or Minutes.
- Access Policy: Select the location where data is to be accessed.
- Failure Policy: Select a policy to be performed after scheduling fails.
- Enable Dry Run: If you select this option, the job will not be executed, and a success message will be returned.
- Task Groups: Select a configured task group. For details, see Configuring Task Groups. By default, Do not select is selected. If you select a task group, you can control the maximum number of concurrent nodes in the task group in a fine-grained manner in scenarios where a job contains multiple nodes, a data patching task is ongoing, or a job is rerunning.

  NOTE: For a pipeline job, you can configure a task group for each node or for the job. A task group configured for the job takes precedence over one configured for a node.
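For a KAFKA event-triggered job, any producer that writes to the configured topic can start the job. Below is a minimal, hypothetical sketch using the open-source kafka-python client; the broker address, topic name, and message payload are placeholders, not values from this guide.

```python
# Minimal sketch: send a test message to the configured topic so that the
# event-based job is triggered. Broker, topic, and payload are placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka-broker.example.com:9092")
producer.send("job_trigger_topic", value=b'{"event": "new_data_ready"}')
producer.flush()   # block until the message is actually delivered
producer.close()
```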
Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode
Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:
Select a node. On the node development page, click the Scheduling Parameter Setup tab. On the displayed page, configure the parameters listed in Table 4.
Table 4 Scheduling parameters

- Scheduling Type: Scheduling type of the job. Available options are Run once, Run periodically, and Event-based.

Parameters displayed when Scheduling Type is Run periodically:

- From and to: The period during which the scheduling task takes effect.
- Recurrence: The frequency at which the scheduling task is executed. For CDM and ETL jobs, the recurrence must be at least 5 minutes and should be adjusted based on the data volume of the job table and the update frequency of the source table. You can modify the scheduling period of a running job.
- Cross-Cycle Dependency: Dependency between job instances.
Parameters displayed when Scheduling Type is Event-based:

- Event Type: Type of the event that triggers job running.
- DIS Stream: Name of the DIS stream. When a new message is sent to the specified DIS stream, DataArts Factory transfers the new message to the job to trigger the job running. This parameter is mandatory only when Event Type is set to DIS.
- Connection Name: Before selecting a data connection, ensure that a Kafka data connection has been created in the Management Center. This parameter is mandatory only when Event Type is set to KAFKA.
- Topic: Topic of the message to be sent to Kafka. This parameter is mandatory only when Event Type is set to KAFKA.
- Consumer Group: A scalable and fault-tolerant group of consumers in Kafka. Consumers in a group share the same ID. They collaborate to consume all partitions of the subscribed topics, and each partition of a topic is consumed by only one consumer in the group (see the example following this parameter list).

  NOTE: If you select DIS or KAFKA for Event Type, the consumer group ID is displayed automatically. You can also change the consumer group ID manually.
- Concurrent Events: Number of jobs that can be concurrently processed. The maximum number of concurrent events is 10.
- Event Detection Interval: Interval at which the system detects the DIS stream for new messages. The unit of the interval can be Seconds or Minutes.
- Access Policy
- Failure Policy: Select a policy to be performed after scheduling fails.
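The consumer-group behavior described above can be seen with a small, hypothetical sketch using the open-source kafka-python client: if two copies of this script run with the same group_id, Kafka assigns each of them a disjoint subset of the topic's partitions, so every partition is read by only one consumer. The broker, topic, and group ID below are placeholders, not values from this guide.

```python
# Minimal sketch: consumers that share a group_id split the topic's partitions
# among themselves; each partition is consumed by exactly one group member.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "job_trigger_topic",
    bootstrap_servers="kafka-broker.example.com:9092",
    group_id="dataarts-demo-consumer-group",   # members with this ID share the work
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.partition, message.offset, message.value)
```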