Setting Up Scheduling for a Job

This section describes how to set up scheduling for an orchestrated job.

If a job uses the batch processing mode, configure scheduling for the job. Three scheduling types are supported: run once, run periodically, and event-based. For details, see Setting Up Scheduling for a Job Using the Batch Processing Mode.
If a job uses the real-time processing mode, configure scheduling for each of its nodes. Three scheduling types are supported: run once, run periodically, and event-based. For details, see Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode.

Prerequisites

You have developed a job by following the instructions in Developing a Job.
You have locked the job. If the job is not locked by you, click Lock before developing it. A job that you create or import is locked by you by default. For details, see the lock function.

Constraints

Set the recurrence of a job to an appropriate value. A maximum of five instances of a job can run concurrently. If a job instance starts later than its configured execution time, the job instances in subsequent batches are queued, and the job takes longer than expected to complete. For CDM and ETL jobs, the recurrence must be at least 5 minutes. In addition, adjust the recurrence based on the data volume of the job table and the update frequency of the source table.
If you schedule a CDM migration job in DataArts Studio DataArts Factory and also configure a scheduled task for the same job in DataArts Migration, both configurations take effect. To keep the service logic consistent and avoid scheduling conflicts, enable job scheduling in DataArts Factory only and do not configure a scheduled task for the job in DataArts Migration.
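To make the queuing effect concrete, the following Python sketch models a job whose instances are capped at five concurrent runs, together with the 5-minute minimum recurrence for CDM and ETL jobs. The function names, the first-in-first-out queue, and the assumption that each instance has a known fixed duration are illustrative only; they are not part of any DataArts Factory API.

```python
import heapq
from typing import List, Sequence

MAX_CONCURRENT_INSTANCES = 5                   # limit stated in Constraints
MIN_RECURRENCE_MINUTES = {"CDM": 5, "ETL": 5}  # minimum recurrence for CDM and ETL jobs


def check_recurrence(job_kind: str, interval_minutes: int) -> None:
    """Raise ValueError if a CDM or ETL job is scheduled more often than allowed.

    Other job kinds are not checked by this sketch.
    """
    minimum = MIN_RECURRENCE_MINUTES.get(job_kind)
    if minimum is not None and interval_minutes < minimum:
        raise ValueError(
            f"{job_kind} jobs need a recurrence of at least {minimum} minutes"
        )


def simulate_start_times(
    planned: Sequence[int],
    durations: Sequence[int],
    limit: int = MAX_CONCURRENT_INSTANCES,
) -> List[int]:
    """Return the actual start minute of each job instance.

    Assumes `planned` is sorted, instances start in FIFO order, and at most
    `limit` instances run at the same time.
    """
    finish_times: List[int] = []  # min-heap of finish times of running instances
    starts: List[int] = []
    previous_start = planned[0] if planned else 0
    for planned_start, duration in zip(planned, durations):
        # FIFO: an instance never starts before the instance queued ahead of it.
        start = max(planned_start, previous_start)
        # Release the slots of instances that have already finished.
        while finish_times and finish_times[0] <= start:
            heapq.heappop(finish_times)
        if len(finish_times) >= limit:
            # All slots are busy: wait for the earliest running instance to end.
            start = heapq.heappop(finish_times)
        heapq.heappush(finish_times, start + duration)
        starts.append(start)
        previous_start = start
    return starts
```

For example, with a 5-minute recurrence and instances that each take 30 minutes, `simulate_start_times(list(range(0, 40, 5)), [30] * 8)` starts the sixth, seventh, and eighth instances at minutes 30, 35, and 40 instead of 25, 30, and 35, so every later instance inherits the delay.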

Setting Up Scheduling for a Job Using the Batch Processing Mode

Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:

Click the Scheduling Setup tab on the right of the canvas to expand the configuration page and configure the scheduling parameters listed in Table 1.

**Table 1** Job scheduling parameters
| Parameter | Description |
| --- | --- |
| Scheduling Type | Scheduling type of the job. Available options: Run once: you need to run the job manually. Run periodically: the job runs periodically. For details about the parameters, see Table 2. Event-based: the job runs when certain external conditions are met. For details about the parameters, see Table 3. |
| Dry run | If you select this option, the job is not executed, and a success message is returned. |

**Table 2** Parameters for jobs that are executed periodically
| Parameter | Description |
| --- | --- |
| From and to | The period during which the scheduling task takes effect. |
| Recurrence | The frequency at which the scheduling task runs. Available options: Minutes: the job starts at the top of the hour, and the interval is accurate to the minute. Hours: the job starts at a specified time, and the interval is accurate to the hour. For both Minutes and Hours, scheduling stops at the end time of the current day and automatically resumes on the next day. Every day: the job starts at a specified time each day, and the scheduling period is one day. Every week: the job runs at a specified time on one or more days of the week. Every month: the job runs at a specified time on one or more days of the month. Set this parameter to an appropriate value. For the concurrency limit and the minimum recurrence of CDM and ETL jobs, see Constraints. |
| Dependency job | If you select a dependency job that runs periodically, the current job runs only after instances of the dependency job have been executed within a specified time range. You can search for jobs by name only. If you select multiple dependency jobs, the current job runs only after the instances of all dependency jobs have been executed within the specified time range. For the conditions that dependency jobs must meet and how a job runs after its dependency jobs are set, see Job Dependency and How a Job Runs After a Dependency Job Is Set for It. The following constraints apply, where job A is the current job and job B is its dependency job: (1) The recurrence of job A cannot be shorter than that of job B. For example, if both jobs are scheduled by minute or by hour and the interval of job A is shorter than that of job B, job B cannot be set as the dependency job of job A. Likewise, if job A is scheduled by minute and job B is scheduled by hour, job B cannot be set as the dependency job of job A. (2) Neither job A nor job B can have a recurrence of week. If either job is scheduled by week, job B cannot be set as the dependency job of job A. (3) A job whose recurrence is month can depend only on a job whose recurrence is day. For example, if job A is scheduled by month, job B can be set as its dependency job only if job B is scheduled by day. |
| Policy for Current job If Dependency job Fails | How the current job is handled when one or more instances of its dependency job fail within the current job's period. Available options: Suspend: the current job is suspended. A suspended job blocks the execution of subsequent jobs. You can force the dependency job to succeed. Continue: the current job still runs. Terminate: the current job stops, and its status becomes Canceled. For example, if the current job runs every hour and its dependency job runs every 5 minutes, each period of the current job covers 12 instances of the dependency job. With Terminate, the current job is terminated if any one of these 12 instances fails. With Continue, the current job runs after all 12 instances have been executed. NOTE: You can set this parameter for multiple jobs in a batch. For details, see Configuring a Default Item. |
| Run After Dependency job Ends | If a job depends on other jobs, it runs only after its dependency job instances have been executed within a specified time range (for details, see How a Job Runs After a Dependency Job Is Set for It). If the dependency job instances do not succeed, the current job stays in the waiting state. If you select this option, the system checks whether all job instances in the previous cycle have been executed before running the current job. |
| Cross-Cycle Dependency | Dependency between instances of the job across schedule cycles. Available options: Independent of the previous schedule cycle: use Concurrency to set the number of job instances that can run concurrently. If Concurrency is 1, a batch runs only after the previous batch has finished, whether it succeeded, was canceled, or failed. Self-dependent: the current job can run only after the previous schedule cycle has finished successfully. |
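The dependency constraints and the failure policy in Table 2 can be summarized in a small Python sketch, with job A as the current job and job B as its dependency job. The `Recurrence` type, the function names, and the approximation of a month as 31 days (used only to order recurrences by length) are assumptions made for illustration; they are not part of any DataArts Factory API.

```python
from dataclasses import dataclass
from typing import Sequence

# Approximate period lengths in minutes, used only to compare recurrences.
# A month is approximated as 31 days (an assumption for ordering purposes).
UNIT_MINUTES = {"minute": 1, "hour": 60, "day": 24 * 60, "month": 31 * 24 * 60}


@dataclass(frozen=True)
class Recurrence:
    unit: str          # "minute", "hour", "day", "week", or "month"
    interval: int = 1  # meaningful only for "minute" and "hour"

    def length_minutes(self) -> int:
        return UNIT_MINUTES[self.unit] * self.interval


def can_depend_on(job_a: Recurrence, job_b: Recurrence) -> bool:
    """Return True if job B may be set as the dependency job of job A."""
    # Neither job may have a recurrence of week.
    if "week" in (job_a.unit, job_b.unit):
        return False
    # A monthly job can depend only on a daily job.
    if job_a.unit == "month":
        return job_b.unit == "day"
    # The recurrence of job A cannot be shorter than that of job B.
    return job_a.length_minutes() >= job_b.length_minutes()


def resolve_policy(policy: str, dependency_results: Sequence[str]) -> str:
    """Outcome of the current job once all dependency instances in its period have run.

    `dependency_results` holds "success" or "failed" for each dependency instance.
    """
    if all(result == "success" for result in dependency_results):
        return "run"
    if policy == "Suspend":
        return "suspended"  # blocks subsequent jobs until handled
    if policy == "Continue":
        return "run"
    if policy == "Terminate":
        return "canceled"
    raise ValueError(f"unknown policy: {policy}")
```

In the hourly-versus-5-minute example from the table, `Recurrence("hour").length_minutes() // Recurrence("minute", 5).length_minutes()` gives the 12 dependency instances per period, and a single failed instance among them yields `"canceled"` under Terminate and `"run"` under Continue.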

**Table 3** Parameters for event-based jobs
| Parameter | Description |
| --- | --- |
| Event Type | Type of the event that triggers the job. Available option: KAFKA. |
| Parameters for KAFKA event-triggered jobs | |
| Connection Name | Before selecting a data connection, ensure that a Kafka data connection has been created in Management Center. |
| Topic | Topic of the messages to be sent to Kafka. |
| Concurrent Events | Number of jobs that can be processed concurrently. The maximum value is 128. |
| Event Detection Interval | Interval at which the system checks the stream for new messages. The unit can be Second or Minute. |
| Access Policy | Location from which data is accessed. Available options: Access from the last location: for the first access, data is accessed from the most recently recorded location; for subsequent accesses, data is accessed from the previously recorded location. Access from a new location: data is accessed from the most recently recorded location each time. |
| Failure Policy | Policy applied after scheduling fails. Available options: Suspend; Ignore the failure and proceed with the next event. |
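The interaction between Concurrent Events and Failure Policy can be pictured with the Python sketch below, which triggers one job per received event using a thread pool sized to Concurrent Events. The batch-by-batch model, the function name `run_events`, and the string values `"Suspend"` and `"Ignore"` are assumptions for illustration; the sketch does not connect to Kafka and does not model Event Detection Interval or Access Policy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

MAX_CONCURRENT_EVENTS = 128  # upper limit for Concurrent Events in Table 3


def run_events(
    events: Sequence[str],
    job: Callable[[str], None],
    concurrent_events: int,
    failure_policy: str,
) -> Tuple[List[str], str]:
    """Run `job` once per event, with at most `concurrent_events` running at a time.

    Returns the events whose job succeeded, in order, and the final state:
    "suspended" if a failure stopped processing, otherwise "running".
    """
    if not 1 <= concurrent_events <= MAX_CONCURRENT_EVENTS:
        raise ValueError(f"Concurrent Events must be between 1 and {MAX_CONCURRENT_EVENTS}")
    if failure_policy not in ("Suspend", "Ignore"):
        raise ValueError(f"unknown failure policy: {failure_policy}")
    completed: List[str] = []
    with ThreadPoolExecutor(max_workers=concurrent_events) as pool:
        for i in range(0, len(events), concurrent_events):
            batch = events[i:i + concurrent_events]
            futures = [pool.submit(job, event) for event in batch]
            # Collect results in event order so the outcome is deterministic.
            for event, future in zip(batch, futures):
                try:
                    future.result()
                    completed.append(event)
                except Exception:
                    if failure_policy == "Suspend":
                        return completed, "suspended"
                    # "Ignore": skip the failed event and continue with the next one.
    return completed, "running"
```

With `concurrent_events=2` and a job that fails on the third of five events, Suspend stops after the first batch, while Ignore processes the remaining events.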

Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode

Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:

Select a node. On the node development page, click the Scheduling Parameter Setup tab. On the displayed page, configure the parameters listed in Table 4.

**Table 4** Parameters for setting up node scheduling
| Parameter | Description |
| --- | --- |
| Scheduling Type | Scheduling type of the job. Available options: Run once: you need to run the job manually. Run periodically: the job runs automatically and periodically. Event-based: the job runs when certain external conditions are met. |
| Parameters displayed when Scheduling Type is Run periodically | |
| From and to | The period during which the scheduling task takes effect. |
| Recurrence | The frequency at which the scheduling task runs. Available options: Minutes, Hours, Every day, Every week, and Every month. For CDM and ETL jobs, the recurrence must be at least 5 minutes. In addition, adjust the recurrence based on the data volume of the job table and the update frequency of the source table. |
| Cross-Cycle Dependency | Dependency between instances of the job across schedule cycles. Available options: Independent of the previous schedule cycle; Self-dependent (the current job can run only after the previous schedule cycle has finished successfully). |
| Parameters displayed when Scheduling Type is Event-based | |
| Event Type | Type of the event that triggers the job. |
| Connection Name | Before selecting a data connection, ensure that a Kafka data connection has been created in Management Center. |
| Topic | Topic of the messages to be sent to Kafka. |
| Consumer Group | A scalable and fault-tolerant group of Kafka consumers that share the same group ID and cooperate to consume all partitions of the subscribed topics. Each partition of each subscribed topic can be consumed by only one consumer in the group. NOTE: A consumer group can contain multiple consumers. The group ID is a string that uniquely identifies a consumer group in a Kafka cluster. Consumer groups do not affect each other. If you select KAFKA for Event Type, the consumer group ID is displayed automatically. You can also change it manually. |
| Concurrent Events | Number of jobs that can be processed concurrently. The maximum value is 10. |
| Event Detection Interval | Interval at which the system checks the stream for new messages. The unit can be Seconds or Minutes. |
| Failure Policy | Policy applied after scheduling fails. Available options: Suspend; Ignore failure and proceed. |
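The Consumer Group rule that each partition is consumed by only one consumer in a group, while separate groups do not affect each other, can be illustrated with a simple round-robin assignment in Python. Real Kafka clients negotiate assignments through the consumer group protocol and support several assignment strategies; `assign_partitions` and the topic name `orders` are hypothetical and only demonstrate the one-consumer-per-partition property.

```python
from typing import Dict, List, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


def assign_partitions(
    partitions_per_topic: Dict[str, int], consumers: List[str]
) -> Dict[str, List[Partition]]:
    """Give every partition of every subscribed topic to exactly one consumer."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    members = sorted(consumers)
    assignment: Dict[str, List[Partition]] = {member: [] for member in members}
    all_partitions = [
        (topic, p)
        for topic in sorted(partitions_per_topic)
        for p in range(partitions_per_topic[topic])
    ]
    # Round-robin: partition i goes to member i modulo the group size.
    for i, partition in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment
```

Calling the function once per group shows why groups are independent: each group receives every partition of the topic, so a second group subscribed to the same topic consumes all messages again without changing the first group's assignment.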