Updated on 2022-09-23 GMT+08:00

Setting Up Scheduling for a Job

This section describes how to set up scheduling for an orchestrated job.

Prerequisites

  • You have developed a job by following the instructions in Developing a Job.
  • You have locked the job. If you have not, click Lock before you develop the job. A job you create or import is locked by you by default. For details, see the lock function.

Constraints

  • Set an appropriate recurrence for periodic jobs. A maximum of five instances of a job can run concurrently. If a job instance cannot start at its configured execution time, the instances in subsequent batches are queued, and the job takes longer to finish than expected (a short simulation of this behavior follows the list). For CDM and ETL jobs, the recurrence must be at least 5 minutes. In addition, adjust the recurrence based on the data volume of the job tables and the update frequency of the source tables.
  • If you schedule a CDM migration job from DataArts Studio DataArts Factory and also configure a scheduled task for the job in DataArts Migration, both schedules take effect. To keep service logic unified and avoid scheduling conflicts, enable job scheduling in DataArts Factory only and do not configure a scheduled task for the job in DataArts Migration.
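The queuing behavior described in the first constraint can be illustrated with a short simulation. The following Python sketch is illustrative only (it is not DataArts Studio code, and the run time is a made-up number): it models a job scheduled every 5 minutes whose instances take 30 minutes to run, with at most five instances running concurrently.

    # Illustrative simulation of instance queuing (not DataArts Studio code).
    RECURRENCE_MIN = 5    # scheduled interval between instances (minutes)
    RUN_TIME_MIN = 30     # assumed run time of each instance (minutes)
    MAX_CONCURRENT = 5    # at most five instances of a job run concurrently

    running = []          # end times of instances currently running
    for i in range(12):   # one hour of scheduled start times
        scheduled = i * RECURRENCE_MIN
        running = [end for end in running if end > scheduled]
        # If all five slots are busy, the instance waits for the earliest
        # running instance to finish, so subsequent batches are queued.
        start = scheduled if len(running) < MAX_CONCURRENT else min(running)
        running = [end for end in running if end > start]
        running.append(start + RUN_TIME_MIN)
        print(f"instance {i}: scheduled t={scheduled} min, "
              f"starts t={start} min (queued {start - scheduled} min)")

With these numbers, once the five concurrent slots fill up, every later instance starts 5 minutes after its scheduled time, which is why the job takes longer than expected.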

Setting Up Scheduling for a Job Using the Batch Processing Mode

Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:

Click the Scheduling Setup tab on the right of the canvas to expand the configuration page and configure the scheduling parameters listed in Table 1.

Table 1 Job scheduling parameters


Scheduling Type

Scheduling type of the job. Available options include:

  • Run once: You need to manually execute the job.
  • Run periodically: The job is executed periodically. For details about the parameters, see Table 2.
  • Event-based: The job will be executed when certain external conditions are met. For details about the parameters, see Table 3.

Dry run

If you select this option, the job will not be executed, and a success message will be returned.

Table 2 Parameters for jobs that are executed periodically


From and to

The period during which a scheduling task takes effect.

Recurrence

The frequency at which the scheduling task is executed, which can be:

  • Minutes: The job starts at the top of the hour, and the interval is accurate to the minute. After scheduling ends at the end time of the current day, it automatically starts again on the next day.
  • Hours: The job starts at a specified time point, and the interval is accurate to the hour. After scheduling ends at the end time of the current day, it automatically starts again on the next day.
  • Every day: The job starts at a specified time each day. The scheduling period is one day.
  • Every week: The job starts at a specified time on one or more days of each week.
  • Every month: The job starts at a specified time on one or more days of each month.

Set an appropriate recurrence. A maximum of five instances of a job can run concurrently; if a job instance cannot start at its configured execution time, subsequent batches are queued and the job takes longer than expected. For CDM and ETL jobs, the recurrence must be at least 5 minutes and should be adjusted based on the data volume of the job tables and the update frequency of the source tables.
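For intuition, a recurrence simply expands into a series of instance start times inside the From and to window. The following sketch (plain Python with made-up dates, not a DataArts Studio API) enumerates the start times that a Minutes recurrence of 15 minutes produces in one day:

    from datetime import datetime, timedelta

    # Hypothetical validity window and interval, for illustration only.
    window_from = datetime(2022, 9, 23, 0, 0)
    window_to = datetime(2022, 9, 23, 23, 59)
    interval = timedelta(minutes=15)

    starts = []
    t = window_from
    while t <= window_to:
        starts.append(t)
        t += interval
    print(len(starts), "instances, first three:",
          [s.strftime("%H:%M") for s in starts[:3]])

This prints 96 instances for the day; scheduling then resumes from the window start on the next day.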

Dependency job

If you select a dependency job that is executed periodically, the current job is executed only after an instance of the dependency job is executed within a certain period of time. You can search for dependency jobs only by name. For details about the conditions that dependency jobs must meet and how a job runs after its dependency jobs are set, see Job Dependency.

If you select multiple dependency jobs, the current job is executed only after all dependency job instances are executed within the specified time range (for details, see How a Job Runs After a Dependency Job Is Set for It).

The constraints are as follows (job A is the current job and job B is its dependency job):

  • The recurrence of job A cannot be shorter than that of job B. For example, if both job A and job B are scheduled by minute or hour and the interval of job A is shorter than that of job B, or if job A is scheduled by minute while job B is scheduled by hour, job B cannot be set as the dependency job of job A.
  • The recurrence of neither job A nor job B can be week. If the recurrence of either job is week, job B cannot be set as the dependency job of job A.
  • A job whose recurrence is month can depend only on a job whose recurrence is day. If the recurrence of job A is month, job B can be set as the dependency job of job A only if its recurrence is day.
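The three constraints can be condensed into a small validation function. The sketch below is a plain-Python restatement of the documented rules, not code from DataArts Studio; the recurrence encoding is invented for the example.

    # Recurrence units ordered from shortest to longest.
    UNIT_ORDER = {"minute": 0, "hour": 1, "day": 2, "week": 3, "month": 4}

    def can_depend_on(job_a, job_b):
        """Return True if job A (current) may depend on job B."""
        unit_a, interval_a = job_a   # (unit, interval in minutes)
        unit_b, interval_b = job_b
        if "week" in (unit_a, unit_b):     # neither recurrence can be week
            return False
        if unit_a == "month":              # monthly depends only on daily
            return unit_b == "day"
        if UNIT_ORDER[unit_a] < UNIT_ORDER[unit_b]:
            return False                   # A cannot be shorter than B
        if unit_a == unit_b and interval_a < interval_b:
            return False
        return True

    print(can_depend_on(("hour", 60), ("minute", 5)))      # True
    print(can_depend_on(("minute", 5), ("hour", 60)))      # False
    print(can_depend_on(("month", 43200), ("day", 1440)))  # True
    print(can_depend_on(("week", 10080), ("day", 1440)))   # False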

Policy for Current job If Dependency job Fails

Policy for handling the current job when one or more instances of its dependency job fail to be executed within its period.

  • Suspend

Suspends the current job. The suspended job blocks the execution of subsequent jobs. You can force the failed dependency job instances to succeed to unblock it.

  • Continue

    Continues to execute the current job.

  • Terminate

    Stops executing the current job. Its status becomes Canceled.

For example, suppose the recurrence of the current job is 1 hour and that of its dependency job is 5 minutes, so each cycle of the current job covers 12 instances of the dependency job.
  • If this parameter is set to Terminate, the current job is terminated if any of the 12 instances of its dependency job fails.
  • If this parameter is set to Continue, the current job is executed after all 12 instances of its dependency job have been executed, even if some of them failed.
    NOTE:

    You can set this parameter for multiple jobs in a batch. For details, see Configuring a Default Item.
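Continuing the example above, the sketch below shows how the three policies react to the 12 dependency instances in one 1-hour cycle. It is a plain-Python illustration of the documented behavior, not product code.

    def evaluate(policy, instance_results):
        """Decide what happens to the current job for one cycle."""
        if all(r == "success" for r in instance_results):
            return "run current job"
        if policy == "Terminate":
            return "cancel current job (status becomes Canceled)"
        if policy == "Suspend":
            return ("suspend; subsequent jobs are blocked until the "
                    "failed instances are forced to succeed")
        if policy == "Continue":
            return "run current job after all 12 instances are executed"

    # 12 five-minute instances fall in one 1-hour cycle; one fails.
    results = ["success"] * 11 + ["failed"]
    for policy in ("Suspend", "Continue", "Terminate"):
        print(policy, "->", evaluate(policy, results))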

Run After Dependency job Ends

If a job depends on other jobs, it is executed only after its dependency job instances are executed within the specified time range (for details, see How a Job Runs After a Dependency Job Is Set for It). If the dependency job instances are not executed successfully, the current job remains in the waiting state.

If you select this option, the system checks whether all job instances in the previous cycle have finished before executing the current job.

Cross-Cycle Dependency

Dependency between instances of the same job across schedule cycles:

  • Independent on the previous schedule cycle: Instances run regardless of the result of the previous cycle. You can set Concurrency to specify the number of job instances that run concurrently. If you set it to 1, a batch is executed only after the previous batch finishes (succeeds, is canceled, or fails).
  • Self-dependent: The current job can continue to run only after the instance in the previous schedule cycle finishes successfully.
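The difference between the two options can be sketched as a gate that each new cycle must pass. Again, this is a plain-Python illustration of the described behavior, not DataArts Studio code.

    def may_start(mode, previous_outcome):
        """Gate a new cycle on the outcome of the previous one."""
        if mode == "independent":
            return True   # runs regardless; only Concurrency limits it
        if mode == "self-dependent":
            return previous_outcome == "success"

    for prev in ("success", "failed"):
        print(f"previous cycle {prev}: "
              f"independent={may_start('independent', prev)}, "
              f"self-dependent={may_start('self-dependent', prev)}")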
Table 3 Parameters for event-based jobs


Event Type

Type of the event that triggers job running

  • KAFKA

Parameters for KAFKA event-triggered jobs

Connection Name

Before selecting a data connection, ensure that a Kafka data connection has been created in the Management Center.

Topic

Topic of the message to be sent to Kafka.

Concurrent Events

Number of jobs that can be concurrently processed. The maximum number of concurrent events is 128.

Event Detection Interval

Interval at which the system detects the stream for new messages. The unit of the interval can be Second or Minute.

Access Policy

Select the location from which data is to be accessed:

  • Access from the last location: For the first access, data is accessed from the most recently recorded location. For subsequent accesses, data is accessed from the previously recorded location.
  • Access from a new location: Data is accessed from the most recently recorded location each time.

Failure Policy

Select the policy to apply when scheduling fails.

  • Suspend
  • Ignore the failure and proceed with the next event
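Conceptually, a KAFKA-triggered job behaves like a consumer that checks the configured topic at the detection interval and starts a job instance for each new message. The sketch below mimics that loop with the open-source kafka-python client; the broker address, topic, group ID, and trigger_job function are placeholders, and this is not how DataArts Factory is implemented internally.

    import time
    from kafka import KafkaConsumer  # pip install kafka-python

    BOOTSTRAP = "broker.example.com:9092"  # placeholder broker address
    TOPIC = "job-trigger-topic"            # the configured Topic
    DETECTION_INTERVAL_S = 60              # Event Detection Interval

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        group_id="dataarts-demo-group",
        # "Access from the last location" roughly corresponds to resuming
        # from committed offsets; auto_offset_reset applies only when no
        # offset has been recorded yet.
        auto_offset_reset="earliest",
        enable_auto_commit=True,
    )

    def trigger_job(message):
        """Placeholder for starting one job instance per event."""
        print("triggering job for event:", message.value)

    while True:
        batch = consumer.poll(timeout_ms=1000)  # detect new messages
        for _, messages in batch.items():
            for message in messages:
                try:
                    trigger_job(message)
                except Exception as exc:
                    # Failure Policy: suspend (stop here) or ignore the
                    # failure and proceed with the next event.
                    print("event failed, proceeding:", exc)
        time.sleep(DETECTION_INTERVAL_S)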

Setting Up Scheduling for Nodes of a Job Using the Real-Time Processing Mode

Three scheduling types are available: Run once, Run periodically, and Event-based. The procedure is as follows:

Select a node. On the node development page, click the Scheduling Parameter Setup tab. On the displayed page, configure the parameters listed in Table 4.

Table 4 Parameters for setting up node scheduling


Scheduling Type

Scheduling type of the job. Available options include:

  • Run once: You need to manually run the job.
  • Run periodically: The job runs automatically and periodically.
  • Event-based: The job runs when certain external conditions are met.

Parameters displayed when Scheduling Type is Run periodically

From and to

The period during which a scheduling task takes effect.

Recurrence

The frequency at which the scheduling task is executed, which can be:

  • Minutes
  • Hours
  • Every day
  • Every week
  • Every month

For CDM and ETL jobs, the recurrence must be at least 5 minutes. In addition, adjust the recurrence based on the data volume of the job tables and the update frequency of the source tables.

Cross-Cycle Dependency

Dependency between job instances

  • Independent on the previous schedule cycle
  • Self-dependent (The current job can continue to run only after the previous schedule cycle is successfully finished.)

Parameters displayed when Scheduling Type is Event-based

Event Type

Type of the event that triggers job running.

Connection Name

Before selecting a data connection, ensure that a Kafka data connection has been created in the Management Center.

Topic

Topic of the message to be sent to Kafka.

Consumer Group

A scalable and fault-tolerant group of consumers in Kafka.

Consumers in a group share the same group ID. They collaborate with each other to consume all partitions of the subscribed topics. Each partition in a topic can be consumed by only one consumer in the group.

NOTE:
  1. A consumer group can contain multiple consumers.
  2. The group ID is a string that uniquely identifies a consumer group in a Kafka cluster.
  3. Each partition of each topic subscribed to by a consumer group can be consumed by only one consumer. Consumer groups do not affect each other.

If you select KAFKA for Event Type, the consumer group ID is automatically displayed. You can also manually change the consumer group ID.
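To see the one-consumer-per-partition rule in action, you can start two consumers with the same group ID and inspect the partitions each is assigned. This sketch again uses the open-source kafka-python client with placeholder broker and topic names; it requires a reachable Kafka cluster.

    from kafka import KafkaConsumer  # pip install kafka-python

    BOOTSTRAP = "broker.example.com:9092"  # placeholder
    TOPIC = "job-trigger-topic"            # placeholder
    GROUP_ID = "dataarts-demo-group"       # shared ID: one consumer group

    # Two consumers in the same group split the topic's partitions;
    # each partition is consumed by exactly one of them.
    a = KafkaConsumer(TOPIC, bootstrap_servers=BOOTSTRAP, group_id=GROUP_ID)
    b = KafkaConsumer(TOPIC, bootstrap_servers=BOOTSTRAP, group_id=GROUP_ID)

    # poll() joins the group and triggers the rebalance that assigns
    # partitions (it may take a few polls before assignment settles).
    a.poll(timeout_ms=5000)
    b.poll(timeout_ms=5000)
    print("consumer a partitions:", a.assignment())
    print("consumer b partitions:", b.assignment())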

Concurrent Events

Number of jobs that can be concurrently processed. The maximum number of concurrent events is 10.

Event Detection Interval

Interval at which the system detects the stream for new messages. The unit of the interval can be Seconds or Minutes.

Failure Policy

Select the policy to apply when scheduling fails.

  • Suspend
  • Ignore failure and proceed