Updated on 2022-09-23 GMT+08:00

Monitoring a Batch Job

In batch processing mode, data is processed periodically in batches according to a job-level scheduling plan. This mode suits scenarios with low real-time requirements. A batch job is a pipeline that consists of one or more nodes and is scheduled as a whole. It cannot run indefinitely: it must end after running for a certain period of time.

You can choose Monitor Job and click the Batch Job Monitoring tab to view the scheduling status, frequency, and start time of a batch job, and perform the operations listed in Table 1.

Figure 1 Monitoring a Batch Job
Table 1 Operations supported by batch job monitoring

| No. | Operation | Description |
| --- | --- | --- |
| 1 | Searching for a job based on the job name or owner | - |
| 2 | Filtering jobs by whether notifications have been configured, scheduling status, job label, or next plan time | - |
| 3 | Performing operations on jobs in batches | Select multiple jobs and perform operations on them. |
| 4 | Viewing job instance status | Click the icon in front of the job name. The Last Instance page is displayed. You can view information about the last instance of the job. |
| 5 | Viewing node information of the job | Click a job name. On the displayed page, click the job node and view its associated jobs/scripts and monitoring information. |
| 6 | Job scheduling operations | In the Operation column of a job, you can run, pause, restore, stop, or configure scheduling of the job. For details, see Batch Job Monitoring: Scheduling a Job. |
| 7 | Configuring notifications | In the Operation column of a job, choose More > Set Notification. In the displayed dialog box, configure notification parameters. Table 1 describes the notification parameters. |
| 8 | Monitoring instances | In the Operation column of a job, choose More > Monitor Instance to view the running records of all instances of the job. |
| 9 | PatchData | In the Operation column of a job, choose More > PatchData. For details, see Batch Job Monitoring: PatchData. |
| 10 | Adding a job label | In the Operation column of a job, choose More > Add Job Label. For details, see Batch Job Monitoring: Adding a Job Label. |
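
The search and filter operations in rows 1 and 2 can be pictured as combining optional criteria over a list of jobs. The following is a minimal sketch only: the `BatchJob` record, its field names, and the status strings are hypothetical and are not part of any DataArts Studio API.

```python
from dataclasses import dataclass, field


@dataclass
class BatchJob:
    # Hypothetical record; field names are illustrative only.
    name: str
    owner: str
    scheduling_status: str
    notification_configured: bool = False
    labels: set = field(default_factory=set)


def filter_jobs(jobs, keyword=None, status=None, label=None, notified=None):
    """Return the jobs that match every criterion that is not None."""
    result = []
    for job in jobs:
        if keyword is not None:
            # The keyword may match either the job name or the owner.
            needle = keyword.lower()
            if needle not in job.name.lower() and needle not in job.owner.lower():
                continue
        if status is not None and job.scheduling_status != status:
            continue
        if label is not None and label not in job.labels:
            continue
        if notified is not None and job.notification_configured != notified:
            continue
        result.append(job)
    return result


# Sample data with assumed status names.
jobs = [
    BatchJob("daily_sales", "alice", "Scheduling", True, {"finance"}),
    BatchJob("weekly_report", "bob", "Paused", False, {"finance", "report"}),
    BatchJob("log_cleanup", "alice", "Scheduling", False, {"ops"}),
]

matches = filter_jobs(jobs, keyword="alice", status="Scheduling")
print([job.name for job in matches])
```

Criteria left as `None` are ignored, which mirrors leaving a filter unset on the page.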

Batch Job Monitoring: Scheduling a Job

After developing a job, you can manage its scheduling on the Monitor Job page. Supported operations are running, pausing, restoring, and stopping scheduling.

Figure 2 Scheduling a job
  1. Log in to the DataArts Studio console. Locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Factory.
  2. In the left navigation pane of DataArts Factory, choose Monitoring > Monitor Job.
  3. Click the Batch Job Monitoring tab.
  4. In the Operation column of the job, click Submit, Pause, Restore, or Stop.
If a dependent job has been configured for a batch job, you can select either Start Current Job Only or Start Current and Depended Jobs when submitting the batch job. For details about how to configure dependent jobs, see Setting Up Scheduling for a Job Using the Batch Processing Mode.
Figure 3 Starting a job
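
One way to reason about the Submit, Pause, Restore, and Stop operations is as transitions between scheduling states. The sketch below is illustrative only: the status names (`NOT_STARTED`, `SCHEDULING`, `PAUSED`) and the allowed transitions are assumptions made for this example, and the console may name or restrict them differently.

```python
# Assumed status names and transitions; not taken from the product.
ALLOWED_TRANSITIONS = {
    ("NOT_STARTED", "submit"): "SCHEDULING",
    ("SCHEDULING", "pause"): "PAUSED",
    ("PAUSED", "restore"): "SCHEDULING",
    ("SCHEDULING", "stop"): "NOT_STARTED",
    ("PAUSED", "stop"): "NOT_STARTED",
}


def apply_operation(status, operation):
    """Return the next status, or raise if the operation does not apply."""
    key = (status, operation.lower())
    if key not in ALLOWED_TRANSITIONS:
        raise ValueError(f"{operation!r} is not allowed while the job is {status}")
    return ALLOWED_TRANSITIONS[key]


status = "NOT_STARTED"
for operation in ["Submit", "Pause", "Restore", "Stop"]:
    status = apply_operation(status, operation)
    print(operation, "->", status)
```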

Batch Job Monitoring: PatchData

When a job executes a scheduling task, it generates a series of instances over a period of time. This series of instances is called PatchData. PatchData can be used to fix job instances that have data errors in historical records, or to build job records for debugging programs.

Only the periodically scheduled jobs support PatchData. For details about the execution records of PatchData, see Monitoring PatchData.

Do not modify the job configuration when PatchData is being performed. Otherwise, job instances generated during PatchData will be affected.

  1. Log in to the DataArts Studio console. Locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Factory.
  2. In the left navigation pane of DataArts Factory, choose Monitoring > Monitor Job.
  3. Click the Batch Job Monitoring tab.
  4. In the Operation column of the job, choose More > Configure PatchData.
  5. Configure PatchData parameters based on Table 2.
    Figure 4 PatchData parameters
    Table 2 Parameters

    | Parameter | Description |
    | --- | --- |
    | PatchData Name | Name of the automatically generated PatchData task. The value can be modified. |
    | Job Name | Name of the job that requires PatchData. |
    | Date | Period of time for which PatchData is required. NOTE: PatchData can be configured for a job multiple times. However, avoid configuring PatchData multiple times for the same date to prevent data duplication or disorder. |
    | Parallel Instances | Number of instances to be executed at the same time. A maximum of five instances can be executed at the same time. NOTE: Set this parameter based on the site requirements. For example, if a CDM job instance is used, PatchData cannot be performed for multiple instances at the same time, so this parameter can only be set to 1. |
    | Downstream Job Requiring PatchData | Select the downstream jobs (jobs that depend on the current job) that require PatchData. You can select multiple jobs. |

  6. Click OK. The system starts to perform PatchData and the PatchData Monitoring page is displayed.
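
To see how the Date and Parallel Instances parameters interact, the following sketch splits a PatchData period into groups of dates that would run together. It is an illustration under stated assumptions, not the system's scheduler: it assumes a daily schedule (one instance per day) and fixed-size groups, and `plan_patch_data` is a hypothetical helper. Only the upper limit of five parallel instances comes from Table 2.

```python
from datetime import date, timedelta

MAX_PARALLEL_INSTANCES = 5  # upper limit stated in Table 2


def plan_patch_data(start, end, parallel_instances=1):
    """Split the inclusive date range into groups of dates run together.

    Assumes one instance per day; real jobs may use other frequencies.
    """
    if not 1 <= parallel_instances <= MAX_PARALLEL_INSTANCES:
        raise ValueError("parallel_instances must be between 1 and 5")
    if end < start:
        raise ValueError("end date must not precede start date")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [
        days[i:i + parallel_instances]
        for i in range(0, len(days), parallel_instances)
    ]


# Seven days with three parallel instances gives groups of 3, 3, and 1.
batches = plan_patch_data(date(2022, 9, 1), date(2022, 9, 7), parallel_instances=3)
for number, batch in enumerate(batches, start=1):
    print(number, [day.isoformat() for day in batch])
```

With `parallel_instances=1`, as required for the CDM case noted in Table 2, each date forms its own group and the instances run one at a time.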

Batch Job Monitoring: Adding a Job Label

Labels can be added to jobs to facilitate job instance filtering.

  1. Log in to the DataArts Studio console. Locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Factory.
  2. In the left navigation pane of DataArts Factory, choose Monitoring > Monitor Job.
  3. Click the Batch Job Monitoring tab.
  4. In the Operation column of the job, choose More > Add Job Label.
  5. In the Add Job Label dialog box displayed, set the job label parameters.
    Figure 5 Parameters for adding a job label
  6. Click OK.