Monitoring a Job

After a job is created, you can view the job details through the following operations:

Viewing Job Details

This section describes how to view job details. After you create and run a job, you can view job details, including SQL statements and parameter settings. For a user-defined job, you can only view its parameter settings.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column, click the job name to switch to the Job Details page.

    On the Job Details page, you can view SQL statements, total cost for the job, and Parameter List.
    Table 1 Parameter description

    | Parameter | Description |
    | --- | --- |
    | Type | Job type, for example, Flink streaming SQL job. |
    | ID | Job ID. |
    | Status | Status of a job. |
    | Running Mode | Shared if the job is created in a shared cluster. Exclusively if the job is created in a user-defined cluster. |
    | Cluster | Cluster Shared if the job is created in a shared cluster. For a job created in a user-defined cluster, the specific cluster name is displayed. |
    | SPUs | Number of SPUs for a job. |
    | Parallelism | Number of tasks that a CS job can run simultaneously. |
    | Enable Checkpoint | Indicates whether intermediate job running results are saved to OBS to prevent data loss in the event of exceptions. |
    | Checkpoint Interval (s) | Interval between storing intermediate job running results to OBS. Valid only when Enable Checkpoint is set to true. |
    | Checkpoint Mode | Checkpoint mode. Valid only when Enable Checkpoint is set to true. Values: AtLeastOnce (events are processed at least once) and ExactlyOnce (events are processed exactly once). |
    | Save Job Log | Indicates whether job run logs are saved to OBS so that you can use them to locate faults. |
    | OBS Bucket | Name of the OBS bucket where data is dumped. Valid only when Enable Checkpoint or Save Job Log is set to true. |
    | Topic Name | SMN topic name. If an exception occurs while a job is running, CS notifies users of the exception over SMN. |
    | Auto Restart upon Exception | If this function is enabled, CS automatically restarts and restores a job when an exception occurs. |
    | Idle State Retention Time | How long the state of a key is retained without being updated before it is removed in GroupBy or Window operations. |
    | Created | Time when a job is created. |
    | Start Time | Start time of a job. |
    | Enterprise Project | Name of the enterprise project to which a job belongs. |
    | Total Billing Time | Total running duration of a job that is used for billing. |
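
Several parameters in Table 1 depend on one another: the checkpoint interval and mode apply only when checkpointing is enabled, and the OBS bucket applies when checkpointing or log saving is enabled. The following Python sketch shows one way to check a set of values against those dependencies. The `JobSettings` class, its field names, and the decision to treat a missing dependent value as a problem are assumptions made for illustration; they are not part of CS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobSettings:
    """Hypothetical container mirroring a few rows of Table 1."""
    enable_checkpoint: bool = False
    checkpoint_interval_s: Optional[int] = None
    checkpoint_mode: Optional[str] = None  # "AtLeastOnce" or "ExactlyOnce"
    save_job_log: bool = False
    obs_bucket: Optional[str] = None


def check_settings(s: JobSettings) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    # Interval and mode are only meaningful when checkpointing is enabled.
    if s.enable_checkpoint:
        if s.checkpoint_interval_s is None or s.checkpoint_interval_s <= 0:
            problems.append("Checkpoint Interval (s) must be a positive number")
        if s.checkpoint_mode not in ("AtLeastOnce", "ExactlyOnce"):
            problems.append("Checkpoint Mode must be AtLeastOnce or ExactlyOnce")
    # Assumption: a bucket is needed whenever either feature writes to OBS.
    if (s.enable_checkpoint or s.save_job_log) and not s.obs_bucket:
        problems.append("OBS Bucket is required")
    return problems
```

For example, `check_settings(JobSettings(save_job_log=True))` reports only the missing OBS bucket, because the checkpoint parameters are ignored while checkpointing is disabled.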

Checking the Dashboard

You can view details about job data input and output through the dashboard.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column on the Job Management page, click the desired job name. On the displayed page, click Job Monitoring.

    The following table describes monitoring metrics related to Spark jobs.

    Table 2 Monitoring metrics related to Spark jobs

    | Metric | Description |
    | --- | --- |
    | InputSize (records/sec) | Provides the number of input records for a Spark job. |
    | ProcessingTime (ms) | Provides the processing time distribution chart of all mini-batch tasks. |
    | SchedulingDelay (ms) | Provides the scheduling delay distribution chart of all mini-batch tasks. |
    | TotalDelay (ms) | Provides the total scheduling delay of all mini-batch tasks. |

    • Click the refresh icon to refresh all the charts.
    • Click a chart and scroll the mouse wheel to zoom in or out.
    • You can only view monitoring information about running jobs.

    The following table describes monitoring metrics related to Flink jobs.

    Table 3 Monitoring metrics related to Flink jobs

    | Metric | Description |
    | --- | --- |
    | Data Input Rate | Provides the data input rate of a Flink job. Unit: records/s |
    | Total Input Records | Provides the total number of input data records in a Flink job. Unit: records |
    | Total Input Bytes | Provides the total input bytes of a Flink job. Unit: bytes |
    | Data Output Rate | Provides the data output rate of a Flink job. Unit: records/s |
    | Total Output Records | Provides the total number of output data records in a Flink job. Unit: records |
    | Total Output Bytes | Provides the total output bytes of a Flink job. Unit: bytes |
    | CPU Load (%) | Provides the CPU usage. |
    | Memory Usage (%) | Provides the heap memory usage of a job. |

    • Click Real-Time Refresh to refresh the running jobs in real time. The charts are updated every 10 seconds.
    • Click the icon for adding a chart. In the displayed Add Chart dialog box, specify the parameters as required.
    • Click the zoom icon in the upper right corner of a chart to zoom in on the chart.
    • Click the delete icon to delete a metric.
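
The rate metrics and the total metrics in Table 3 describe the same data flow from two angles: a rate is a change in a total over an interval. The sketch below derives records/s values from successive Total Input Records readings. How CS itself computes Data Input Rate is not documented here, and the sample values and the function name are hypothetical.

```python
from typing import List, Tuple


def rates_from_totals(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert (timestamp_seconds, total_records) samples into records/s values."""
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((n1 - n0) / elapsed)
    return rates


# Hypothetical Total Input Records readings taken 10 seconds apart,
# matching the 10-second refresh interval of the charts.
samples = [(0.0, 0), (10.0, 5000), (20.0, 12000)]
print(rates_from_totals(samples))  # [500.0, 700.0]
```

The same function applies to Total Output Records, or to the byte totals if a bytes/s figure is needed.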

Viewing the Job Execution Plan

You can view the execution plan to understand the operator stream information about the running job.

Execution plans of Spark jobs cannot be viewed.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column on the Job Management page, click the desired job name. On the displayed page, click Execution Plan.

    • Scroll the mouse wheel or click the zoom icons to zoom in or out.
    • The stream diagram displays the operator stream information about the running job in real time.

Viewing the Task List of a Job

You can view details about each task of a job, including the task start time, the number of received and sent bytes, and the running duration.

Task lists of Spark jobs cannot be viewed.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column on the Job Management page, click the desired job name. On the displayed page, click Task List.

    1. View the operator task list.
      Table 4 Parameter description

      | Parameter | Description |
      | --- | --- |
      | Name | Name of an operator. |
      | Duration | Running duration of an operator. |
      | Parallelism | Number of parallel tasks in an operator. |
      | Task | Number of operator tasks in each state, indicated by color: red for failed tasks, light gray for canceled tasks, yellow for tasks that are being canceled, green for finished tasks, blue for running tasks, sky blue for tasks that are being deployed, and dark gray for tasks in a queue. |
      | Status | Status of an operator task. |
      | Back Pressure Status | Working load status of an operator. OK: the working load is normal. LOW: the working load is slightly high. HIGH: the working load is high. |
      | Delay | Duration from the time when source data starts being processed to the time when the data reaches the current operator. Unit: ms |
      | Sent Records | Number of records sent by an operator. |
      | Sent Bytes | Number of bytes sent by an operator. |
      | Received Bytes | Number of bytes received by an operator. |
      | Received Records | Number of records received by an operator. |
      | Start Time | Time when an operator starts running. |
      | End Time | Time when an operator stops running. |

    2. Click the icon in the operator row to view the task list.
      Table 5 Parameter description

      | Parameter | Description |
      | --- | --- |
      | Start Time | Time when a task starts running. |
      | End Time | Time when a task stops running. |
      | Duration | Task running duration. |
      | Received Bytes | Number of bytes received by a task. |
      | Received Records | Number of records received by a task. |
      | Sent Bytes | Number of bytes sent by a task. |
      | Sent Records | Number of records sent by a task. |
      | Attempts | Number of retry attempts after a task is suspended. |
      | Host | Host IP address of the operator. |
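
The operator-level values in Table 4 can be read as a roll-up of the task-level values in Table 5. The sketch below illustrates that relationship with task rows you have copied from the console. The `TaskRow` class, its field names, and the state names are hypothetical, and treating operator totals as plain sums over tasks is an assumption rather than documented CS behavior.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRow:
    """Hypothetical record holding a task state plus a few Table 5 columns."""
    state: str  # for example "RUNNING" or "FAILED" (assumed names)
    received_bytes: int
    sent_bytes: int
    received_records: int
    sent_records: int


def summarize_operator(tasks: List[TaskRow]) -> Dict[str, object]:
    """Roll task rows up into operator-level figures similar to Table 4."""
    task_counts: Dict[str, int] = {}
    for t in tasks:
        task_counts[t.state] = task_counts.get(t.state, 0) + 1
    return {
        "parallelism": len(tasks),   # one task per parallel instance
        "task_counts": task_counts,  # corresponds to the colored digits
        "received_bytes": sum(t.received_bytes for t in tasks),
        "sent_bytes": sum(t.sent_bytes for t in tasks),
        "received_records": sum(t.received_records for t in tasks),
        "sent_records": sum(t.sent_records for t in tasks),
    }
```

Comparing such a roll-up with the operator row can help spot a single task that receives far more data than its peers.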

Viewing Job Audit Logs

You can view job operation records, such as job creation, submission, running, and stopping, in audit logs.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column on the Job Management page, click the desired job name to switch to the Job Details page.
  3. Click Audit Log to view audit logs of the job.

    Figure 1 Viewing job audit logs

    A maximum of 50 logs can be displayed. For more audit logs, query them in CTS. For details about how to view audit logs in CTS, see section "Querying Real-Time Traces" in the Cloud Trace Service Quick Start.

    If no information is displayed on the Audit Log page, you need to enable CTS.

    1. Click Enable to switch to the CTS Authorization page.
    2. Click OK.

    You can also log in to the CTS management console to enable CTS. For details, see Enabling CTS.

    Table 6 Parameters related to audit logs

    | Parameter | Description |
    | --- | --- |
    | Event Name | Name of an event. |
    | Resource Name | Name of a running job. |
    | Resource ID | ID of a running job. |
    | Type | Job operation type. |
    | Level | Event level. Available options: incident, warning, and normal. |
    | Operator | Account used to run a job. |
    | Generated | Time when an event occurs. |
    | Source IP Address | IP address of the operator. |
    | Operation Result | Operation result. |
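
If you have exported audit records for offline analysis, the sketch below mimics what the Audit Log page does: it keeps the newest records, up to the 50-record display limit, and can optionally filter by the Level values in Table 6. The dictionary keys (`level`, `generated`) and the ISO 8601 timestamp format are assumptions about the exported data, not a CS or CTS interface.

```python
from typing import Dict, List

MAX_DISPLAYED = 50  # the Audit Log page displays at most 50 logs


def latest_events(events: List[Dict[str, str]], level: str = "") -> List[Dict[str, str]]:
    """Return the newest events, optionally filtered by level, capped at MAX_DISPLAYED."""
    selected = [e for e in events if not level or e["level"] == level]
    # Assumption: "generated" is an ISO 8601 string, so text order equals time order.
    selected.sort(key=lambda e: e["generated"], reverse=True)
    return selected[:MAX_DISPLAYED]
```

Calling `latest_events(events, "incident")` would list only incident-level events, newest first.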

Viewing Job Running Logs

You can view the run logs to locate the faults occurring during job running.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column on the Job Management page, click the desired job name. On the displayed page, click Running Logs.

    On the displayed page, you can view information about Job Manager and Task Manager for running jobs.

    Information about Job Manager and Task Manager is updated every minute. By default, only the run logs generated within the last minute are displayed. You can click Log history to view more logs.

    If you select an OBS bucket for saving job logs during the job configuration, you can switch to the OBS bucket and download log files to view more historical logs.

    If the job is not running, information on the Task Manager page cannot be viewed.
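
When you work with log files downloaded from the OBS bucket, you may want the same default view as the Running Logs page, that is, only entries from the last minute. The sketch below shows that filter on log entries that have already been parsed into timestamp and message pairs. The parsed structure is an assumption, because the log file format is not described here.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def recent_lines(lines: List[Tuple[datetime, str]], now: datetime,
                 window: timedelta = timedelta(minutes=1)) -> List[str]:
    """Keep only messages whose timestamp falls within `window` before `now`."""
    cutoff = now - window
    return [msg for ts, msg in lines if cutoff <= ts <= now]
```

Passing a larger `window`, for example `timedelta(hours=1)`, corresponds to looking further back in the log history.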

Viewing Job Tags

You can view, add, modify, and delete job tags.

  1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
  2. In the Name column on the Job Management page, click the name of the job whose tags you want to view to switch to the Job Details page.
  3. Click Tags to display the tag information about the current job.

    For more information about job tags, see Managing Job Tags.