Updated on 2024-04-29 GMT+08:00

Overview

Baseline O&M allows you to configure baseline tasks to monitor task statuses and resource usage. By configuring O&M baselines, you can ensure that important data is generated within the expected time in complex dependency scenarios. Baseline O&M effectively reduces configuration costs, avoids invalid alarms, and automatically monitors all important tasks.

Application scenarios:

  • Managing task priorities

    When the number of tasks keeps increasing but resources are limited, you can add important tasks to the baseline and set a higher priority for the baseline so that resources are preferentially allocated to important tasks.

  • Estimating the task completion time

    The running of tasks is affected by resources and their upstream tasks. You can add a task to the baseline, and the system will calculate the estimated completion time of the task.

  • Assuring on-time task completion

    You can add a task to a baseline and set a promised completion time. If the system predicts that the task cannot be completed before the promised time, or an upstream task is faulty or slows down, an alarm will be sent. You can handle the issue in a timely manner based on the alarm information to ensure that the task can be completed before the promised time.

Concepts

  • Baseline: After you add an important task to the baseline and set a promised completion time, the system calculates the estimated completion time of the task based on the task running status. If the system determines that the task may not be completed before the promised time, the system generates an alarm.
  • Promised completion time: the latest time by which a task must be successfully executed so that data applications are not affected. To reserve time for O&M personnel to handle exceptions, you can also set a time left before promise breakdown.
  • Time left before promise breakdown: the buffer reserved before the promised completion time. A baseline warning is triggered at the promised completion time minus this buffer.
  • Warning time: the time at which a baseline warning is triggered. It equals the promised completion time minus the time left before promise breakdown.
  • Estimated running duration: estimated running duration of the current task calculated based on the running duration of historical tasks
  • Latest start time to keep the promise: promised completion time minus estimated task running duration
  • Latest start time to trigger an alarm: warning time minus estimated task running duration

  • Baseline task: a task added to a baseline
  • Baseline instance: The system uses this instance to calculate the estimated completion time of each task. The status of the baseline instance can be secure, warning, or promise broken.

    • Secure: estimated completion time < warning time
    • Warning: warning time < estimated completion time < promised completion time
    • Promise broken: estimated completion time > promised completion time
  • Key path: the path that takes the longest time to run among multiple paths that affect the baseline task
  • Event: An event is generated when an error occurs in the baseline task or its upstream tasks, or when a task on the key path becomes slow. Events affect the on-time completion of baseline tasks.
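The time-related concepts above reduce to simple arithmetic, which the following minimal sketch illustrates. The function names `derived_times` and `instance_status` and the sample values are hypothetical and are not part of the product. The definitions above use strict inequalities and do not say how equal values are classified, so the sketch assumes that an estimated completion time equal to the warning time or to the promised completion time counts as Warning.

```python
from datetime import datetime, timedelta


def derived_times(promised, time_left, estimated_duration):
    """Derive the warning time and the two latest start times."""
    warning_time = promised - time_left
    return {
        "warning_time": warning_time,
        "latest_start_to_keep_promise": promised - estimated_duration,
        "latest_start_to_trigger_alarm": warning_time - estimated_duration,
    }


def instance_status(estimated_completion, warning_time, promised):
    """Classify a baseline instance (boundary handling is an assumption)."""
    if estimated_completion < warning_time:
        return "secure"
    if estimated_completion <= promised:
        return "warning"
    return "promise broken"


# Hypothetical baseline: promise 08:00, 30-minute buffer, 2-hour task.
promised = datetime(2024, 4, 29, 8, 0)
times = derived_times(promised, timedelta(minutes=30), timedelta(hours=2))
for name, value in times.items():
    print(name, value)

# An estimated completion at 07:45 falls between warning and promise.
print(instance_status(datetime(2024, 4, 29, 7, 45),
                      times["warning_time"], promised))
```

With these sample values the warning time is 07:30, the latest start time to keep the promise is 06:00, and the latest start time to trigger an alarm is 05:30.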

Monitoring Scope

Key tasks and all the upstream tasks on which the key tasks depend
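One way to picture this scope is as the set of key tasks plus everything reachable by following upstream dependencies. The sketch below is illustrative only: the `upstream` mapping, the task names, and the `monitoring_scope` function are hypothetical, and the product's internal representation of dependencies is not described here.

```python
def monitoring_scope(upstream, key_tasks):
    """Return the key tasks plus all direct and indirect upstream tasks."""
    scope = set()
    stack = list(key_tasks)
    while stack:
        task = stack.pop()
        if task in scope:
            continue  # already visited through another path
        scope.add(task)
        stack.extend(upstream.get(task, ()))
    return scope


# Hypothetical dependency map: task -> tasks it directly depends on.
upstream = {
    "report": ["join"],
    "join": ["load_a", "load_b"],
    "load_a": [],
    "load_b": [],
    "adhoc": ["load_a"],  # not upstream of "report", so not monitored
}

scope = monitoring_scope(upstream, ["report"])
print(sorted(scope))
```

Here `adhoc` stays outside the scope because the key task `report` does not depend on it, even though both share the upstream task `load_a`.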

Functions

After important tasks are added to a baseline, the system assures resources for the tasks based on the baseline priority, determines the monitoring scope based on the upstream and downstream dependencies of the baseline tasks, and triggers baseline alarms or event-based alarms based on the statuses of the monitored tasks. Baseline O&M provides the following functions:

  • Alarms for failures of key tasks
  • Alarms for delay of key tasks
  • Key path analysis
  • Preferential scheduling of key tasks
  • Alarms for key tasks
  • Immediate alarms for configuration errors
  • Full-link version comparison for key jobs
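As an illustration of key path analysis, the key path can be thought of as the longest-running chain of upstream tasks that ends at the baseline task. The sketch below is not the product's algorithm. It assumes an acyclic dependency graph and a known estimated duration per task, and the names `key_path`, `upstream`, and `duration` are hypothetical.

```python
from functools import lru_cache


def key_path(upstream, duration, baseline_task):
    """Return (total duration, path) of the longest chain ending at the task."""

    @lru_cache(maxsize=None)
    def best(task):
        # Longest chain among the direct upstream tasks, or empty if none.
        candidates = [best(u) for u in upstream.get(task, ())]
        total, path = max(candidates, default=(0, ()))
        return total + duration[task], path + (task,)

    return best(baseline_task)


# Hypothetical dependency map and estimated durations in minutes.
upstream = {
    "report": ["join"],
    "join": ["load_a", "load_b"],
    "load_a": [],
    "load_b": [],
}
duration = {"load_a": 10, "load_b": 25, "join": 15, "report": 5}

total, path = key_path(upstream, duration, "report")
print(total, " -> ".join(path))
```

In this example `load_b` is slower than `load_a`, so the key path runs through `load_b`, `join`, and `report` for a total of 45 minutes. A slowdown on `load_a` matters only once it exceeds the duration of `load_b`.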

Alarm Mechanism

A baseline alarm is an alarm notification for a baseline that is enabled and whose alarm function is enabled. You can configure the time left before promise breakdown and the promised completion time based on the estimated completion time of the baseline. The system calculates the estimated latest completion time of a monitored task based on the historical running status of the task and monitors the task based on the actual running status of the baseline task. If the system predicts that the baseline task cannot be completed before the baseline warning time (baseline promised completion time – time left before promise breakdown), the system sends a baseline alarm to the alarm recipients defined for the baseline.

Alarm Types

  • Baseline warning

    This alarm is generated for the first task in the monitored baseline link that is not completed before the warning time.

  • Baseline promise breakdown

    The baseline promise breaks down when the following conditions are met:
    1. No promise breakdown occurs in the direct or indirect upstream of the task node.
    2. The task is not completed by the promised completion time.

  • Further promise breakdown

    This alarm is triggered when the following conditions are met:
    1. An alarm has been triggered for the task.
    2. The task running time is longer than estimated. To be specific:

  • Assured task not completed by warning time

    This alarm is generated when some assured tasks are not completed by the baseline warning time (promised completion time – time left before promise breakdown). Only one alarm is generated for an assured task.

  • Assured task not completed by promised completion time

    This alarm is generated when some assured tasks are not completed by the baseline promised completion time. Only one alarm is generated for an assured task.

  • Task failure

    This alarm is generated when any monitored task fails or stops being scheduled due to incorrect configurations.
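The two assured-task alarms above can be sketched as a periodic check that remembers which alarms have already been sent. This is a simplified model, not the product's implementation. It assumes that "only one alarm" means once per task per alarm type, and the function name `assured_task_alarms`, the task name, and the polling approach are hypothetical.

```python
from datetime import datetime

WARNING = "not completed by warning time"
PROMISE = "not completed by promised completion time"


def assured_task_alarms(now, warning_time, promised_time, unfinished, sent):
    """Return new (task, alarm type) pairs and record them in `sent`."""
    alarms = []
    for task in unfinished:
        for alarm_type, deadline in ((WARNING, warning_time),
                                     (PROMISE, promised_time)):
            key = (task, alarm_type)
            # Raise the alarm once the deadline has passed, and only once.
            if now >= deadline and key not in sent:
                sent.add(key)
                alarms.append(key)
    return alarms


# Hypothetical baseline: warning time 07:30, promised completion time 08:00.
warning_time = datetime(2024, 4, 29, 7, 30)
promised_time = datetime(2024, 4, 29, 8, 0)
sent = set()

first = assured_task_alarms(datetime(2024, 4, 29, 7, 40),
                            warning_time, promised_time, ["etl_orders"], sent)
repeat = assured_task_alarms(datetime(2024, 4, 29, 7, 50),
                             warning_time, promised_time, ["etl_orders"], sent)
late = assured_task_alarms(datetime(2024, 4, 29, 8, 5),
                           warning_time, promised_time, ["etl_orders"], sent)
print(first, repeat, late)
```

The second check returns nothing because the warning alarm for `etl_orders` was already sent. The third check returns only the promised-completion-time alarm.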