Updated on 2024-04-15 GMT+08:00

Creating a Metric Alarm Rule

You can set threshold conditions in metric alarm rules for resource metrics. If a metric value meets a threshold condition, a threshold alarm will be reported. If there is no metric data, an insufficient data event will be reported.

Functions

  • You can set the consecutive periods, statistical period, and threshold condition. For details, see the alarm condition settings in step 5 of each procedure.
  • You can set whether to send a notification when an alarm is cleared. For details, see step 6 of each procedure.
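
How the consecutive periods, statistic, and threshold condition interact can be illustrated with a minimal sketch. It is not AOM's actual evaluation logic: the function and dictionary names are hypothetical, each statistical period is assumed to arrive as a list of raw samples, and an empty or missing period is assumed to mean insufficient data. ASCII >= and <= stand in for ≥ and ≤.

```python
import operator
from statistics import mean

# Hypothetical lookup tables; the option names follow the Statistic and
# operator choices listed in this document.
STATISTICS = {"Avg": mean, "Min": min, "Max": max, "Sum": sum, "Samples": len}
OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def evaluate(periods, statistic, op, threshold, consecutive_periods):
    """Return "Exceeded", "Normal", or "Insufficient" for the latest periods.

    periods: one list of raw samples per statistical period, oldest first.
    """
    recent = periods[-consecutive_periods:]
    # Assumption: a missing or empty period counts as insufficient data.
    if len(recent) < consecutive_periods or any(not p for p in recent):
        return "Insufficient"
    aggregate = STATISTICS[statistic]
    compare = OPERATORS[op]
    # The alarm fires only if every one of the latest periods breaches.
    breached = all(compare(aggregate(p), threshold) for p in recent)
    return "Exceeded" if breached else "Normal"
```

For example, with the condition > 85 over three consecutive periods, period averages of 90, 90, and 95 would yield Exceeded in this sketch, whereas a single period averaging exactly 85 would keep the result at Normal.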

Creation Mode

Metric alarm rules can be created in three modes: Select by resource type, Select from all metrics, and Run Prometheus statement.

When creating metric alarm rules by resource type, you can set an alarm condition using either of two methods: Custom or Template. To use Template, first create an alarm template by referring to Creating an Alarm Template.

Precautions

If you need AOM to send email or SMS notifications when the metric alarm rule status (Exceeded, Normal, Insufficient, or Disabled) changes, set an alarm action rule according to Creating an Alarm Action Rule.

Creating Metric Alarm Rules by Resource Type

  1. On the menu bar, choose Monitoring Center.
  2. In the navigation pane, choose Alarm Management > Alarm Rules.
  3. On the Alarm Rules tab page, click Create Alarm Rule.
  4. Set basic information about the alarm rule by referring to Table 1.

    Table 1 Basic information

    | Parameter   | Description |
    |-------------|-------------|
    | Rule Name   | Name of a rule. Enter a maximum of 255 characters. The following special characters are not allowed: "$# %&'+;<=>?\ |
    | Description | Description of the rule. Enter up to 1000 characters. |

  5. Set the detailed information about the alarm rule.

    1. Set Rule Type to Metric alarm rule.
    2. Set Configuration Mode to Select by resource type and specify Resource Type and Monitored Object. Table 2 describes the parameters.
      Table 2 Parameter description

      Parameter

      Description

      Resource Type

      Select a desired resource type from the drop-down list.

      • When you click the Application Metrics tab, you can select resources based on the following dimensions:
        • Host: Select resources by host, including host, host disk, host network, host file system, and host GPU.
        • Application: Select resources by application.
        • Component: Select resources by component.
        • Process: Select resources by process.
      • When you click the Cloud Service Metrics tab, you can select resources by cloud service.

      Monitored Object

      Click Select Monitored Object. All existing resources of the type you select will be displayed. Select target resources as required.

      If you enable Apply to All when selecting monitored objects, an alarm rule will be created for all resources of the selected type under an application or service. When resources of this type are added or modified, they are automatically bound to the alarm rule. When they are deleted, they are automatically unbound from it.

    3. Set an alarm condition. Customize an alarm condition or import an alarm condition from a template.
      • Custom

        Click Custom and set the statistical period, consecutive periods, and alarm condition. Table 3 describes the parameters.

        Table 3 Alarm condition parameters

        Category

        Parameter

        Description

        Alarm Condition

        Metric

        Metric to be monitored.

        Consecutive Periods

        When the metric value meets the alarm condition for a specified number of consecutive periods, a metric alarm will be generated.

        Statistical Period

        Metric data is aggregated based on the configured statistical period, which can be 1 minute, 5 minutes, 15 minutes, or 1 hour.

        Statistic

        Method used to aggregate metric data within each statistical period. Options: Avg, Min, Max, Sum, and Samples.

        Alarm Condition

        Trigger condition of a metric alarm. An alarm condition consists of two parts: operators (≥, ≤, >, and <) and threshold value. For example, if the trigger condition is set to > 85 and an actual metric value exceeds 85, a metric alarm will be generated.

        Alarm Severity

        Severity of a metric alarm. Options: Critical, Major, Minor, and Warning.

        -

        Check Interval

        Interval at which metric query and analysis results are checked.

        • Hourly: Query and analysis results are checked every hour.
        • Daily: Query and analysis results are checked at a fixed time every day.
        • Weekly: Query and analysis results are checked at a fixed time point on a specified day of a week.
        • Custom interval: The query and analysis results are checked at a fixed interval.
        • Cron: A cron expression is used to specify a time interval. Query and analysis results are checked at the specified interval.

          The time specified in the cron expression can be accurate to the minute and must use 24-hour notation. Example: 0/5 * * * *, which indicates that the check starts at minute 0 and is performed every 5 minutes.

        Advanced Settings

        Alarm Clearance

        An alarm will be cleared if the monitored object does not meet the trigger condition within the monitoring period. By default, metrics in only one period are monitored. You can set up to five monitoring periods.

        Action Taken for Insufficient Data

        Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can set this option based on your requirements.

        By default, metrics in only one period are monitored. You can set up to five monitoring periods.

        The system supports the following actions: changing the status to exceeded and sending an alarm, changing the status to insufficient data and sending an event, maintaining the previous status, and changing the status to normal and sending an alarm clearance notification.

      • Template

        Select Template and set related parameters. Ensure that you have created an alarm template. For details, see Creating an Alarm Template.

        Table 4 Alarm condition parameters

        | Parameter | Description |
        |-----------|-------------|
        | Bind Template | Specifies whether to bind an alarm template. |
        | Alarm Template | Select an alarm template. If the existing templates do not meet requirements, click Create Alarm Template to create one. |
        | Alarm Condition | The system automatically imports the preset alarm condition in the template. Note that the condition cannot be modified. |
        | Check Interval | The system automatically imports the check interval set in the template. Note that the check interval cannot be modified. |
        | Alarm Clearance | The system automatically imports the alarm clearance setting in the template. Note that it cannot be modified. |
        | Action Taken for Insufficient Data | The system automatically imports the action setting in the template. Note that it cannot be modified. |

  6. Set an alarm notification policy.

    • Direct alarm reporting: An alarm is directly sent when the alarm condition is met.
      1. Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template. If the existing alarm action rules cannot meet your requirements, click Create Rule to create one. For details, see Creating an Alarm Action Rule.
      2. After an alarm action rule is selected, specify whether to enable alarm clearance notification. After alarm clearance notification is enabled, if the alarm clearance condition set in Advanced Settings > Alarm Clearance is met, alarm clearance notifications are sent based on the selected action rule.

  7. Click Confirm. Then, click Back to Alarm Rule List to view the created alarm rule.

    In the expanded list, if a metric value meets the configured alarm condition, a metric alarm is generated on the alarm page. To view it, choose Alarm Management > Alarm List in the navigation pane. If the preset notification policy is met, the system sends an alarm notification to the specified personnel by email or SMS.
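
The Rule Name and Description limits from Table 1 can be checked before a rule is submitted. The following is a minimal sketch only: the function and constant names are hypothetical, the empty-name check is an added assumption, and because the source character list shows a space between "#" and "%" without saying whether a space is forbidden, this sketch allows spaces.

```python
# Characters listed as not allowed in Rule Name.
# Assumption: the space shown in the source list is not treated as forbidden.
FORBIDDEN_CHARS = set("\"$#%&'+;<=>?\\")
MAX_NAME_LENGTH = 255
MAX_DESCRIPTION_LENGTH = 1000


def validate_basic_info(name, description=""):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not name:
        # Assumption: an empty rule name is rejected.
        problems.append("rule name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("rule name exceeds 255 characters")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("rule name contains forbidden characters: " + "".join(bad))
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 1000 characters")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.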

Creating Metric Alarm Rules by Selecting Metrics from All Metrics

  1. On the menu bar, choose Monitoring Center.
  2. In the navigation pane, choose Alarm Management > Alarm Rules.
  3. On the Alarm Rules tab page, click Create Alarm Rule.
  4. Set basic information about the alarm rule by referring to Table 5.

    Table 5 Basic information

    | Parameter   | Description |
    |-------------|-------------|
    | Rule Name   | Name of a rule. Enter a maximum of 255 characters. The following special characters are not allowed: "$# %&'+;<=>?\ |
    | Description | Description of the rule. Enter up to 1000 characters. |

  5. Set the detailed information about the alarm rule.

    1. Set Rule Type to Metric alarm rule.
    2. Set Configuration Mode to Select from all metrics.
      • When Select from all metrics is selected, enter keywords to search for metrics.
      • Scope: Metric monitoring scope. The scope is in the key-value pair format. Directly select an option from the drop-down list or use AND, OR, and NOT to specify scopes for metrics.
      • Group Condition: Aggregate metric data by the specified field and calculate the aggregation result. Options: Not grouped, avg by, max by, min by, and sum by. For example, avg by clusterName indicates that metrics are grouped by cluster name, and the average value of the grouped metrics is calculated and displayed in the graph.
    3. Select a target Prometheus instance from the drop-down list.
    4. Set parameters such as the metric, environment, and check interval. Table 6 describes the parameters.

      After an alarm condition is set, the monitored metric data is displayed in a line graph above the alarm condition. To hide metric data in the graph, click Hide Graph, the icon before a metric name, or the line icon before a metric data record.

      Table 6 Alarm condition parameters

      Category

      Parameter

      Description

      -

      Add one by one

      Calculation is performed based on the preset alarm conditions one by one. An alarm is triggered when one of the conditions is met.

      For example, if three alarm conditions are set, the system performs calculation respectively. If any of the conditions is met, an alarm will be triggered.

      -

      Combined operations

      After calculation is performed based on the expression you set, an alarm is triggered when the condition is met.

      For example, if there is no metric showing the CPU core usage of a host, do as follows:

      • Set the metric of alarm condition "a" to aom_node_cpu_used_core and retain the default values for other parameters. This metric is used to count the number of CPU cores used by a measured object.
      • Set the metric of alarm condition "b" to aom_node_cpu_limit_core and retain the default values for other parameters. This metric is used to count the total number of CPU cores requested for a measured object.
      • If the expression is set to "a/b", the CPU core usage of the host can be obtained.
      • Set the threshold condition to > 0.2.
      • Set Alarm Severity to Critical.

      A critical alarm will be generated when the CPU core usage of a host is greater than 0.2.

      Alarm Condition

      Metric

      Select the metric to be monitored.

      Scope

      Metric monitoring scope.

      The scope is in the key-value pair format. Directly select an option from the drop-down list or use AND, OR, and NOT to specify scopes for metrics.

      Grouping Condition

      Aggregate metric data by the specified field and calculate the aggregation result. Options: Not grouped, avg by, max by, min by, and sum by. For example, avg by clusterName indicates that metrics are grouped by cluster name, and the average value of the grouped metrics is calculated and displayed in the graph.

      Alarm Condition

      Condition for triggering a metric alarm. It consists of the grouping condition (not grouped), judgment condition (≥, ≤, >, and <), and threshold. For example, if the trigger condition is set to > 85 and an actual metric value exceeds 85, a metric alarm will be generated.

      Move the cursor to the graph area above the alarm condition. The ID, IP address, and unit of the current metric are displayed.

      Alarm Severity

      Severity of a metric alarm. Options: Critical, Major, Minor, and Warning.

      -

      Check Interval

      Interval at which metric query and analysis results are checked.

      • Hourly: Query and analysis results are checked every hour.
      • Daily: Query and analysis results are checked at a fixed time every day.
      • Weekly: Query and analysis results are checked at a fixed time point on a specified day of a week.
      • Custom interval: The query and analysis results are checked at a fixed interval.
      • Cron: A cron expression is used to specify a time interval. Query and analysis results are checked at the specified interval.

        The time specified in the cron expression can be accurate to the minute and must use 24-hour notation. Example: 0/5 * * * *, which indicates that the check starts at minute 0 and is performed every 5 minutes.

      -

      Statistical Period

      Metric data is aggregated based on the configured statistical period and statistical mode. If the threshold condition is met for a specified number of consecutive periods, a metric alarm is generated. By default, metrics in the last minute are collected.

      Advanced Settings

      Alarm Clearance

      An alarm will be cleared if the monitored object does not meet the trigger condition within the monitoring period. By default, metrics in only one period are monitored. You can set up to five monitoring periods.

      Action Taken for Insufficient Data

      Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can set this option based on your requirements.

      By default, metrics in only one period are monitored. You can set up to five monitoring periods.

      The system supports the following actions: changing the status to exceeded and sending an alarm, changing the status to insufficient data and sending an event, maintaining the previous status, and changing the status to normal and sending an alarm clearance notification.

  6. Set an alarm notification policy.

    • Direct alarm reporting: An alarm is directly sent when the alarm condition is met.
      1. Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template. If the existing alarm action rules cannot meet your requirements, click Create Rule to create one. For details, see Creating an Alarm Action Rule.
      2. After an alarm action rule is selected, specify whether to enable alarm clearance notification. After alarm clearance notification is enabled, if the alarm clearance condition set in Advanced Settings > Alarm Clearance is met, alarm clearance notifications are sent based on the selected action rule.

  7. Click Confirm. Then, click Back to Alarm Rule List to view the created alarm rule.

    In the expanded list, if a metric value meets the configured alarm condition, a metric alarm is generated on the alarm page. To view it, choose Alarm Management > Alarm List in the navigation pane. If the preset notification policy is met, the system sends an alarm notification to the specified personnel by email or SMS.
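
The "avg by" grouping condition and the combined operation a/b from Table 6 can be sketched as follows. This is an illustration under assumptions, not AOM's implementation: the function names, the (labels, value) sample shape, and the per-host dictionaries are hypothetical, and hosts with a missing or zero aom_node_cpu_limit_core value are simply skipped.

```python
from collections import defaultdict


def avg_by(samples, label):
    """Group (labels, value) samples by one label and average each group."""
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels[label]].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


def combined_usage_alarms(used_cores, limit_cores, threshold=0.2):
    """Evaluate the expression a/b per host and return hosts above threshold.

    used_cores:  {host: aom_node_cpu_used_core value}   (condition "a")
    limit_cores: {host: aom_node_cpu_limit_core value}  (condition "b")
    """
    alarms = {}
    for host, a in used_cores.items():
        b = limit_cores.get(host)
        # Assumption: skip hosts with no limit sample or a zero limit,
        # which also avoids dividing by zero.
        if not b:
            continue
        usage = a / b
        if usage > threshold:
            alarms[host] = {"usage": usage, "severity": "Critical"}
    return alarms
```

With 1 used core out of 4 requested, the ratio is 0.25, which exceeds 0.2 and would produce a Critical entry in this sketch; 0.5 out of 4 (0.125) would not.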

Creating Metric Alarm Rules by Running Prometheus Statements

  1. On the menu bar, choose Monitoring Center.
  2. In the navigation pane, choose Alarm Management > Alarm Rules.
  3. On the Alarm Rules tab page, click Create Alarm Rule.
  4. Set basic information about the alarm rule by referring to Table 7.

    Table 7 Basic information

    | Parameter   | Description |
    |-------------|-------------|
    | Rule Name   | Name of a rule. Enter a maximum of 255 characters. The following special characters are not allowed: "$# %&'+;<=>?\ |
    | Description | Description of the rule. Enter up to 1000 characters. |

  5. Set the detailed information about the alarm rule.

    1. Set Rule Type to Metric alarm rule.
    2. Set Configuration Mode to Run Prometheus statement.
    3. Select a target Prometheus instance from the drop-down list.
    4. Enter a Prometheus statement. There are two modes: manual input and auto input.
      • Manual input: used when you know the metric name and IP address and are familiar with the Prometheus statement format. After you enter a statement and run the query, related metric graphs are displayed in the lower part of the page in real time.
      • Auto input: used when you do not know the metric information or are unfamiliar with the Prometheus statement format. The statement is filled in automatically only when you switch from the Metric Browsing page.

        Specifically, choose Metric Browsing in the navigation pane. Select the Prometheus instance to be monitored from the drop-down list. On the Metric List tab page, click Metric type, All metrics, or Resource type and then select up to 12 metrics from the resource tree. Next, click the alarm rule creation icon above the metric list. The system automatically switches to the metric alarm rule creation page and fills in the Prometheus statement.

      • You can click View Example to get more information. For details, see Prometheus Statements.
    5. Set an alarm condition. Set alarm condition parameters, such as consecutive periods, statistical period, and threshold condition. Table 8 describes the parameters.
      Table 8 Alarm condition parameters

      Category

      Parameter

      Description

      Alarm Condition

      Consecutive Periods

      When the metric value meets the alarm condition for a specified number of consecutive periods, a metric alarm will be generated.

      Statistical Period

      Metric data is aggregated based on the configured statistical period, which can be 1 minute, 5 minutes, 15 minutes, or 1 hour.

      Statistic

      Method used to aggregate metric data within each statistical period. Options: Avg, Min, Max, Sum, and Samples.

      Alarm Condition

      Trigger condition of a metric alarm. An alarm condition consists of two parts: operators (≥, ≤, >, and <) and threshold value. For example, if the trigger condition is set to > 85 and an actual metric value exceeds 85, a metric alarm will be generated.

      Alarm Severity

      Severity of a metric alarm. Options: Critical, Major, Minor, and Warning.

      -

      Check Interval

      Interval at which metric query and analysis results are checked.

      • Hourly: Query and analysis results are checked every hour.
      • Daily: Query and analysis results are checked at a fixed time every day.
      • Weekly: Query and analysis results are checked at a fixed time point on a specified day of a week.
      • Custom interval: The query and analysis results are checked at a fixed interval.
      • Cron: A cron expression is used to specify a time interval. Query and analysis results are checked at the specified interval.

        The time specified in the cron expression can be accurate to the minute and must use 24-hour notation. Example: 0/5 * * * *, which indicates that the check starts at minute 0 and is performed every 5 minutes.

      Advanced Settings

      Alarm Clearance

      An alarm will be cleared if the monitored object does not meet the trigger condition within the monitoring period. By default, metrics in only one period are monitored. You can set up to five monitoring periods.

      Action Taken for Insufficient Data

      Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can set this option based on your requirements.

      By default, metrics in only one period are monitored. You can set up to five monitoring periods.

      The system supports the following actions: changing the status to exceeded and sending an alarm, changing the status to insufficient data and sending an event, maintaining the previous status, and changing the status to normal and sending an alarm clearance notification.

  6. Set an alarm notification policy.

    • Direct alarm reporting: An alarm is directly sent when the alarm condition is met.
      1. Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template. If the existing alarm action rules cannot meet your requirements, click Create Rule to create one. For details, see Creating an Alarm Action Rule.
      2. After an alarm action rule is selected, specify whether to enable alarm clearance notification. After alarm clearance notification is enabled, if the alarm clearance condition set in Advanced Settings > Alarm Clearance is met, alarm clearance notifications are sent based on the selected action rule.

  7. Click Confirm. Then, click Back to Alarm Rule List to view the created alarm rule.

    In the expanded list, if a metric value meets the configured alarm condition, a metric alarm is generated on the alarm page. To view it, choose Alarm Management > Alarm List in the navigation pane. If the preset notification policy is met, the system sends an alarm notification to the specified personnel by email or SMS.
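
The Cron check interval example 0/5 * * * * can be made concrete by expanding its minute field. The sketch below assumes the first of the five fields is the minute, as in standard cron, and that N/S means "start at minute N, then every S minutes"; the exact cron dialect AOM accepts is not stated here. The function names are hypothetical, and only the forms *, N, N/S, and */S are handled.

```python
def expand_minute_field(field):
    """Return the minutes (0-59) matched by a cron minute field.

    Supports only "*", "N", "N/S", and "*/S"; lists and ranges are left out.
    """
    start_text, _, step_text = field.partition("/")
    start = 0 if start_text == "*" else int(start_text)
    if not 0 <= start <= 59:
        raise ValueError("minute must be between 0 and 59")
    if not step_text:
        # "*" matches every minute; a bare number matches only that minute.
        return list(range(60)) if start_text == "*" else [start]
    step = int(step_text)
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, 60, step))


def check_minutes(expression):
    """Return the matching minutes for a five-field cron expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    return expand_minute_field(fields[0])
```

Calling check_minutes("0/5 * * * *") returns minutes 0, 5, 10, and so on up to 55, which matches the "every 5 minutes starting at minute 0" reading of the example.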