Updated on 2024-10-23 GMT+08:00

Creating a Metric Alarm Rule

You can set threshold conditions in metric alarm rules for resource metrics. If a metric value meets a threshold condition, a threshold alarm will be reported. If there is no metric data, an insufficient data event will be reported.

Function Introduction

  • You can set the statistical period, detection rules, and trigger conditions for alarm rules. For details, see step 5.4.
  • You can configure alarm notifications. For details, see step 7.
  • Two alarm notification modes are supported: direct alarm reporting and noise reduction. For details, see Setting an Alarm Notification Policy.

Creation Mode

You can create a metric alarm rule in either of two ways: by selecting from all metrics or by using PromQL.

Precautions

  • If you need AOM to send email or SMS notifications when the metric alarm rule status (Exceeded, Normal, Effective, or Disabled) changes, set an alarm action rule according to Creating an Alarm Action Rule.
  • Second-level monitoring is supported when you create metric alarm rules by selecting metrics from all metrics or using PromQL. The timeliness of metric alarms depends on the metric reporting period, rule check interval, and notification send time.

Creating Metric Alarm Rules by Selecting Metrics from All Metrics

  1. Log in to the AOM 2.0 console.
  2. In the navigation pane, choose Alarm Management > Alarm Rules.
  3. Click Create Alarm Rule.
  4. Set basic information about the alarm rule by referring to Table 1.

    Table 1 Basic information

    Parameter

    Description

    Rule Name

    Name of a rule. Enter a maximum of 256 characters and do not start or end with any special character. Only letters, digits, underscores (_), and hyphens (-) are allowed.

    Enterprise Project

    Enterprise project.

    • If you have selected All for Enterprise Project on the global settings page, select one from the drop-down list here.
    • If you have already selected an enterprise project on the global settings page, this option will be dimmed and cannot be changed.

    Description

    Description of the rule. Enter up to 1024 characters.
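The Rule Name constraints in Table 1 can be checked before you fill in the form. The following is a minimal Python sketch, not AOM's own validation. The function name is_valid_rule_name is hypothetical, and the sketch assumes that "letters" means ASCII letters and that the special characters that may not start or end a name are the underscore and the hyphen.

```python
import re

# Assumption: "letters" means ASCII letters, and the "special characters"
# that may not start or end the name are the underscore and the hyphen.
_RULE_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_valid_rule_name(name: str) -> bool:
    """Return True if name satisfies the documented Rule Name constraints."""
    return 1 <= len(name) <= 256 and _RULE_NAME.fullmatch(name) is not None
```

For example, is_valid_rule_name("cpu-usage_rule1") is accepted, while "_cpu" and "cpu-" are rejected because they start or end with a special character.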

  5. Set the detailed information about the alarm rule.

    1. Set Rule Type to Metric alarm rule.
    2. Set Configuration Mode to Select from all metrics.
    3. Select a target Prometheus instance from the drop-down list.
    4. Set alarm rule details. Table 2 describes the parameters.

      After the setting is complete, the monitored metric data is displayed in a line graph above the alarm condition. A maximum of 50 metric data records can be displayed. Click the line icon before each metric data record to hide the metric data in the graph. You can click Add Metric to add metrics and set the statistical period and detection rules for the metrics.

      After moving the cursor to the metric data and the corresponding alarm condition, you can perform the following operations as required:

      • Click the corresponding icon next to an alarm condition to hide the metric data record in the graph.
      • Click the corresponding icon next to an alarm condition to convert the metric data and alarm condition into a Prometheus command.
      • Click the corresponding icon next to an alarm condition to copy the metric data and alarm condition, and then modify the copy as required.
      • Click the corresponding icon next to an alarm condition to remove the metric data record from monitoring.
      Figure 1 Setting alarm rule details
      Table 2 Alarm rule details

      Parameter

      Description

      Multiple Metrics

      Calculation is performed based on the preset alarm conditions one by one. An alarm is triggered when one of the conditions is met.

      For example, if three alarm conditions are set, the system performs calculation respectively. If any of the conditions is met, an alarm will be triggered.

      Combined Operations

      The system performs calculation based on the expression you set. If the condition is met, an alarm will be triggered.

      For example, if there is no metric showing the CPU core usage of a host, do as follows:

      • Set the metric of alarm condition "a" to aom_node_cpu_used_core and retain the default values for other parameters. This metric is used to count the number of CPU cores used by a measured object.
      • Set the metric of alarm condition "b" to aom_node_cpu_limit_core and retain the default values for other parameters. This metric is used to count the total number of CPU cores that have been requested for a measured object.
      • If the expression is set to "a/b", the CPU core usage of the host can be obtained.
      • Set Rule to Max > 0.2.
      • In the trigger condition, set Consecutive Periods to 3.
      • Set Alarm Severity to Critical.

      If the maximum CPU core usage of a host is greater than 0.2 for three consecutive periods, a critical alarm will be generated.
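The combined-operation example above can be sketched in Python to show how the expression "a/b", the rule Max > 0.2, and three consecutive periods fit together. This is an illustration under stated assumptions, not AOM's evaluation engine. The function name combined_alarm and the data layout (one list of samples per statistical period) are hypothetical, and the sketch assumes Max is taken over the per-sample ratios within a period.

```python
def combined_alarm(used, limit, threshold=0.2, consecutive=3):
    """Evaluate Max(a/b) > threshold over consecutive periods.

    used, limit: one list of samples per statistical period
    (hypothetical layout for aom_node_cpu_used_core and
    aom_node_cpu_limit_core).
    Returns True once the rule has been met for `consecutive`
    periods in a row.
    """
    streak = 0
    for a_samples, b_samples in zip(used, limit):
        # Skip samples whose divisor is zero to avoid division errors.
        ratios = [a / b for a, b in zip(a_samples, b_samples) if b]
        if ratios and max(ratios) > threshold:
            streak += 1
        else:
            streak = 0
        if streak >= consecutive:
            return True
    return False
```

With made-up samples such as used = [[0.5, 0.9], [1.0, 0.7], [0.9, 0.95]] and limit = [[4, 4]] * 3, every period has a maximum ratio above 0.2, so the function returns True.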

      Metric

      Metric to be monitored. When Select from all metrics is selected, enter keywords to search for metrics.

      Click the Metric text box. In the resource tree on the right, you can also select a target metric by resource type.

      Statistical Period

      Metric data is aggregated based on the configured statistical period, which can be 15 seconds, 30 seconds, 1 minute, 5 minutes, 15 minutes, or 1 hour.

      Condition

      Metric monitoring scope. If this parameter is left blank, all resources are covered.

      Each condition is a key-value pair. You can select a dimension name from the drop-down list. How you specify the dimension value depends on the matching mode.

      • =: Select a dimension value from the drop-down list. For example, if Dimension Name is set to Host name and Dimension Value is set to 192.168.16.4, only host 192.168.16.4 will be monitored.
      • !=: Select a dimension value from the drop-down list. For example, if Dimension Name is set to Host name and Dimension Value is set to 192.168.16.4, all hosts excluding host 192.168.16.4 will be monitored.
      • =~: The dimension value is determined based on one or more regular expressions. Separate regular expressions with a vertical bar (|). For example, if Dimension Name is set to Host name and Regular Expression is set to 192.*|172.*, only hosts whose names match 192.* or 172.* will be monitored.
      • !~: The dimension value is determined based on one or more regular expressions. Separate regular expressions with a vertical bar (|). For example, if Dimension Name is set to Host name and Regular Expression is set to 192.*|172.*, all hosts except those whose names match 192.* or 172.* will be monitored.

      For details about how to enter a regular expression, see Regular Expression Examples.

      You can also click the icon next to a condition and select AND or OR to add more conditions for the metric.
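The four matching modes described for Condition can be sketched in Python as follows. This is a hedged illustration: the function name matches is hypothetical, and the sketch assumes that a regular expression must match the whole dimension value, as Prometheus label matchers do. Python's re module is used here only for illustration and may differ in detail from the regular expression syntax that AOM accepts.

```python
import re


def matches(value: str, operator: str, operand: str) -> bool:
    """Evaluate one dimension condition against a dimension value."""
    if operator == "=":
        return value == operand
    if operator == "!=":
        return value != operand
    # Assumption: regular expressions must match the whole value,
    # as in Prometheus label matchers.
    hit = re.fullmatch(operand, value) is not None
    if operator == "=~":
        return hit
    if operator == "!~":
        return not hit
    raise ValueError(f"unsupported operator: {operator}")
```

For instance, matches("172.16.0.9", "=~", "192.*|172.*") selects the host, while matches("10.0.0.1", "=~", "192.*|172.*") does not.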

      Grouping Condition

      Aggregate metric data by the specified field and calculate the aggregation result. Options: Not grouped, avg by, max by, min by, and sum by. For example, avg by clusterName indicates that metrics are grouped by cluster name, and the average value of the grouped metrics is calculated and displayed in the graph.
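What a grouping condition such as avg by clusterName computes can be sketched in Python. The function name group_by, the sample layout (a labels dictionary paired with a value), and the sample values are hypothetical and serve only to illustrate the aggregation.

```python
from collections import defaultdict
from statistics import mean

# Maps each documented grouping option to an aggregation function.
AGGREGATIONS = {"avg by": mean, "max by": max, "min by": min, "sum by": sum}


def group_by(samples, field, mode="avg by"):
    """Group samples by a label and aggregate each group.

    samples: list of (labels dict, value) pairs.
    Returns {label value: aggregated value}.
    """
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels[field]].append(value)
    return {key: AGGREGATIONS[mode](values) for key, values in groups.items()}
```

Given two samples of 10.0 and 30.0 for cluster c1 and one sample of 5.0 for cluster c2, avg by clusterName yields 20.0 for c1 and 5.0 for c2.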

      Rule

      Detection rule of a metric alarm, which consists of the statistical mode (Avg, Min, Max, Sum, and Samples), determination criterion (≥, ≤, >, and <), and threshold value. For example, if the detection rule is set to Avg > 10, a metric alarm will be generated if the average metric value is greater than 10.
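A detection rule pairs a statistical mode with a determination criterion and a threshold. The following minimal Python sketch shows that evaluation for one statistical period. It assumes that Samples means the number of data points in the period and that the two criteria other than > and < are the non-strict comparisons, written here as >= and <=. The function name rule_met is hypothetical and is not part of AOM.

```python
import operator
from statistics import mean

# Assumption: "Samples" is the number of data points in the period.
STATS = {"Avg": mean, "Min": min, "Max": max, "Sum": sum, "Samples": len}
# Assumption: >= and <= stand in for the non-strict criteria.
CRITERIA = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def rule_met(values, stat, criterion, threshold):
    """Return True if the statistic of values satisfies the criterion."""
    return CRITERIA[criterion](STATS[stat](values), threshold)
```

For the values 8, 12, and 13, the rule Avg > 10 is met because the average is 11, whereas Min > 10 is not.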

      Trigger Condition

      When the metric value meets the alarm condition for a specified number of consecutive periods, a metric alarm will be generated. Range: 1 to 30.

      For example, if Consecutive Periods is set to 2, a metric alarm will be triggered if the trigger condition is met for two consecutive periods.

      Alarm Severity

      Metric alarm severity. Options:

      • Critical alarm.
      • Major alarm.
      • Minor alarm.
      • Warning.

  6. Click Advanced Settings and set information such as Check Interval and Alarm Clearance. For details about the parameters, see Table 3.

    Table 3 Advanced settings

    Parameter

    Description

    Check Interval

    Interval at which metric query and analysis results are checked.

    • Hourly: Query and analysis results are checked every hour.
    • Daily: Query and analysis results are checked at a fixed time every day.
    • Weekly: Query and analysis results are checked at a fixed time point on a specified day of a week.
    • Custom interval: The query and analysis results are checked at a fixed interval.
      NOTE:

      You can set Check Interval to 15 seconds or 30 seconds to implement second-level monitoring. The timeliness of metric alarms depends on the metric reporting period, rule check interval, and notification send time.

      For example, if the metric reporting period is 5 seconds, rule check interval is 30 seconds, and notification send time is 1 second, an alarm can be detected and an alarm notification can be sent within 36 seconds.

    • Cron: A cron expression is used to specify a time interval. Query and analysis results are checked at the specified interval.

      The time specified in the cron expression can be accurate to the minute and must be in 24-hour notation. Example: 0/5 * * * *, which indicates that the check starts at minute 0 of each hour and is performed every 5 minutes.
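To see which minutes a cron minute field such as 0/5 selects, the following Python sketch expands the field into a list of minutes. It is an illustration only: the function name expand_minute_field is hypothetical, and it handles just the forms "*", "N", "N/S", and "*/S". Ranges and comma-separated lists are not handled, and the exact cron dialect that AOM accepts is not stated here.

```python
def expand_minute_field(field: str) -> list:
    """Expand a cron minute field of the form "*", "N", "N/S", or "*/S"."""
    start_text, _, step_text = field.partition("/")
    start = 0 if start_text == "*" else int(start_text)
    if not step_text:
        # No step: either every minute or one fixed minute.
        return list(range(60)) if start_text == "*" else [start]
    return list(range(start, 60, int(step_text)))
```

Calling expand_minute_field("0/5") returns the minutes 0, 5, 10, and so on up to 55, which matches the "every 5 minutes starting at minute 0" reading of the example.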

    Alarm Clearance

    The alarm will be cleared when the alarm condition is not met for a specified number of consecutive periods. By default, metrics in only one period are monitored. You can set up to 30 consecutive monitoring periods.

    For example, if Consecutive Periods is set to 2, the alarm will be cleared when the alarm condition is not met for two consecutive periods.
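The interplay between the trigger condition and Alarm Clearance can be sketched as a small state machine in Python. This is a simplified model under stated assumptions, not AOM's implementation: the function name alarm_states is hypothetical, only the Normal and Exceeded statuses are modeled, and insufficient data is ignored.

```python
def alarm_states(breaches, trigger_periods=2, clear_periods=2):
    """Track the rule status across check periods.

    breaches: one bool per period (True = alarm condition met).
    Returns the status after each period.
    """
    status, met, not_met, history = "Normal", 0, 0, []
    for breached in breaches:
        # Count consecutive periods in which the condition is or is not met.
        met = met + 1 if breached else 0
        not_met = 0 if breached else not_met + 1
        if status == "Normal" and met >= trigger_periods:
            status = "Exceeded"
        elif status == "Exceeded" and not_met >= clear_periods:
            status = "Normal"
        history.append(status)
    return history
```

With both values set to 2, a single period without a breach does not clear an alarm; the status returns to Normal only after two such periods in a row.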

    Action Taken for Insufficient Data

    Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can set this option based on your requirements.

    By default, metrics in only one period are monitored. You can set up to five consecutive monitoring periods.

    The system supports the following actions: changing the status to Exceeded and sending an alarm, changing the status to Insufficient data and sending an event, maintaining Previous status, and changing the status to Normal and sending an alarm clearance notification.

    Alarm Tag

    Identification attribute of an alarm. It is used in alarm noise reduction scenarios and is in the format of "key:value". Click the icon to add an alarm tag.

    For details, see Alarm Tags and Annotations.

    NOTE:

    If tag policies related to AOM have already been set, add alarm tags based on these policies. If a tag does not comply with the policies, tag addition may fail. Contact your organization administrator to learn more about tag policies.

    Alarm Annotation

    Non-identification attribute of an alarm. It is used in alarm notification and message template scenarios and is in the format of "key:value". Click the icon to add an alarm annotation.

    For details, see Alarm Tags and Annotations.

  7. Set an alarm notification policy. For details, see Table 4.

    Figure 2 Setting an alarm notification policy
    Table 4 Parameters for setting an alarm notification policy

    Parameter

    Description

    Notify When

    Set the scenario for sending alarm notifications.

    • Alarm triggered: If the alarm trigger condition is met, the system sends an alarm notification to the specified personnel by email or SMS.
    • Alarm cleared: If the alarm clearance condition is met, the system sends an alarm notification to the specified personnel by email or SMS.

    Alarm Mode

    • Direct alarm reporting: An alarm is directly sent when the alarm condition is met. If you select this mode, set an interval for notification and specify whether to enable an action rule.

      Frequency: interval for sending alarm notifications. Select a desired value from the drop-down list.

      After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template. If the existing alarm action rules cannot meet your requirements, click Create Rule in the drop-down list to create one. For details, see Creating an Alarm Action Rule.

    • Alarm noise reduction: Alarms are sent only after being processed based on noise reduction rules, preventing alarm storms.

      If you select this mode, the silence rule is enabled by default. You can determine whether to enable Grouping Rule as required. After this function is enabled, select a grouping rule from the drop-down list. If existing grouping rules cannot meet your requirements, click Create Rule in the drop-down list to create one. For details, see Creating a Grouping Rule.

      NOTE:

      The alarm severity and tag configured in the selected grouping rule must match those configured in the alarm rule. Otherwise, the grouping rule does not take effect.

  8. Click Confirm. Then click View Rule to view the created alarm rule.

    In the expanded list, if a metric value meets the configured alarm condition, a metric alarm is generated on the alarm page. To view it, choose Alarm Management > Alarm List in the navigation pane. If a metric value meets the preset notification policy, the system sends an alarm notification to the specified personnel by email or SMS.

    Figure 3 Created metric alarm rule

Creating Metric Alarm Rules by Running Prometheus Statements

  1. Log in to the AOM 2.0 console.
  2. In the navigation pane, choose Alarm Management > Alarm Rules.
  3. Click Create.
  4. Set basic information about the alarm rule by referring to Table 5.

    Table 5 Basic information

    Parameter

    Description

    Rule Name

    Name of a rule. Enter a maximum of 256 characters and do not start or end with any special character. Only letters, digits, underscores (_), and hyphens (-) are allowed.

    Enterprise Project

    Enterprise project.

    • If you have selected All for Enterprise Project on the global settings page, select one from the drop-down list here.
    • If you have already selected an enterprise project on the global settings page, this option will be dimmed and cannot be changed.

    Description

    Description of the rule. Enter up to 1024 characters.

  5. Set the detailed information about the alarm rule.

    1. Set Rule Type to Metric alarm rule.
    2. Set Configuration Mode to PromQL.
    3. Select a target Prometheus instance from the drop-down list.
    4. Set alarm rule details. Table 6 describes the parameters.

      After the setting is complete, the monitored metric data is displayed in a line graph above the alarm condition. A maximum of 50 metric data records can be displayed. Click the line icon before each metric data record to hide the metric data in the graph.

      Figure 4 Setting alarm rule details
      Table 6 Alarm rule details

      Parameter

      Description

      Default Rule

      Detection rule generated based on Prometheus statements. The system provides two input modes: Custom and CCEFromProm. After the input is complete, click Query. The corresponding graph will be displayed in the lower part of the page in real time.

      • Custom: If you know the metric name and IP address and are familiar with the Prometheus statement format, select Custom from the drop-down list and manually enter a Prometheus command.
      • CCEFromProm: Use this mode if you do not know the metric information or are unfamiliar with the Prometheus format. Select CCEFromProm from the drop-down list and then select a desired template from the CCE templates. The system then automatically fills in the Prometheus command based on the selected template.

        You can click the icon to view examples. For details, see Prometheus Statements.

      Alarm Severity

      Metric alarm severity. Options:

      • Critical alarm.
      • Major alarm.
      • Minor alarm.
      • Warning.

      Dimensions

      Metric monitoring dimension, which is automatically generated based on the Prometheus statement you set.

      Duration

      A metric alarm will be triggered when the alarm condition is met for the specified duration. Options: Immediate, 15 seconds, 30 seconds, 1 minute, 2 minutes, 5 minutes, and 10 minutes. For example, if Duration is set to 2 minutes, a metric alarm is triggered when the default rule condition is met for 2 minutes.

  6. Click Advanced Settings and set information such as Check Interval and Alarm Clearance. For details about the parameters, see Table 7.

    Table 7 Advanced settings

    Parameter

    Description

    Check Interval

    Interval at which metric query and analysis results are checked.

    • XX hours: Check the query and analysis results every XX hours.
    • XX minutes: Check the query and analysis results every XX minutes.
    • XX seconds: Check the query and analysis results every XX seconds.
      NOTE:

      You can set Check Interval to 15 seconds or 30 seconds to implement second-level monitoring. The timeliness of metric alarms depends on the metric reporting period, rule check interval, and notification send time.

      For example, if the metric reporting period is 15 seconds, rule check interval is 15 seconds, and notification send time is 3 seconds, an alarm can be detected and an alarm notification can be sent within 33 seconds.
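The timeliness arithmetic in the note is a simple sum of the three contributing delays. The following Python sketch makes that explicit. The function name worst_case_notification_delay is hypothetical, and the sketch assumes the alarm triggers on the first check that sees the breach (that is, no additional consecutive periods or duration).

```python
def worst_case_notification_delay(reporting_period_s, check_interval_s, send_time_s):
    """Upper bound, in seconds, from a breach to the notification being sent.

    Assumption: the alarm triggers on the first check that sees the breach.
    """
    return reporting_period_s + check_interval_s + send_time_s
```

This reproduces both documented examples: 15 + 15 + 3 gives 33 seconds, and 5 + 30 + 1 gives 36 seconds.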

    Alarm Tag

    Identification attribute of an alarm. It is used in alarm noise reduction scenarios and is in the format of "key:value".

    It is automatically generated based on the Prometheus statement you set. You can modify it as required. To add more alarm tags, click the icon. For details, see Alarm Tags and Annotations.

    NOTE:

    If tag policies related to AOM have already been set, add alarm tags based on these policies. If a tag does not comply with the policies, tag addition may fail. Contact your organization administrator to learn more about tag policies.

    Alarm Annotation

    Non-identification attribute of an alarm. It is used in alarm notification and message template scenarios and is in the format of "key:value". Click the icon to add an alarm annotation. For details, see Alarm Tags and Annotations.

  7. Set an alarm notification policy. For details, see Table 8.

    Figure 5 Setting an alarm notification policy
    Table 8 Parameters for setting an alarm notification policy

    Parameter

    Description

    Notify When

    Set the scenario for sending alarm notifications.

    • Alarm triggered: If the alarm trigger condition is met, the system sends an alarm notification to the specified personnel by email or SMS.
    • Alarm cleared: If the alarm clearance condition is met, the system sends an alarm notification to the specified personnel by email or SMS.

    Alarm Mode

    • Direct alarm reporting: An alarm is directly sent when the alarm condition is met. If you select this mode, set an interval for notification and specify whether to enable an action rule.

      Frequency: interval for sending alarm notifications. Select a desired value from the drop-down list.

      After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template. If the existing alarm action rules cannot meet your requirements, click Create Rule in the drop-down list to create one. For details, see Creating an Alarm Action Rule.

    • Alarm noise reduction: Alarms are sent only after being processed based on noise reduction rules, preventing alarm storms.

      If you select this mode, the silence rule is enabled by default. You can determine whether to enable Grouping Rule as required. After this function is enabled, select a grouping rule from the drop-down list. If existing grouping rules cannot meet your requirements, click Create Rule in the drop-down list to create one. For details, see Creating a Grouping Rule.

      NOTE:

      The alarm severity and tag configured in the selected grouping rule must match those configured in the alarm rule. Otherwise, the grouping rule does not take effect.

    Notification Template

    Template for sending alarm notifications. It is automatically generated based on the Prometheus statement you set.

    NOTE:

    You can use variables (that is, dimensions) in a notification template. The format is "${Dimension}".
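How "${Dimension}" variables are filled in can be illustrated with Python's string.Template class, which happens to use the same ${name} placeholder syntax. This is only an analogy and not AOM's template renderer. The function name render_notification, the template text, and the dimension names in the usage example are hypothetical.

```python
from string import Template


def render_notification(template_text: str, dimensions: dict) -> str:
    """Fill ${Dimension} placeholders; unknown placeholders are left as-is."""
    return Template(template_text).safe_substitute(dimensions)
```

For example, rendering "Host ${hostname} in ${clusterName} exceeded the threshold" with hostname set to 192.168.16.4 and clusterName set to c1 produces "Host 192.168.16.4 in c1 exceeded the threshold".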

  8. Click Confirm. Then click View Rule to view the created alarm rule.

    In the expanded list, if a metric value meets the configured alarm condition, a metric alarm is generated on the alarm page. To view it, choose Alarm Management > Alarm List in the navigation pane. If a metric value meets the preset notification policy, the system sends an alarm notification to the specified personnel by email or SMS.

    Figure 6 Created metric alarm rule