Updated on 2023-11-17 GMT+08:00

Threshold Alarms (New)

This function is available in CN North-Beijing1, CN North-Beijing4, CN East-Shanghai1, CN East-Shanghai2, CN South-Guangzhou, CN Southwest-Guiyang1, CN-Hong Kong, CN South-Shenzhen, CN South-Guangzhou-InvitationOnly, CN North-Ulanqab1, AP-Bangkok, and AP-Singapore.

Alarm reporting is a basic function of AOM and plays an important role in routine O&M. AOM can monitor dozens of VM and component metrics and notify you of system problems by SMS message or email.

Supported Metrics

AOM allows you to set threshold rules for metrics of various resources such as hosts and components. You can view the supported metric types on the threshold rule creation page.

For more information about metrics, see Metric Overview.

Creation Methods

You can customize threshold rules or use templates to create threshold rules. Only one rule is generated at a time. All resources are monitored using the same rule.

To create a static threshold rule from a template, ensure that a static threshold template has already been created.

You are advised to customize threshold rules.

Customizing a Threshold Rule

  1. Log in to the AOM console. In the navigation pane, choose Alarm Center > Alarm Rules. Then, click Add Alarm in the upper right corner.
  2. Customize a threshold rule.

    1. Set basic information such as the rule name and description.
    2. Set rule details.
      1. Set Rule Type to Threshold alarm.
      2. Select monitored objects. Use either of the following methods:
        • Select resource objects: Click Select Resource Object, add objects by dimension or resource, and click Confirm.
          • A threshold rule can monitor up to 100 pieces of metric data.
          • If you enable Apply to All when selecting objects to monitor, an alarm rule will be created for all metrics of the type you select under an application or service. For example, if you select CCE/Host/Host/CPU Usage and enable Apply to All, an alarm rule will be created for all hosts in CCE.
          • Click Edit resource objects to modify the selected resource object.
        • Command input: Both manual and auto inputs are supported.
          • Manual input: used when you know the metric name and IP address, and you are familiar with the Prometheus format.
            For example, to query the CPU usage of the host, run command avg(label_replace(avg_over_time(aom_node_cpu_usage{hostID="81010a40-1682-41c1-9645-f0588ff9c0cf",nodeIP="192.168.1.210",clusterId = '00000000-0000-0000-0000-00000000'}[59999ms]), "__name__","aom_node_cpu_usage","","")) by(__name__,hostID,nodeIP).

            For details about Prometheus commands, move the cursor to the icon next to the search box and click Learn more.

          • Auto input: used when you do not know the metric information or are unfamiliar with the Prometheus format. The command can only be automatically filled when you switch from the Metric Monitoring page.

            Specifically, choose Monitoring > Metric Monitoring in the navigation pane. Then, click Add Metric and select Dimension or Resource for Add By. Select up to 12 metrics to monitor. Next, click the icon in the Operation column. The system automatically switches to the threshold rule creation page and fills in the Prometheus command for your metric.

      3. Set an alarm condition. Click Custom and set information such as Statistical Period, Consecutive Periods, and Threshold Criterion. Table 1 describes the parameters.
        Table 1 Alarm condition parameters

        | Category | Parameter | Description |
        |---|---|---|
        | Trigger Condition | Statistical Period | Interval at which metric data is collected. By default, only one period is measured. A maximum of five periods can be measured. |
        | Trigger Condition | Consecutive Periods | When the metric value meets the threshold condition for a specified number of consecutive periods, a threshold-crossing alarm will be generated. |
        | Trigger Condition | Statistic | Method used to measure metrics. Options: Avg., Min., Max., Sum, and Samples. |
        | Trigger Condition | Threshold Condition | Trigger condition of a threshold alarm. A threshold condition consists of two parts: operators (≥, ≤, >, and <) and threshold value. For example, after Threshold Criterion is set to > 85, if the actual metric value exceeds 85, a threshold alarm is generated. Move the cursor to the graph area above the alarm condition. The ID, IP address, and unit of the current metric are displayed. |
        | Trigger Condition | Alarm Severity | Severity of a threshold alarm. Options: Critical, Major, Minor, and Warning. |
        | Advanced Settings | Alarm Clearance | An alarm will be cleared if the monitored object does not meet the trigger condition within the monitoring period. By default, metrics in only one period are monitored. You can set up to five monitoring periods. |
        | Advanced Settings | Action Taken for Insufficient Data | Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can set this option based on your requirements. By default, metrics in only one period are monitored. You can set up to five monitoring periods. Options: Alarm, Insufficient data, Keep previous status, and Normal. |

        Figure 1 Setting an alarm condition
      4. Set alarm tags and annotations to group alarms. They can be associated with alarm noise reduction policies for sending notifications.

        Click Add Tag or Add Annotation.

    3. Set an alarm notification policy. There are two alarm notification modes.
      • Direct Alarm Reporting: An alarm is directly sent when the alarm condition is met.
        1. Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template. If existing alarm action rules cannot meet your requirements, click Create Rule to create one. For details, see Creating an Alarm Action Rule.
        2. After an alarm action rule is selected, specify whether to enable alarm clearance notification. After alarm clearance notification is enabled, if the alarm clearance condition set in Advanced Settings > Alarm Clearance is met, alarm clearance notifications are sent based on the selected action rule.
        Figure 2 Selecting the direct alarm reporting mode
      • Alarm Noise Reduction: Alarms are sent only after being processed based on alarm action rules, preventing alarm storms.

        Select a grouping rule from the drop-down list. If existing grouping rules cannot meet your requirements, click Create Rule to create one. For details, see Grouping Rules.

        Figure 3 Selecting the alarm noise reduction mode

  3. Click Create Now to complete the creation. As shown in the following figure, a threshold rule is created. Expand the rule to monitor the same metric of multiple resources.

    In the expanded list, if the metric data of a host meets the preset alarm condition, a threshold alarm is generated on the alarm page. To view the alarm, go to the AOM console and choose Alarm Center > Alarm List in the navigation pane. If a host meets the preset notification policy, the system sends an alarm notification to the specified personnel by WeCom, email, or SMS.

    Figure 4 Creating threshold rules
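
The way the Table 1 parameters combine into an alarm decision can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not AOM's actual implementation: the function evaluate_rule and its argument names are hypothetical, each statistical period is represented as a list of collected samples, and alarm clearance over multiple monitoring periods is not modeled.

```python
import operator
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Statistic options listed in Table 1, mapped to plain Python aggregations.
STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "Avg.": mean,
    "Min.": min,
    "Max.": max,
    "Sum": sum,
    "Samples": len,
}

# Threshold condition operators listed in Table 1.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def evaluate_rule(periods: List[Sequence[float]], statistic: str, op: str,
                  threshold: float, consecutive_periods: int = 1,
                  insufficient_data_action: str = "Insufficient data",
                  previous_status: str = "Normal") -> str:
    """Return the rule status for the most recent periods (oldest first).

    Hypothetical model: the rule is in "Alarm" status only when the chosen
    statistic breaches the threshold in every one of the last
    `consecutive_periods` periods.
    """
    if not 1 <= consecutive_periods <= 5:
        raise ValueError("consecutive_periods must be between 1 and 5")

    recent = periods[-consecutive_periods:]
    # Missing or empty periods count as insufficient data.
    if len(recent) < consecutive_periods or any(len(p) == 0 for p in recent):
        if insufficient_data_action == "Keep previous status":
            return previous_status
        return insufficient_data_action

    aggregate = STATISTICS[statistic]
    compare = OPERATORS[op]
    breached = all(compare(aggregate(p), threshold) for p in recent)
    return "Alarm" if breached else "Normal"


if __name__ == "__main__":
    cpu_usage = [[70, 80], [90, 92], [88, 95]]  # three statistical periods
    print(evaluate_rule(cpu_usage, "Avg.", ">", 85, consecutive_periods=2))
```

In this example the averages of the last two periods both exceed 85, so a rule with two consecutive periods reports an alarm, while the same rule with three consecutive periods does not, because the oldest period averages 75.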