Preventing ELB Alarm Storms Using AOM Alarm Grouping Rules
This section describes how to configure alarm noise reduction. Before sending alarm notifications, AOM processes alarms based on noise reduction rules to prevent alarm storms.
Scenario
When analyzing applications, resources, and businesses, e-commerce O&M personnel find that there are too many alarms and that many of them are identical. As a result, they cannot locate faults from the alarms or monitor applications comprehensively.
Solution
The following shows how to use grouping rules to prevent alarm storms when monitoring metrics at the ELB business layer.
- Step 1: Create a Grouping Rule: Filter a subset of alarms and group them based on specified conditions. Alarms in the same group are aggregated to trigger one notification.
- Step 2: Create a Metric Alarm Rule (Configuration Mode Set to Select from all metrics): Set an alarm rule and associate it with the grouping rule to monitor resources (such as hosts and components) in real time.
Prerequisite
Step 1: Create a Grouping Rule
In this example, when AOM generates a critical or major alarm, the apm notification rule is triggered and alarms are grouped by alarm source and severity. To create a grouping rule, do as follows:
- Log in to the AOM 2.0 console.
- In the navigation pane, choose Alarm Center > Alarm Noise Reduction.
- On the Grouping Rules tab page, click Create and set the rule name and grouping condition.
Figure 1 Creating a grouping rule
Table 1 Grouping rule parameters
  - Rule Name: Name of a grouping rule. Enter up to 100 characters. Only letters, digits, and underscores (_) are allowed, and the name cannot start or end with an underscore. Example value: rule
  - Enterprise Project: Enterprise project name.
    - If Enterprise Project is set to All on the global settings page, select an enterprise project from the drop-down list here.
    - If you have already selected an enterprise project on the global settings page, this option is grayed out and cannot be changed.
    Example value: default
  - Description: Description of a grouping rule. Enter up to 1,024 characters. In this example, leave this parameter blank. Example value: -
  - Grouping Condition: Conditions used to filter alarms. You can then set alarm notification rules for the filtered alarms.
    - Alarm Severity: severity of a metric or event alarm. Options: Critical, Major, Minor, and Warning.
    - Alarm Source: name of the service that triggers the alarm or event. Options include AOM, LTS, and CCE.
    Example values: Alarm Severity + Equals to + Critical & Major; Alarm Source + Equals to + AOM
  - Notification Rule: You can associate an alarm notification rule with an SMN topic and a message template. If log, resource, or metric data meets the alarm condition, the system sends an alarm notification based on the associated SMN topic and message template. Example value: apm
  - Combine Notifications: Combines grouped alarms based on specified fields. Alarms in the same group are aggregated into one notification. With By alarm source + severity, alarms triggered by the same alarm source and with the same severity are combined into one group for notification. In this example, select By alarm source + severity. Example value: By alarm source + severity
  - Initial Wait Time: Time to wait before sending a notification after alarms are combined for the first time. A value in seconds is recommended to prevent alarm storms. Example value: 15s
  - Batch Processing Interval: Time to wait before sending a notification after the combined alarm data changes. A change means a new alarm or an alarm status change. Example value: 60s
  - Repeat Interval: Time to wait before resending a notification when the combined alarm data is duplicate. Duplicate means that no new alarm is generated and no alarm status changes, although other attributes (such as titles and content) may have changed. Example value: 1 hour
- Click Confirm.
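The filtering and combining configured in Table 1 can be illustrated with a small Python model. This is a simplified sketch, not AOM's implementation: the Alarm class and its fields are hypothetical, and the sketch assumes that both grouping conditions must hold (logical AND).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # Hypothetical alarm record; field names are illustrative, not AOM's schema.
    name: str
    source: str    # e.g. "AOM", "LTS", "CCE"
    severity: str  # "Critical", "Major", "Minor", or "Warning"


# Grouping condition from Table 1 (assumed to be combined with AND):
# Alarm Severity equals Critical & Major, and Alarm Source equals AOM.
ALLOWED_SEVERITIES = {"Critical", "Major"}
ALLOWED_SOURCES = {"AOM"}


def matches_condition(alarm: Alarm) -> bool:
    return alarm.severity in ALLOWED_SEVERITIES and alarm.source in ALLOWED_SOURCES


def combine_by_source_and_severity(alarms):
    """Filter alarms, then group them by (source, severity).

    Each resulting group stands for one notification.
    """
    groups = defaultdict(list)
    for alarm in alarms:
        if matches_condition(alarm):
            groups[(alarm.source, alarm.severity)].append(alarm)
    return dict(groups)


if __name__ == "__main__":
    incoming = [
        Alarm("cpu-high-host1", "AOM", "Critical"),
        Alarm("cpu-high-host2", "AOM", "Critical"),
        Alarm("mem-high-host1", "AOM", "Major"),
        Alarm("disk-warn-host3", "AOM", "Warning"),  # not matched: severity
        Alarm("log-error", "LTS", "Critical"),       # not matched: source
    ]
    for key, members in combine_by_source_and_severity(incoming).items():
        print(key, "->", [a.name for a in members])
```

With these sample alarms, the Warning alarm and the LTS alarm are not handled by this rule, and the three remaining alarms collapse into two groups, (AOM, Critical) and (AOM, Major), so two notifications are sent instead of three.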
Step 2: Create a Metric Alarm Rule (Configuration Mode Set to Select from all metrics)
You can set threshold conditions in metric alarm rules for resource metrics. If a metric value meets the threshold condition, a threshold alarm will be generated. If no metric data is reported, an insufficient data event will be generated.
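The threshold and insufficient-data behavior described above can be modeled as a per-period state machine. The sketch below uses the example values configured later in this step (Rule Avg > 1, Trigger Condition 3, Alarm Clearance 1, insufficient data after 1 period) and treats each value as one 1-minute period, matching the example statistical period and check interval. It is a hypothetical Python model, not AOM's evaluation engine; in particular, it assumes that a period with no data resets both consecutive-period counters and that fresh data ends the Insufficient data state.

```python
from typing import Iterable, List, Optional

OK, ALARM, INSUFFICIENT = "OK", "Alarm", "Insufficient data"


def evaluate(values: Iterable[Optional[float]], threshold: float = 1.0,
             trigger_periods: int = 3, clear_periods: int = 1,
             insufficient_periods: int = 1) -> List[str]:
    """Return the rule state after each statistical period.

    Each item in `values` is the Avg of the metric for one period,
    or None if no data was reported in that period.
    """
    state = OK
    breach = healthy = missing = 0  # consecutive-period counters
    history = []
    for value in values:
        if value is None:
            missing += 1
            breach = healthy = 0  # assumption: a gap resets both streaks
            if missing >= insufficient_periods:
                state = INSUFFICIENT
        else:
            missing = 0
            if state == INSUFFICIENT:
                state = OK  # assumption: fresh data ends the insufficient-data state
            if value > threshold:  # Rule: Avg > 1
                breach += 1
                healthy = 0
                if breach >= trigger_periods:  # Trigger Condition: 3 periods
                    state = ALARM
            else:
                healthy += 1
                breach = 0
                if state == ALARM and healthy >= clear_periods:  # Alarm Clearance: 1
                    state = OK
        history.append(state)
    return history


if __name__ == "__main__":
    # Hypothetical per-minute Avg values of aom_process_cpu_usage.
    samples = [0.5, 1.2, 1.5, 1.8, 0.7, None, 2.0]
    print(evaluate(samples))
```

For the sample series, the alarm fires in the fourth period (the third consecutive value above 1), clears in the fifth, and the missing sixth value switches the state to Insufficient data.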
The following describes how to create an alarm rule for monitoring all metrics at the ELB business layer.
- In the navigation pane, choose Alarm Center > Alarm Rules.
- On the Prometheus Monitoring tab page, click Create Alarm Rule.
- Set basic information about the alarm rule by referring to Table 2.
Table 2 Basic information
  - Original Rule Name: Original name of the alarm rule. Enter up to 256 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, and the name cannot start or end with a special character. Example value: monitor
  - Rule Name: Name of the rule. Enter up to 256 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed, and the name cannot start or end with a hyphen or underscore. In this example, leave this parameter blank. Example value: -
    NOTE:
    - If you set Rule Name, it is displayed preferentially.
    - After an alarm rule is created, you can change Rule Name but not Original Rule Name. After you change Rule Name, move the cursor over it to view both Original Rule Name and Rule Name.
  - Enterprise Project: Select the required enterprise project. The default value is default. Example value: default
  - Description: Description of the rule. Enter up to 1,024 characters. In this example, leave this parameter blank. Example value: -
- Set the detailed information about the alarm rule.
- Set Rule Type to Metric alarm rule and Configuration Mode to Select from all metrics.
- Select Prometheus_AOM_Default (default) for Prometheus Instance.
- Set alarm rule details. Table 3 describes the parameters.
After the setting is complete, the monitored metric data is displayed in a line graph above the alarm conditions. You can click Add Metric to add more metrics and set the statistical period and detection rules for them.
Table 3 Alarm rule details
  - Multiple Metrics: The preset alarm conditions are evaluated one by one. An alarm is triggered when any one of the conditions is met. Example value: Multiple Metrics
  - Metric: Metric to be monitored. Click the Metric text box and, in the resource tree on the right, select a target metric by resource type. Example value: aom_process_cpu_usage
  - Statistical Period: Interval at which metric data is collected. Example value: 1 minute
  - Conditions: Metric monitoring scope. If this parameter is left blank, all resources are covered. In this example, leave this parameter blank. Example value: -
  - Grouping Condition: Aggregates metric data by the specified field and calculates the aggregation result. Example value: Not grouped
  - Rule: Detection rule of a metric alarm, which consists of a statistical mode (Avg, Min, Max, Sum, or Samples), a determination criterion (≥, ≤, >, or <), and a threshold. Example value: Avg > 1
  - Trigger Condition: A metric alarm is generated when the metric value meets the alarm condition for the specified number of consecutive periods. Example value: 3
  - Alarm Severity: Severity of a metric alarm.
- Click Advanced Settings and set information such as Check Interval and Alarm Clearance. For details about the parameters, see Table 4.
Table 4 Advanced settings
  - Check Interval: Interval at which metric query and analysis results are checked. Example value: Custom interval: 1 minute
  - Alarm Clearance: The alarm is cleared when the alarm condition is not met for the specified number of consecutive periods. Example value: 1
  - Action Taken for Insufficient Data: Action to take if there is no or insufficient metric data within the monitoring period. Enable this option if needed. Example value: Enabled. If data is insufficient for 1 period, the status changes to Insufficient data and an alarm is sent.
  - Tags: Click the add icon to add tags to alarm rules. Tags are synchronized to TMS. They can be used to filter alarm rules and to group alarms for noise reduction, and can be referenced as "${event.metadata.tag key}" in message templates. In this example, leave this parameter blank. Example value: -
  - Annotations: Click the add icon to add attributes (key-value pairs) to alarm rules. Annotations are not synchronized to TMS, but they can be used to group alarms for noise reduction and can be referenced as "${event.metadata.annotation key}" in message templates. In this example, leave this parameter blank. Example value: -
- Set an alarm notification policy. For details, see Table 5.
Figure 2 Selecting the alarm noise reduction mode
Table 5 Alarm notification policy parameters
  - Notify When: Scenarios in which alarm notifications are sent. By default, Alarm triggered and Alarm cleared are selected.
    - Alarm triggered: If the alarm trigger condition is met, the system sends an alarm notification to the specified personnel by email or SMS.
    - Alarm cleared: If the alarm clearance condition is met, the system sends an alarm notification to the specified personnel by email or SMS.
    Example value: Alarm triggered and Alarm cleared
  - Alarm Mode: Alarm mode. Select Alarm noise reduction. In this mode, alarms are sent only after being processed based on noise reduction rules, which prevents alarm storms. Example value: Alarm noise reduction
  - Grouping Rule: Filters a subset of alarms and groups them based on the grouping conditions. Alarms in the same group are aggregated to trigger one notification. Example value: rule
- Click Confirm. Then click View Rule to view the created rule.
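If you prepare rule names in scripts or templates, it can help to check them against the naming constraints in Table 1 and Table 2 before submitting them. The Python sketch below encodes those constraints as regular expressions. It assumes that "letters" means ASCII letters only, and the function names are hypothetical, not part of any AOM API.

```python
import re

# Grouping rule name (Table 1): up to 100 characters; letters, digits, and
# underscores only; must not start or end with an underscore.
_GROUPING_RULE_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_]*[A-Za-z0-9])?")

# Alarm rule Original Rule Name / Rule Name (Table 2): up to 256 characters;
# letters, digits, underscores, and hyphens; must not start or end with "_" or "-".
_ALARM_RULE_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_valid_grouping_rule_name(name: str) -> bool:
    return len(name) <= 100 and _GROUPING_RULE_NAME.fullmatch(name) is not None


def is_valid_alarm_rule_name(name: str) -> bool:
    # Rule Name is optional in the console; this checks a non-empty value.
    return len(name) <= 256 and _ALARM_RULE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["rule", "_rule", "rule-1"]:
        print(candidate, is_valid_grouping_rule_name(candidate))
    for candidate in ["monitor", "elb-monitor_1", "monitor-"]:
        print(candidate, is_valid_alarm_rule_name(candidate))
```

The example names from this section, rule and monitor, both pass; rule-1 fails as a grouping rule name because hyphens are not allowed there.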
If a metric value meets the configured alarm condition, a metric alarm is generated. To view the alarm, choose Alarm Center > Alarm List in the navigation pane. The generated critical and major AOM alarms are aggregated for notification based on the grouping rule created in Step 1: Create a Grouping Rule.
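The Initial Wait Time (15s), Batch Processing Interval (60s), and Repeat Interval (1 hour) set in Step 1 determine when each aggregated group sends notifications. These settings behave much like group_wait, group_interval, and repeat_interval in Prometheus Alertmanager; the Python sketch below models a single alarm group under that assumption and assumes the group's alarms stay active for the whole simulated period. It is not AOM's actual scheduler, and the function name and its change_times input are hypothetical.

```python
from typing import List


def notification_times(change_times: List[int], horizon: int,
                       initial_wait: int = 15, batch_interval: int = 60,
                       repeat_interval: int = 3600) -> List[int]:
    """Return the seconds at which one combined alarm group sends notifications.

    change_times: seconds at which a new alarm joins the group or an alarm
    changes status; the earliest entry creates the group.
    horizon: last second to simulate.
    """
    if not change_times:
        return []
    changes = sorted(change_times)
    first_send = changes[0] + initial_wait  # Initial Wait Time
    if first_send > horizon:
        return []
    sent = [first_send]
    last_sent = first_send
    tick = first_send + batch_interval  # re-check every Batch Processing Interval
    while tick <= horizon:
        changed = any(last_sent < c <= tick for c in changes)
        # Send on new alarms/status changes, or resend unchanged data after Repeat Interval.
        if changed or tick - last_sent >= repeat_interval:
            sent.append(tick)
            last_sent = tick
        tick += batch_interval
    return sent


if __name__ == "__main__":
    # Alarms join or change at 0 s, 5 s, and 40 s; simulate just over an hour.
    print(notification_times([0, 5, 40], horizon=3700))
```

In this run, the first notification at 15 s covers the alarms from 0 s and 5 s, the change at 40 s is sent at the next batch check (75 s), and because nothing changes afterward, the notification is repeated one hour later at 3675 s.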