Managing Containers
This section describes how to use AOM to quickly manage containers on the Overview page, including container monitoring and alarm rule creation. The procedure is as follows:
- Monitoring Containers: AOM is compatible with Kubernetes and automatically collects and reports container information.
- Setting an Alarm Rule: Create metric alarm rules to ensure that notifications are sent when containers are abnormal.
- Setting an Alarm Action Rule: Configure alarm action rules that define what happens when containers become abnormal, for example, automatically restarting them.
The Overview option is disabled by default. If you need this option, enable it on the Menu Settings page. For details, see Menu Settings.
Monitoring Containers
- Log in to the AOM 2.0 console.
- In the navigation pane, choose Overview.
- On the displayed page, switch to By Container.
- In the Getting Started area, click Monitor Container. The Workload Monitoring page is displayed.
- In the upper right corner of the page, set filter criteria.
- Set a time range to view the workloads reported. There are two methods to set a time range:
Method 1: Use a predefined time label, such as Last hour, Last 6 hours, or Last day. Select one as required.
Method 2: Specify the start time and end time (max. 30 days).
- Set the interval for refreshing information. Click the corresponding icon and select a desired value from the drop-down list.
- Click any workload tab to view information, such as workload name, status, cluster, and namespace.
- In the upper part of the workload list, filter workloads by cluster, namespace, or pod name.
- Click the refresh icon in the upper right corner to obtain the latest workload information.
- Click the column settings icon in the upper right corner and select or deselect the columns to display.
- Click the name of a workload to view its details.
- On the Pods tab page, view all pod conditions of the workload. Click a pod name to view the resource usage and health status of the pod's containers.
- On the Monitoring Views tab page, view the resource usage of the workload.
- On the Logs tab page, view the raw logs and real-time logs of the workload and analyze them as required.
- On the Alarms tab page, view the alarm details of the workload.
- On the Events tab page, view the event details of the workload.
Setting an Alarm Rule
Metric alarm rules can be created in either of the following modes: Select from all metrics or PromQL.
The following uses Select from all metrics as an example.
- On the Overview page, switch to By Container.
- In the Getting Started area, click Set Alarm Rule. The Alarm Rules page is displayed.
- Click Create Alarm Rule.
- Set basic information about the alarm rule by referring to Table 1.
Table 1 Basic information
Parameter
Description
Rule Name
Name of a rule. Enter a maximum of 256 characters and do not start or end with any special character. Only letters, digits, underscores (_), and hyphens (-) are allowed.
Enterprise Project
Enterprise project.
- If you have selected All for Enterprise Project on the global settings page, select one from the drop-down list here.
- If you have already selected an enterprise project on the global settings page, this option will be dimmed and cannot be changed.
Description
Description of the rule. Enter up to 1024 characters.
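As a rough illustration, the Rule Name constraints in Table 1 could be checked locally before a rule is submitted. The sketch below is not part of AOM: the function name is_valid_rule_name is hypothetical, and it assumes that "letters" means ASCII letters and that "special character" refers to the underscore and hyphen.

```python
import re

MAX_RULE_NAME_LENGTH = 256

# Assumption: only ASCII letters and digits count as non-special characters.
# The name must start and end with a letter or digit; underscores and
# hyphens are allowed only in between.
_RULE_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_valid_rule_name(name: str) -> bool:
    """Return True if name meets the Rule Name constraints in Table 1."""
    if len(name) > MAX_RULE_NAME_LENGTH:
        return False
    return _RULE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("cpu-usage_rule1", "_starts-with-underscore", "has space"):
        print(candidate, is_valid_rule_name(candidate))
```

The console performs its own validation, so a check like this is useful only for catching obvious mistakes early, for example in scripts that prepare rule names in bulk.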
- Set the detailed information about the alarm rule.
- Set Rule Type to Metric alarm rule.
- Set Configuration Mode to Select from all metrics.
- Select a target Prometheus instance from the drop-down list.
- Set alarm rule details. Table 2 describes the parameters.
After the setting is complete, the monitored metric data is displayed in a line graph above the alarm condition. A maximum of 50 metric data records can be displayed. Click the line icon before each metric data record to hide the metric data in the graph. You can click Add Metric to add metrics and set the statistical period and detection rules for the metrics.
After moving the cursor to the metric data and the corresponding alarm condition, you can perform the following operations as required:
- Click the corresponding icon next to an alarm condition to hide the corresponding metric data record in the graph.
- Click the corresponding icon next to an alarm condition to convert the metric data and alarm condition into a Prometheus command.
- Click the corresponding icon next to an alarm condition to quickly copy the metric data and alarm condition and modify them as required.
- Click the corresponding icon next to an alarm condition to remove a metric data record from monitoring.
Figure 1 Setting alarm rule details
Table 2 Alarm rule details
Parameter
Description
Multiple Metrics
Each preset alarm condition is evaluated separately. An alarm is triggered when any one of the conditions is met.
For example, if three alarm conditions are set, the system evaluates each of them independently. If any of the conditions is met, an alarm will be triggered.
Combined Operations
The system performs calculation based on the expression you set. If the condition is met, an alarm will be triggered.
For example, if there is no metric showing the CPU core usage of a host, do as follows:
- Set the metric of alarm condition "a" to aom_node_cpu_used_core and retain the default values for other parameters. This metric is used to count the number of CPU cores used by a measured object.
- Set the metric of alarm condition "b" to aom_node_cpu_limit_core and retain the default values for other parameters. This metric is used to count the total number of CPU cores that have been requested for a measured object.
- If the expression is set to "a/b", the CPU core usage of the host can be obtained.
- Set Rule to Max > 0.2.
- In the trigger condition, set Consecutive Periods to 3.
- Set Alarm Severity to Critical.
If the maximum CPU core usage of a host is greater than 0.2 for three consecutive periods, a critical alarm will be generated.
Metric
Metric to be monitored. When Select from all metrics is selected, enter keywords to search for metrics.
Click the Metric text box. In the resource tree on the right, you can also select a target metric by resource type.
Statistical Period
Metric data is aggregated based on the configured statistical period, which can be 15 seconds, 30 seconds, 1 minute, 5 minutes, 15 minutes, or 1 hour.
Condition
Metric monitoring scope. If this parameter is left blank, all resources are covered.
Each condition is a key-value pair. You can select a dimension name from the drop-down list. How the dimension value is specified depends on the matching mode.
- =: Select a dimension value from the drop-down list. For example, if Dimension Name is set to Host name and Dimension Value is set to 192.168.16.4, only host 192.168.16.4 will be monitored.
- !=: Select a dimension value from the drop-down list. For example, if Dimension Name is set to Host name and Dimension Value is set to 192.168.16.4, all hosts except host 192.168.16.4 will be monitored.
- =~: The dimension value is determined based on one or more regular expressions. Separate regular expressions with a vertical bar (|). For example, if Dimension Name is set to Host name and Regular Expression is set to 192.*|172.*, only hosts whose names match 192.* or 172.* will be monitored.
- !~: The dimension value is determined based on one or more regular expressions. Separate regular expressions with a vertical bar (|). For example, if Dimension Name is set to Host name and Regular Expression is set to 192.*|172.*, all hosts except those whose names match 192.* or 172.* will be monitored.
For details about how to enter a regular expression, see Regular Expression Examples.
You can also click the corresponding icon and select AND or OR to add more conditions for the metric.
Grouping Condition
Aggregate metric data by the specified field and calculate the aggregation result. Options: Not grouped, avg by, max by, min by, and sum by. For example, avg by clusterName indicates that metrics are grouped by cluster name, and the average value of the grouped metrics is calculated and displayed in the graph.
Rule
Detection rule of a metric alarm, which consists of the statistical mode (Avg, Min, Max, Sum, or Samples), determination criterion (≥, ≤, >, or <), and threshold value. For example, if the detection rule is set to Avg > 10, a metric alarm will be generated if the average metric value is greater than 10.
Trigger Condition
When the metric value meets the alarm condition for a specified number of consecutive periods, a metric alarm will be generated. Range: 1 to 30.
For example, if Consecutive Periods is set to 2, a metric alarm will be triggered when the alarm condition is met for two consecutive periods.
Alarm Severity
Metric alarm severity. Options: critical alarm, major alarm, minor alarm, and warning.
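The Combined Operations example in Table 2 (expression a/b, Rule set to Max > 0.2, Consecutive Periods set to 3) can be sketched in a few lines of Python. This is only an illustration of the logic described above, not AOM's actual implementation; the function name alarm_fires and the shape of the input data are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# One sample is a pair (used_cores, limit_cores), standing in for
# aom_node_cpu_used_core ("a") and aom_node_cpu_limit_core ("b").
Sample = Tuple[float, float]


def alarm_fires(
    periods: Sequence[List[Sample]],
    threshold: float = 0.2,
    consecutive_periods: int = 3,
) -> bool:
    """Return True if max(a/b) exceeds threshold for enough periods in a row.

    periods holds the samples of each statistical period, oldest first.
    """
    streak = 0
    for samples in periods:
        # Skip samples with a zero limit to avoid dividing by zero.
        ratios = [used / limit for used, limit in samples if limit > 0]
        if ratios and max(ratios) > threshold:
            streak += 1
            if streak >= consecutive_periods:
                return True
        else:
            # A period that does not meet the condition resets the count.
            streak = 0
    return False


if __name__ == "__main__":
    history = [[(1.0, 4.0)], [(1.0, 4.0), (0.5, 4.0)], [(2.0, 4.0)]]
    print(alarm_fires(history))
```

In this sketch a period with no usable samples is treated as not meeting the condition. In AOM, that case is governed by the Action Taken for Insufficient Data setting.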
- Click Advanced Settings and set information such as Check Interval and Alarm Clearance. For details about the parameters, see Table 3.
Table 3 Advanced settings
Parameter
Description
Check Interval
Interval at which metric query and analysis results are checked.
- Hourly: Query and analysis results are checked every hour.
- Daily: Query and analysis results are checked at a fixed time every day.
- Weekly: Query and analysis results are checked at a fixed time point on a specified day of a week.
- Custom interval: The query and analysis results are checked at a fixed interval.
NOTE:
You can set Check Interval to 15 seconds or 30 seconds to implement second-level monitoring. The timeliness of metric alarms depends on the metric reporting period, rule check interval, and notification send time.
For example, if the metric reporting period is 5 seconds, rule check interval is 30 seconds, and notification send time is 1 second, an alarm can be detected and an alarm notification can be sent within 36 seconds.
- Cron: A cron expression is used to specify a time interval. Query and analysis results are checked at the specified interval.
The time specified in the cron expression can be accurate to the minute and must be in the 24-hour notation. Example: 0/5 * * * *, which indicates that the check starts at minute 0 of each hour and is performed every 5 minutes.
Alarm Clearance
The alarm will be cleared when the alarm condition is not met for a specified number of consecutive periods. By default, metrics in only one period are monitored. You can set up to 30 consecutive monitoring periods.
For example, if Consecutive Periods is set to 2, the alarm will be cleared when the alarm condition is not met for two consecutive periods.
Action Taken for Insufficient Data
Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can set this option based on your requirements.
By default, metrics in only one period are monitored. You can set up to five consecutive monitoring periods.
The system supports the following actions: changing the status to Exceeded and sending an alarm, changing the status to Insufficient data and sending an event, maintaining Previous status, and changing the status to Normal and sending an alarm clearance notification.
Alarm Tag
Click the corresponding icon to add an alarm tag. An alarm tag is an alarm identification attribute in the "key:value" format. It is used in alarm noise reduction scenarios.
Alarm Annotation
Click the corresponding icon to add an alarm annotation. An alarm annotation is an alarm non-identification attribute in the "key:value" format. It is used in alarm notification and message template scenarios.
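Two of the Check Interval details in Table 3 lend themselves to a quick calculation: the timeliness example in the NOTE (5 s + 30 s + 1 s = 36 s) and the minute field of the cron example 0/5 * * * *. The Python sketch below reproduces both. The function names are hypothetical, and the minute-field parser handles only the simple forms "*", "N", "N/step", and "*/step"; it is not a full cron parser and does not claim to match AOM's own parsing.

```python
from typing import List


def worst_case_alarm_delay(
    report_period_s: int, check_interval_s: int, notify_time_s: int
) -> int:
    """Upper bound, in seconds, between a metric change and the notification."""
    return report_period_s + check_interval_s + notify_time_s


def expand_minute_field(field: str) -> List[int]:
    """List the minutes of an hour matched by a simple cron minute field."""
    if "/" in field:
        start_text, step_text = field.split("/", 1)
        start = 0 if start_text == "*" else int(start_text)
        return list(range(start, 60, int(step_text)))
    if field == "*":
        return list(range(60))
    return [int(field)]


if __name__ == "__main__":
    # Values from the NOTE: reporting period 5 s, check interval 30 s,
    # notification send time 1 s.
    print(worst_case_alarm_delay(5, 30, 1))
    # Minute field of the cron example 0/5 * * * *.
    print(expand_minute_field("0/5"))
```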
- Set an alarm notification policy. For details, see Table 4.
Figure 2 Setting an alarm notification policy
Table 4 Parameters for setting an alarm notification policy
Parameter
Description
Notify When
Set the scenario for sending alarm notifications.
- Alarm triggered: If the alarm trigger condition is met, the system sends an alarm notification to the specified personnel by email or SMS.
- Alarm cleared: If the alarm clearance condition is met, the system sends an alarm notification to the specified personnel by email or SMS.
- Click Confirm. Then click View Rule to view the created alarm rule.
In the expanded list, if a metric value meets the configured alarm condition, a metric alarm is generated on the alarm page. To view it, choose Alarm Management > Alarm List in the navigation pane. If a metric value meets the preset notification policy, the system sends an alarm notification to the specified personnel by email or SMS.
Figure 3 Created metric alarm rule
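When you define the Condition parameter described in Table 2, it can help to reason about which resources each matching mode (=, !=, =~, !~) selects. The following Python sketch mimics the four modes. It assumes that regular expressions must match the entire dimension value, as Prometheus label matchers do; whether AOM anchors expressions the same way is an assumption here. The function name dimension_matches is hypothetical.

```python
import re


def dimension_matches(mode: str, actual: str, expected: str) -> bool:
    """Return True if a dimension value is selected by the given condition.

    mode is one of "=", "!=", "=~", "!~". For the regex modes, expected is
    a pattern such as "192.*|172.*" and must match the whole value.
    """
    if mode == "=":
        return actual == expected
    if mode == "!=":
        return actual != expected
    if mode == "=~":
        return re.fullmatch(expected, actual) is not None
    if mode == "!~":
        return re.fullmatch(expected, actual) is None
    raise ValueError(f"unsupported matching mode: {mode}")


if __name__ == "__main__":
    hosts = ["192.168.16.4", "172.16.0.9", "10.0.0.1"]
    pattern = "192.*|172.*"
    print([h for h in hosts if dimension_matches("=~", h, pattern)])
    print([h for h in hosts if dimension_matches("!~", h, pattern)])
```

Note that an unescaped dot in a regular expression matches any character, so a pattern such as 192.* also matches values like 1921. Escape the dot (192\..*) if a literal dot is required.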
Setting an Alarm Action Rule
- On the Overview page, switch to By Container.
- In the Getting Started area, click Set Alarm Action Rule. The Alarm Action Rules page is displayed.
- On the Action Rules tab page, click Create.
- Set parameters such as Rule Name and Action Type by referring to Table 5.
Figure 4 Creating an alarm action rule
Table 5 Parameters for creating an alarm action rule
Parameter
Description
Rule Name
Name of an action rule. Enter up to characters and do not start or end with an underscore (_) or hyphen (-). Only digits, letters, underscores, and hyphens are allowed.
Enterprise Project
Enterprise project.
- If you have selected All for Enterprise Project on the global settings page, select one from the drop-down list here.
- If you have already selected an enterprise project on the global settings page, this option will be dimmed and cannot be changed.
Description
Description of the action rule. Enter up to 1024 characters.
Action Type
Type of an alarm action rule.
Action
Type of action that is associated with the SMN topic and message template. Select your desired action from the drop-down list. Only Notification is supported.
Topic
SMN topic. Select your desired topic from the drop-down list.
If the topic you need is not available, create one on the SMN console.
Message Template
Notification message template. Select your desired template from the drop-down list.
If no proper message template is available, click Create Template to create a message template.
- Click OK.