Configuring Custom Alarms on AOM
CCE interworks with AOM to report alarms and events. By configuring alarm rules on AOM, you can promptly detect whether resources in your clusters are running normally.
Process
- Creating a Topic on SMN
- Creating an Alarm Notification Rule
- Adding an Alarm Rule
  - Event alarms: Generate alarms based on the events reported by clusters to AOM. For details about the events and configurations, see Adding an Event Alarm.
  - Metric alarms: Generate alarms based on the thresholds of monitoring metrics, such as resource utilization of servers and components. For details about the metric thresholds and configurations, see Adding a Metric Alarm.
Creating a Topic on SMN
Simple Message Notification (SMN) pushes messages to subscribers through emails, SMS messages, and HTTP/HTTPS requests.
A topic is used to publish messages and subscribe to notifications. It serves as a message transmission channel between publishers and subscribers.
You need to create a topic and add a subscription to it.
After subscribing to a topic, confirm the subscription in the email or SMS message for the notification to take effect.
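The topic and subscription steps can also be scripted. The following is a minimal sketch that only builds the two HTTP requests and does not send them. The endpoint pattern, the `/v2/{project_id}/notifications/topics` paths, the body fields, and the `X-Auth-Token` header are assumptions based on the SMN REST API and should be checked against the SMN API reference. The function names and sample values are hypothetical.

```python
import json
import urllib.request

# Assumption: SMN endpoint pattern; confirm the real endpoint for your region.
SMN_ENDPOINT = "https://smn.{region}.myhuaweicloud.com"


def build_create_topic_request(region, project_id, token, topic_name):
    """Build (but do not send) a request that creates an SMN topic."""
    base = SMN_ENDPOINT.format(region=region)
    url = f"{base}/v2/{project_id}/notifications/topics"
    body = json.dumps({"name": topic_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


def build_subscribe_request(region, project_id, token, topic_urn, protocol, endpoint):
    """Build (but do not send) a request that subscribes an endpoint to a topic.

    protocol is, for example, "email" or "sms"; endpoint is the matching
    email address or phone number.
    """
    base = SMN_ENDPOINT.format(region=region)
    url = f"{base}/v2/{project_id}/notifications/topics/{topic_urn}/subscriptions"
    body = json.dumps({"protocol": protocol, "endpoint": endpoint}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    req = build_create_topic_request("my-region", "my-project-id", "my-token", "cce-alarms")
    print(req.get_method(), req.full_url)
```

Passing a built request to `urllib.request.urlopen` would send it. The subscriber still has to confirm the subscription from the email or SMS message before notifications are delivered.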
Creating an Alarm Notification Rule
AOM allows you to create custom alarm notification rules. An alarm notification rule associates an SMN topic with a message template. You can also customize the notification content based on a message template.
Adding an Event Alarm
The following uses NodeNotReady as an example to describe how to add an event alarm. You can add other alarms by referring to Table 1.
| Event Name | Source | Description | Solution |
|---|---|---|---|
| NodeNotReady | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| Rebooted | CCE | An alarm is triggered immediately when a node is restarted. | Log in to the cluster to check the status of the node for which the alarm is generated, check whether the node can be started properly, and locate the cause of the restart. |
| KUBELETIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. Then, restart kubelet. |
| DOCKERIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. Then, restart Docker. |
| KUBEPROXYIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| KernelOops | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| ConntrackFull | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| NodeCreateFailed | CCE | An alarm is triggered immediately upon a node creation failure. | Rectify the failure and create the node again. |
| ScaleUpTimedOut | CCE | An alarm is triggered immediately upon node scale-out timeout. | Rectify the failure and try scale-out again. |
| ScaleDownFailed | CCE | An alarm is triggered immediately upon node scale-in timeout. | Rectify the failure and try scale-in again. |
| BackOffPullImage | CCE | Image pull retry failed. | Log in to the cluster, locate the failure cause, and deploy the service workload again. |
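
Several solutions in the table ask you to set the node as unschedulable and move its pods elsewhere. With kubectl, this usually maps to `kubectl cordon` followed by `kubectl drain`. The sketch below only assembles those commands so they can be reviewed before being run. The helper name and the sample node name are hypothetical, and the drain flags are one reasonable choice, not a CCE requirement.

```python
import shlex


def node_isolation_commands(node_name):
    """Return kubectl commands that inspect, cordon, and drain a node.

    The commands are returned as argument lists and are not executed here.
    """
    return [
        # Check the current status of the node that raised the alarm.
        ["kubectl", "get", "node", node_name, "-o", "wide"],
        # Mark the node unschedulable so that no new pods land on it.
        ["kubectl", "cordon", node_name],
        # Evict pods so their controllers recreate them on other nodes.
        # --delete-emptydir-data discards emptyDir volume data on eviction.
        ["kubectl", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


if __name__ == "__main__":
    # "node-1" is a placeholder node name.
    for cmd in node_isolation_commands("node-1"):
        print(shlex.join(cmd))
```

`kubectl drain` cordons the node itself, so the separate cordon step is redundant. It is kept here because it mirrors the two actions described in the table. After the fault is fixed, `kubectl uncordon` makes the node schedulable again.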
- Log in to the AOM console.
- In the navigation pane, choose Alarm Management > Alarm Rules. Then, click Create Alarm Rule.
- Enter basic information as prompted and configure other parameters as follows:
  - Rule Type: Select Event alarm rule.
  - Event Type: Select System.
  - Event Source: Select CCE.
  - Monitored Object: Filter monitored objects by notification type, event name, alarm severity, custom attribute, namespace, and cluster name. In this example, filter monitored objects by event name, select NodeNotReady, and set Trigger Mode to Immediate Trigger.
  - Alarm Mode: Select Direct alarm reporting.
  - Notification Rule: Select the alarm notification rule created in Creating an Alarm Notification Rule.

  Configure other parameters as required.

  With the settings in this example, if a node in the cluster becomes abnormal, CCE reports the NodeNotReady event to AOM. AOM then immediately notifies you through SMN based on the alarm notification rule.
- Click Confirm.
  A successfully created alarm rule will be displayed in the rule list.
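
To make the filtering and immediate-trigger behavior concrete, the following sketch models an event alarm rule in plain Python. It is an illustration of the logic described in the steps, not AOM's implementation. The class, the function names, and the event dictionary keys (`source`, `event_name`, `cluster_name`, `resource`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class EventAlarmRule:
    """Hypothetical model of an event alarm rule with immediate triggering."""
    event_source: str
    event_names: FrozenSet[str]
    cluster_name: Optional[str] = None  # None means "any cluster"


def matches(rule: EventAlarmRule, event: dict) -> bool:
    """Return True if the event passes every filter configured on the rule."""
    if event.get("source") != rule.event_source:
        return False
    if event.get("event_name") not in rule.event_names:
        return False
    if rule.cluster_name is not None and event.get("cluster_name") != rule.cluster_name:
        return False
    return True


def alarms_for(rule: EventAlarmRule, events: List[dict]) -> List[str]:
    """Immediate trigger: each matching event produces one alarm message."""
    return [
        f"{e['event_name']} on {e.get('resource', 'unknown')}"
        for e in events
        if matches(rule, e)
    ]


if __name__ == "__main__":
    rule = EventAlarmRule(event_source="CCE", event_names=frozenset({"NodeNotReady"}))
    sample_events = [
        {"source": "CCE", "event_name": "NodeNotReady", "resource": "node-1"},
        {"source": "CCE", "event_name": "Rebooted", "resource": "node-2"},
    ]
    print(alarms_for(rule, sample_events))
```

Only the NodeNotReady event matches the rule in this sample. Adding more names from Table 1 to `event_names` would correspond to selecting more event names in the Monitored Object filter.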
Adding a Metric Alarm
The following uses a PromQL statement as an example to describe how to add a metric alarm.
- Log in to the AOM console.
- In the navigation pane, choose Alarm Management > Alarm Rules. Then, click Create Alarm Rule.
- Configure parameters as follows:
  - Rule Type: Select Metric alarm rule.
  - Configuration Mode: Select PromQL. You can enter native PromQL statements or use CCE templates.
  - Prometheus Instance: Select the AOM instance to which Cloud Native Cluster Monitoring in the cluster reports metrics.
  - Default Rule:
    - Custom: Enter a PromQL statement to configure the alarm rule. For example:

      kube_persistentvolume_status_phase{phase=~"Failed|Pending",cluster="${cluster_id}"} > 0

      ${cluster_id} indicates the cluster name. If a PV in the cluster is in the Failed or Pending state, an alarm will be generated.
    - CCEFromProm: Select an alarm template provided by CCE.
  - Alarm Mode: Select Direct alarm reporting.
  - Notification Rule: Select the alarm notification rule created in Creating an Alarm Notification Rule.

  Configure other parameters as required.
- Click Confirm.
  A successfully created alarm rule will be displayed in the rule list.
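
Before saving a custom PromQL rule, it can help to run the expression as an instant query and see which series it returns. The sketch below fills in the cluster placeholder, builds a query URL for the standard Prometheus HTTP API path `/api/v1/query`, and parses a response in the standard instant-query format. The base URL is a placeholder and the function names are hypothetical. Whether and how your AOM Prometheus instance exposes this API, and what authentication it needs, are assumptions to verify.

```python
import json
import urllib.parse

# The PromQL rule from the example above, with the cluster value as a parameter.
PV_RULE_TEMPLATE = (
    'kube_persistentvolume_status_phase'
    '{{phase=~"Failed|Pending",cluster="{cluster}"}} > 0'
)


def build_expr(cluster):
    """Substitute the cluster value into the PromQL expression."""
    return PV_RULE_TEMPLATE.format(cluster=cluster)


def build_query_url(base_url, expr):
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def firing_volumes(response_text):
    """Return (PV name, phase) pairs from an instant-query JSON response.

    Any series in the result means the expression is true for that PV,
    which is the condition under which the alarm would be generated.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        return []
    return [
        (s["metric"].get("persistentvolume", "unknown"), s["metric"].get("phase", "unknown"))
        for s in payload["data"]["result"]
    ]


if __name__ == "__main__":
    expr = build_expr("my-cluster")  # placeholder cluster value
    print(build_query_url("https://prometheus.example.com", expr))
```

An empty result list means that no PV is currently in the Failed or Pending state, so the rule would not generate an alarm.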