Updated on 2025-12-08 GMT+08:00

Configuring Custom Alarms on AOM

CCE interworks with Application Operations Management (AOM) to report alarms and events. By configuring alarm rules on AOM, you can promptly detect abnormal resources in your clusters.

Process

  1. Creating a Topic on SMN
  2. Creating an Alarm Notification Rule
  3. Adding an Alarm Rule
    1. Event alarms: Generate alarms based on the events reported by clusters to AOM. For details about the events and configurations, see Adding an Event Alarm.
    2. Metric alarms: Generate alarms based on the thresholds of monitoring metrics, such as resource utilization of servers and components. For details about the metric thresholds and configurations, see Adding a Metric Alarm.

Creating a Topic on SMN

Simple Message Notification (SMN) pushes messages to subscribers through emails, SMS messages, and HTTP/HTTPS requests.

A topic is used to publish messages and subscribe to notifications. It serves as a message transmission channel between publishers and subscribers.

You need to create a topic and add a subscription to it.

After adding a subscription to the topic, confirm it in the email or SMS message you receive. Notifications are delivered only after the subscription is confirmed.
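The relationship between a topic, its subscriptions, and the confirmation step can be sketched as a small in-memory model. This is only an illustration of the behavior described above, not the SMN API; the class names, the topic name, and the endpoints are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    protocol: str             # for example "email", "sms", "http", or "https"
    endpoint: str             # email address, phone number, or URL
    confirmed: bool = False   # set to True once the subscriber confirms


@dataclass
class Topic:
    name: str
    subscriptions: list = field(default_factory=list)

    def subscribe(self, protocol: str, endpoint: str) -> Subscription:
        """Add a subscription; it stays unconfirmed until the subscriber acts."""
        sub = Subscription(protocol, endpoint)
        self.subscriptions.append(sub)
        return sub

    def publish(self, message: str) -> list:
        """Return the endpoints that would receive the message."""
        return [s.endpoint for s in self.subscriptions if s.confirmed]


alarm_topic = Topic("cce-alarms")                      # hypothetical topic name
ops_mail = alarm_topic.subscribe("email", "ops@example.com")
alarm_topic.subscribe("sms", "+10000000000")

before_confirm = alarm_topic.publish("test")           # nobody has confirmed yet
ops_mail.confirmed = True                              # subscriber confirms
after_confirm = alarm_topic.publish("test")
```

In this model, only the confirmed email subscription receives the second message, which mirrors why the confirmation step must not be skipped.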

Creating an Alarm Notification Rule

AOM allows you to create custom alarm notification rules. An alarm notification rule associates an SMN topic with a message template, and you can customize the notification content based on that template.
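As a rough sketch of what such an association means, the following models a notification rule as a topic name plus a message template and renders the text for one alarm. The rule name, topic name, template wording, and alarm field names are all assumptions made for illustration; AOM's actual message templates use their own variables.

```python
from string import Template

# Hypothetical message template; the placeholder names are illustrative only.
MESSAGE_TEMPLATE = Template(
    "[$severity] $event_name on $resource in cluster $cluster"
)

# A notification rule ties one SMN topic to one message template.
notification_rule = {
    "name": "cce-node-alarms",   # hypothetical rule name
    "topic": "cce-alarms",       # hypothetical SMN topic name
    "template": MESSAGE_TEMPLATE,
}


def render_notification(rule: dict, alarm: dict) -> tuple:
    """Return (topic name, message text) for a single alarm."""
    return rule["topic"], rule["template"].substitute(alarm)


target_topic, message_text = render_notification(
    notification_rule,
    {
        "severity": "Critical",
        "event_name": "NodeNotReady",
        "resource": "node-1",
        "cluster": "demo",
    },
)
print(target_topic, message_text)
```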

Adding an Event Alarm

The following uses NodeNotReady as an example to describe how to add an event alarm. You can add other alarms by referring to Table 1.

Table 1 Event-based alarms

| Event Name | Source | Description | Solution |
|---|---|---|---|
| NodeNotReady | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| Rebooted | CCE | An alarm is triggered immediately when a node is restarted. | Log in to the cluster to check the status of the node for which the alarm is generated, check whether the node can be started properly, and locate the cause of the restart. |
| KUBELETIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. Then, restart kubelet. |
| DOCKERIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. Then, restart Docker. |
| KUBEPROXYIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| KernelOops | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| ConntrackFull | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| NodeCreateFailed | CCE | An alarm is triggered immediately upon a node creation failure. | Rectify the failure and create the node again. |
| ScaleUpTimedOut | CCE | An alarm is triggered immediately upon node scale-out timeout. | Rectify the failure and try scale-out again. |
| ScaleDownFailed | CCE | An alarm is triggered immediately upon node scale-in timeout. | Rectify the failure and try scale-in again. |
| BackOffPullImage | CCE | Image pull retry failed. | Log in to the cluster, locate the failure cause, and deploy the service workload again. |
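Several solutions in Table 1 ask you to check the node, set it as unschedulable, and move the service pods elsewhere. One common way to do this is with kubectl, as in the sketch below, which only builds the command lines and does not run them. The helper name and the node name are hypothetical, and your cluster may need additional drain options (for example, for pods that use emptyDir volumes).

```python
import shlex


def isolate_node_commands(node_name: str) -> list:
    """Build kubectl commands to inspect a node, stop scheduling on it,
    and evict its pods so that they are rescheduled on other nodes."""
    return [
        ["kubectl", "get", "node", node_name],      # check the node status
        ["kubectl", "cordon", node_name],           # mark the node unschedulable
        # Evict pods; DaemonSet-managed pods are skipped because they
        # cannot be rescheduled to another node.
        ["kubectl", "drain", node_name, "--ignore-daemonsets"],
    ]


commands = isolate_node_commands("node-1")          # hypothetical node name
for cmd in commands:
    print(shlex.join(cmd))
```

Review the printed commands before running them on a machine that has kubectl access to the cluster.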

  1. Log in to the AOM console.
  2. In the navigation pane, choose Alarm Management > Alarm Rules. Then, click Create Alarm Rule.
  3. Enter basic information as prompted and configure other parameters as follows:

    • Rule Type: Select Event alarm rule.
    • Event Type: Select System.
    • Event Source: Select CCE.
    • Monitored Object: Filter monitored objects by notification type, event name, alarm severity, custom attribute, namespace, and cluster name.

      In this example, filter monitored objects by event name, select NodeNotReady, and set Trigger Mode to Immediate Trigger.

    • Alarm Mode: Select Direct alarm reporting.
    • Notification Rule: Select the alarm notification rule created in Creating an Alarm Notification Rule.

    Configure other parameters as required.

    In this example, if a node in the cluster becomes abnormal, CCE reports the NodeNotReady event to AOM, and AOM immediately notifies you through SMN based on the alarm notification rule.

  4. Click Confirm.

    A successfully created alarm rule will be displayed in the rule list.
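The effect of the rule configured above can be pictured with a small model: an event alarm rule filters incoming events by source and event name, and with immediate triggering every matching event raises an alarm on its own. This is a conceptual sketch only; the class, the event dictionary keys, and the sample events are assumptions and do not reflect AOM's internal data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventAlarmRule:
    event_source: str                 # for example "CCE"
    event_name: str                   # for example "NodeNotReady"
    trigger_mode: str = "immediate"   # each matching event raises an alarm

    def matches(self, event: dict) -> bool:
        """Check whether an event comes from the configured source and name."""
        return (
            event.get("source") == self.event_source
            and event.get("event_name") == self.event_name
        )


def alarms_for(rule: EventAlarmRule, events: list) -> list:
    """Return the events that would raise an alarm under immediate triggering."""
    return [event for event in events if rule.matches(event)]


rule = EventAlarmRule(event_source="CCE", event_name="NodeNotReady")
sample_events = [
    {"source": "CCE", "event_name": "NodeNotReady", "resource": "node-1"},
    {"source": "CCE", "event_name": "Rebooted", "resource": "node-2"},
    {"source": "OTHER", "event_name": "NodeNotReady", "resource": "node-3"},
]
raised = alarms_for(rule, sample_events)
print([event["resource"] for event in raised])
```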

Adding a Metric Alarm

The following uses a PromQL statement as an example to describe how to add a metric alarm.

  1. Log in to the AOM console.
  2. In the navigation pane, choose Alarm Management > Alarm Rules. Then, click Create Alarm Rule.
  3. Configure parameters as follows:

    • Rule Type: Select Metric alarm rule.
    • Configuration Mode: Select PromQL. You can enter native PromQL statements or use CCE templates.
    • Prometheus Instance: Select the AOM instance to which Cloud Native Cluster Monitoring in the cluster reports metrics.
    • Default Rule:
      • Custom: Enter a PromQL statement to configure the alarm rule. For example:
        kube_persistentvolume_status_phase{phase=~"Failed|Pending",cluster="${cluster_id}"} > 0

        ${cluster_id} is a placeholder for the cluster name. With this statement, an alarm is generated if any PV in the cluster is in the Failed or Pending state.

      • CCEFromProm: Select an alarm template provided by CCE.
    • Alarm Mode: Select Direct alarm reporting.
    • Notification Rule: Select the alarm notification rule created in Creating an Alarm Notification Rule.

    Configure other parameters as required.

  4. Click Confirm.

    A successfully created alarm rule will be displayed in the rule list.
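To make the example PromQL statement more concrete, the sketch below fills in the ${cluster_id} placeholder and then mimics the rule on a few in-memory samples. It is an illustration under stated assumptions, not how AOM evaluates rules: the function names and sample data are hypothetical, and the persistentvolume label is assumed to follow the usual kube-state-metrics naming. The phase check uses a full match because PromQL regex matchers are anchored.

```python
import re
from string import Template

# The statement from the example above, with ${cluster_id} as a placeholder.
PROMQL = Template(
    "kube_persistentvolume_status_phase"
    '{phase=~"Failed|Pending",cluster="${cluster_id}"} > 0'
)


def build_expression(cluster: str) -> str:
    """Substitute the cluster value into the PromQL statement."""
    return PROMQL.substitute(cluster_id=cluster)


def firing_volumes(samples: list, cluster: str) -> list:
    """Return the PVs for which the rule would generate an alarm.

    Each sample is a (labels, value) pair, similar to one time series
    returned by an instant query.
    """
    phase_pattern = re.compile(r"Failed|Pending")
    return [
        labels["persistentvolume"]
        for labels, value in samples
        if labels.get("cluster") == cluster
        and phase_pattern.fullmatch(labels.get("phase", ""))
        and value > 0
    ]


expression = build_expression("demo")    # "demo" is a hypothetical cluster value
samples = [
    ({"cluster": "demo", "persistentvolume": "pv-a", "phase": "Bound"}, 1),
    ({"cluster": "demo", "persistentvolume": "pv-b", "phase": "Failed"}, 1),
    ({"cluster": "demo", "persistentvolume": "pv-c", "phase": "Pending"}, 0),
    ({"cluster": "other", "persistentvolume": "pv-d", "phase": "Pending"}, 1),
]
firing = firing_volumes(samples, "demo")
print(expression)
print(firing)
```

Only pv-b triggers in this sample set: pv-a is Bound, pv-c has a value of 0 for the Pending phase, and pv-d belongs to a different cluster.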