Configuring Custom Alarms on CCE

If the default alarm rules do not meet your requirements, you can create custom alarm rules on CCE. Custom alarm rules help you promptly detect abnormal resources in your clusters.

Adding Metric Alarms

To create Prometheus metric threshold-crossing alarm rules and metric alarm rules, you need to enable Monitoring Center. For details, see Enabling Cluster Monitoring.
Some metric templates are created based on the problems reported by CCE Node Problem Detector. For details about these metrics, see Table 1. To use the related alarm rules, ensure that CCE Node Problem Detector has been installed and is running normally.

Log in to the CCE console and click the cluster name to access the cluster console.
In the navigation pane, choose Alarm Center. Then, choose Alarm Rules > Custom Alarm Rules, and click Create Alarm Rule.

Configure the alarm rule parameters.

Rule Type: Select Metric alarm.
Alarm Template: If you select No template, you need to configure the parameters in Rule Details. You can also set this parameter to Use template to quickly define a PromQL-based alarm rule or modify an existing template.

Rule Details: Configure the parameters listed in the following table.

Parameter	Description	Example Value
Rule Name	Enter the name of the alarm rule.	CoreDNS memory usage higher than 80%
(Optional) Description	Describe the alarm rule.	Check whether the memory usage of CoreDNS is higher than 80%.
Alarm Rule (PromQL)	Enter a Prometheus query statement. For details about how to write Prometheus query statements, see Query Examples.	The following example statement generates an alarm when the maximum memory usage of CoreDNS is higher than 80%: (sum(container_memory_working_set_bytes{image!="", container!="POD", namespace="kube-system", container="coredns"}) BY (cluster_name, node, container, pod, namespace, cluster) / sum(container_spec_memory_limit_bytes{namespace="kube-system", container="coredns"} > 0) BY (cluster_name, node, container, pod, namespace, cluster) * 100) > 80
Severity	Select Critical, Major, Minor, or Warning.	Critical
Duration	Select an alarm duration from the drop-down list. The default value is 1 minute.	1 minute
Alarm Content	Define the content of the alarm notification. Prometheus variables can be referenced in the form of ${variable}.	Example: Cluster: ${cluster_name}, Namespace: ${namespace}, Pod: ${pod}, Container: ${container} memory usage is higher than 80%. The current value is ${value} %.
Contact Group	Select an existing contact group. You can also click Create Contact Group to create a contact group. For details about the parameters, see Configuring Alarm Notification Recipients.	CCEGroup

In the preceding example, an alarm rule named CoreDNS memory usage higher than 80% is set for CoreDNS in the kube-system namespace, and its severity is Critical. When the maximum memory usage is higher than 80% for 1 minute, a notification is sent to all alarm contacts in the CCEGroup contact group by SMS message or email. The notification contains the cluster name, namespace, pod name, container name, and current memory usage.
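The following Python sketch illustrates what the example rule evaluates and how the ${variable} placeholders in Alarm Content could be filled in. It is not how CCE implements alarms. The function names, the sample label values, and the use of Python's string.Template (which happens to share the ${variable} syntax) are assumptions made for illustration, and the 1-minute Duration is not modeled.

```python
from string import Template
from typing import Dict, Optional

THRESHOLD_PERCENT = 80.0  # threshold used by the example rule

# Same wording as the Alarm Content example in the table above.
ALARM_CONTENT = Template(
    "Cluster: ${cluster_name}, Namespace: ${namespace}, Pod: ${pod}, "
    "Container: ${container} memory usage is higher than 80%. "
    "The current value is ${value} %."
)


def memory_usage_percent(working_set_bytes: float, limit_bytes: float) -> Optional[float]:
    """Mirror the PromQL ratio: working set / memory limit * 100.

    Returns None when no limit is set, similar to the `> 0` filter
    that drops such series in the example query.
    """
    if limit_bytes <= 0:
        return None
    return working_set_bytes / limit_bytes * 100


def render_alarm(labels: Dict[str, str], working_set_bytes: float,
                 limit_bytes: float) -> Optional[str]:
    """Return the notification text if usage exceeds the threshold, else None."""
    usage = memory_usage_percent(working_set_bytes, limit_bytes)
    if usage is None or usage <= THRESHOLD_PERCENT:
        return None
    return ALARM_CONTENT.substitute(labels, value=f"{usage:.1f}")


# Hypothetical label values for one CoreDNS container.
sample_labels = {
    "cluster_name": "demo-cluster",
    "namespace": "kube-system",
    "pod": "coredns-abc",
    "container": "coredns",
}
message = render_alarm(sample_labels, 180 * 1024**2, 200 * 1024**2)
print(message)
```

With a 180 MiB working set against a 200 MiB limit, the usage is 90%, so the sketch produces a notification. At or below 80%, or when no memory limit is set, it returns None.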

(Optional) Advanced Settings
- Alarm Tag: An attribute for identifying and grouping alarms to reduce noise. In the message template, the tag value is referenced as $event.metadata. A maximum of 10 alarm tags can be added.
- Alarm Annotation: An attribute that is not used for alarm identification. In the message template, the annotation value is referenced as $event.annotations. A maximum of 10 alarm annotations can be added.
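The difference between tags and annotations can be pictured with a small Python sketch. It assumes a hypothetical in-memory alarm representation (a dict with "tags" and "annotations" keys) and is not the platform's actual grouping logic. It only shows the idea that tags contribute to alarm identity and grouping while annotations do not.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MAX_ENTRIES = 10  # limit stated above for both tags and annotations

TagKey = Tuple[Tuple[str, str], ...]


def validate(entries: Dict[str, str], kind: str) -> None:
    """Reject more than MAX_ENTRIES tags or annotations."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} alarm {kind} can be added")


def group_by_tags(alarms: List[dict]) -> Dict[TagKey, List[dict]]:
    """Group alarms whose tags are identical; annotations are ignored."""
    groups: Dict[TagKey, List[dict]] = defaultdict(list)
    for alarm in alarms:
        validate(alarm["tags"], "tags")
        validate(alarm["annotations"], "annotations")
        key = tuple(sorted(alarm["tags"].items()))
        groups[key].append(alarm)
    return dict(groups)


# Two hypothetical alarms with the same tags but different annotations.
alarms = [
    {"tags": {"team": "platform"}, "annotations": {"runbook": "restart-coredns"}},
    {"tags": {"team": "platform"}, "annotations": {"runbook": "scale-coredns"}},
]
grouped = group_by_tags(alarms)
print(len(grouped))
```

Both sample alarms fall into one group because their tags match, even though their annotations differ.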

Click OK. Then, go to the Custom Alarm Rules page to check whether the rule was created successfully.

Adding Event Alarms

To create event-triggered alarm rules, you need to enable Logging and Kubernetes event collection. For details, see Collecting Container Logs Using Cloud Native Logging.
Some metric templates are created based on the problems reported by CCE Node Problem Detector. For details about these metrics, see Table 1. To use the related alarm rules, ensure that CCE Node Problem Detector has been installed and is running normally.

Log in to the CCE console and click the cluster name to access the cluster console.
In the navigation pane, choose Alarm Center. Then, choose Alarm Rules > Custom Alarm Rules, and click Create Alarm Rule.

Configure the alarm rule parameters.

Rule Type: Select Event alarm. Common events include Kubernetes events and cloud service events.

Rule Details: Configure the parameters listed in the following table.

Parameter	Description	Example Value
Rule Name	Enter the name of the alarm rule.	ReplicaSet quantity change
(Optional) Description	Describe the alarm rule.	The number of ReplicaSets changes more than three times within 5 minutes.
Event Name	Enter the event name based on the actual Kubernetes event or cloud service event. For details about event names, see CCE Events.	ScalingReplicaSet
Triggering Mode	Immediate trigger: An alarm is generated as soon as the event occurs. Accumulative trigger: An alarm is generated only after the event occurs a preset number of times within the monitoring interval.	Select Accumulative trigger, and set Monitoring Interval to 5 minutes and Occurrences to > 3.
Severity	Select Critical, Major, Minor, or Warning.	Minor
Contact Group	Select an existing contact group. You can also click Create Contact Group to create a contact group. For details about the parameters, see Configuring Alarm Notification Recipients.	CCEGroup

In the preceding example, an alarm rule named ReplicaSet quantity change is set for the ScalingReplicaSet event, and its severity is Minor. When the number of ReplicaSets changes more than three times within 5 minutes, a notification is sent to all alarm contacts in the CCEGroup contact group by SMS message or email.
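The accumulative trigger in this example can be sketched in Python as a count of events within a time window. The class name, the sliding-window interpretation, and the requirement that timestamps arrive in non-decreasing order are assumptions for illustration. The source does not state whether the service uses sliding or fixed windows.

```python
from collections import deque
from typing import Deque


class AccumulativeTrigger:
    """Fire when more than `occurrences` events fall within the interval."""

    def __init__(self, interval_seconds: float = 300, occurrences: int = 3) -> None:
        self.interval = interval_seconds   # Monitoring Interval: 5 minutes
        self.occurrences = occurrences     # Occurrences: > 3
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event (in seconds) and report whether an alarm fires."""
        self._timestamps.append(timestamp)
        # Drop events that are older than the monitoring interval.
        while self._timestamps and self._timestamps[0] <= timestamp - self.interval:
            self._timestamps.popleft()
        return len(self._timestamps) > self.occurrences


# Four ScalingReplicaSet events, one minute apart.
trigger = AccumulativeTrigger()
results = [trigger.record(t) for t in (0, 60, 120, 180)]
print(results)  # [False, False, False, True]
```

The fourth event within 5 minutes is the first one that exceeds the "> 3" condition. An immediate trigger would correspond to firing on every recorded event.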