
Configuring Alarms in Alarm Center

Updated on 2025-02-27 GMT+08:00

Alarm Center works with Application Operations Management (AOM) to promptly detect cluster faults and generate alarms, helping you keep services stable. It provides built-in alarm rules, so you do not need to configure alarm rules manually on AOM. These rules are based on the Huawei Cloud container team's extensive cluster O&M experience. They cover container service exceptions, key metric alarms for basic cluster resources, and metric alarms for applications in a cluster, which meets routine O&M requirements.

Constraints

Only Huawei Cloud accounts, HUAWEI IDs, or IAM users with CCE administrator or FullAccess permissions can perform all operations in Alarm Center. IAM users with the CCE ReadOnlyAccess permission have read-only access to all resources.

Enabling Alarm Center

  1. Click the cluster name to access the cluster console. In the navigation pane on the left, choose Alarm Center.
  2. On the Alarm Rules tab, click Enable Alarm Center. In the window that slides out from the right, select one or more contact groups to manage subscription endpoints and receive alarm messages by group. If no contact group is available, create one by referring to Binding Contact Groups.
  3. Click OK.

    NOTE:

    Metric alarm rules can be created in Alarm Center only after the Cloud Native Cluster Monitoring add-on is installed and the AOM Prometheus instance is interconnected. For details about how to enable Monitoring Center, see Enabling Cluster Monitoring.

    Event alarms in Table 1 can be reported only when Kubernetes event collection is enabled in Logging. For details, see Collecting Kubernetes Events.

Configuring Alarm Rules

After Alarm Center is enabled for clusters, you can configure and manage alarm rules.

  1. Log in to the CCE console.
  2. On the cluster list page, click the name of the target cluster to go to the details page.
  3. In the navigation pane on the left, choose Alarm Center. Then, click the Alarm Rules tab and configure and manage alarm rules.

    By default, Alarm Center generates alarm rules for containers. These rules cover both event alarms and metric alarms for exceptions. Alarm rules are grouped into sets: an alarm rule set consists of multiple alarm rules, and each alarm rule corresponds to the check items for a single exception. You can associate an alarm rule set with multiple contact groups and enable or disable individual alarm items. Table 1 lists the default alarm rules.

Table 1 Default alarm rules

Rule Type: Load rule set

    • Alarm Item: Abnormal pod
      Description: Check whether the pod is running normally.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: sum(min_over_time(kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}[10m]) and count_over_time(kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}[10m]) > 18 )by (namespace,pod, phase, cluster_name, cluster) > 0

    • Alarm Item: Frequent pod restarts
      Description: Check whether the pod frequently restarts.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: increase(kube_pod_container_status_restarts_total[5m]) > 3

    • Alarm Item: Unexpected number of Deployment replicas
      Description: Check whether the number of Deployment replicas is the same as the expected value.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: (kube_deployment_spec_replicas != kube_deployment_status_replicas_available ) and ( changes(kube_deployment_status_replicas_updated[5m]) == 0)

    • Alarm Item: Unexpected number of StatefulSet replicas
      Description: Check whether the number of StatefulSet replicas is the same as the expected value.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: (kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas) and (changes(kube_statefulset_status_replicas_updated[5m]) == 0)

    • Alarm Item: Container CPU usage higher than 80%
      Description: Check whether the container CPU usage is higher than 80%.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: 100 * (sum(rate(container_cpu_usage_seconds_total{image!="", container!="POD"}[1m])) by (cluster_name,pod,node,namespace,container, cluster) / sum(kube_pod_container_resource_limits{resource="cpu"}) by (cluster_name,pod,node,namespace,container, cluster)) > 80

    • Alarm Item: Container memory usage higher than 80%
      Description: Check whether the container memory usage is higher than 80%.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: (sum(container_memory_working_set_bytes{image!="", container!="POD"}) BY (cluster_name, node,container, pod , namespace, cluster) / sum(container_spec_memory_limit_bytes > 0) BY (cluster_name, node, container, pod , namespace, cluster) * 100) > 80

    • Alarm Item: Abnormal container
      Description: Check whether the container is running normally.
      Alarm Type: Metric
      Dependency Item: Cloud Native Cluster Monitoring
      PromQL/Event Name: sum by (namespace, pod, container, cluster_name, cluster) (kube_pod_container_status_waiting_reason) > 0

    • Alarm Item: UpdateLoadBalancerFailed
      Description: Check whether a load balancer is updated.
      Alarm Type: Event
      Dependency Item: Cloud Native Log Collection
      PromQL/Event Name: N/A

    • Alarm Item: Pod OOM
      Description: Check whether an OOM occurs in the pod.
      Alarm Type: Event
      Dependency Item: CCE Node Problem Detector, Cloud Native Log Collection
      PromQL/Event Name: PodOOMKilling

Rule Type: Cluster status rule set

    • Alarm Item: Unavailable cluster
      Description: Check whether the cluster is available.
      Alarm Type: Event
      Dependency Item: Cloud Native Log Collection
      PromQL/Event Name: N/A
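
The PromQL expression for the Abnormal pod alarm item in Table 1 combines two conditions over a 10-minute window. The following Python sketch restates that logic for a single pod and phase to make it easier to read. It assumes the input is the list of 0/1 values of kube_pod_status_phase sampled in the window for a phase matching Pending, Unknown, or Failed. The function name and the sample data are hypothetical, and the sketch only illustrates the PromQL semantics; it does not describe how AOM evaluates the rule internally.

```python
from typing import Sequence


def abnormal_pod_alarm(samples: Sequence[float], min_samples: int = 18) -> bool:
    """Restate the 'Abnormal pod' rule for one pod/phase series.

    samples: values of kube_pod_status_phase in the 10-minute window
    (1 = the pod is in this phase, 0 = it is not).
    """
    # count_over_time(...[10m]) > 18: more than 18 samples must exist.
    if len(samples) <= min_samples:
        return False
    # min_over_time(...[10m]) > 0: the pod was in the phase at every sample.
    return min(samples) > 0


# Hypothetical data: 20 samples, pod stuck in Pending the whole time.
always_pending = [1.0] * 20
# Hypothetical data: the pod left the phase for one sample.
briefly_recovered = [1.0] * 19 + [0.0]
# Hypothetical data: too few samples to satisfy the count condition.
too_few = [1.0] * 10

print(abnormal_pod_alarm(always_pending))     # True
print(abnormal_pod_alarm(briefly_recovered))  # False
print(abnormal_pod_alarm(too_few))            # False
```

In other words, the rule is written so that a pod that only briefly passes through Pending, or one with too few samples in the window, does not satisfy the expression.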

Binding Contact Groups

NOTE:

An alarm rule set can be bound to a maximum of five contact groups.

A contact group, based on Simple Message Notification (SMN), enables message publishers and subscribers to contact each other. A contact group contains one or more subscription endpoints. You can bind an alarm rule to a contact group to manage the endpoints that have subscribed to alarm messages.
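
The relationship between alarm rule sets, alarm rules, and contact groups can be pictured as a small data model. The sketch below is illustrative only: the class names, fields, and the bind_contact_group method are hypothetical and are not part of any CCE or SMN API. The only facts taken from this page are that a rule set contains multiple rules, that alarm items can be enabled or disabled, and that a rule set can be bound to at most five contact groups.

```python
from dataclasses import dataclass, field
from typing import List

# Limit stated in the note in this section.
MAX_CONTACT_GROUPS = 5


@dataclass
class AlarmRule:
    name: str
    alarm_type: str  # "Metric" or "Event"
    enabled: bool = True


@dataclass
class AlarmRuleSet:
    name: str
    rules: List[AlarmRule] = field(default_factory=list)
    contact_groups: List[str] = field(default_factory=list)

    def bind_contact_group(self, group: str) -> None:
        """Bind a contact group, enforcing the five-group limit."""
        if group in self.contact_groups:
            return  # already bound, nothing to do
        if len(self.contact_groups) >= MAX_CONTACT_GROUPS:
            raise ValueError(
                f"a rule set can be bound to at most {MAX_CONTACT_GROUPS} contact groups"
            )
        self.contact_groups.append(group)


# Hypothetical usage with two of the default alarm items.
load_rules = AlarmRuleSet(
    name="Load rule set",
    rules=[AlarmRule("Abnormal pod", "Metric"), AlarmRule("Pod OOM", "Event")],
)
load_rules.bind_contact_group("ops-oncall")
print(load_rules.contact_groups)  # ['ops-oncall']
```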

  1. Log in to the CCE console.
  2. On the cluster list page, click the name of the target cluster to go to the details page.
  3. In the navigation pane on the left, choose Alarm Center. Then, click the Default Contact Groups tab.
  4. Click Bind Contact Group. You can select a contact group created in SMN or create a contact group. The parameters for creating a contact group are described as follows:

    • Contact Group Name: Enter the name of the contact group, which cannot be changed after the contact group is created. The name can contain 1 to 255 characters and must start with a letter or digit. Only letters, digits, hyphens (-), and underscores (_) are allowed.
    • Alarm Message Display Name: Enter the title of the message received by the specified subscription endpoint. For example, if you set the endpoint type to Email and specify a display name, the name you specified will be displayed as the alarm message sender. If no alarm message display name is specified, the sender will be username@example.com. The alarm message display name can be changed after a contact group is created.
    • Add Subscription Endpoint: Add one or more endpoints to receive alarm messages. The endpoint type can be SMS or Email. If you select SMS, enter a valid mobile number. If you select Email, enter a valid email address.

  5. Click OK.

    You will be redirected to the contact group list. The subscription endpoint is in the Unconfirmed state. Send a subscription request to the endpoint to verify its validity.

  6. Click Request Confirmation in the Operation column to send a subscription request to the endpoint. After the endpoint receives and confirms the request, the subscription endpoint status changes to Confirmed.
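
The naming rule for Contact Group Name in step 4 can be checked before you enter a name in the console. The following is a minimal Python sketch of that rule (1 to 255 characters, starting with a letter or digit, containing only letters, digits, hyphens, and underscores). It assumes that "letters" means ASCII letters, which this page does not state explicitly, and the function name is hypothetical.

```python
import re

# First character: letter or digit. Remaining 0 to 254 characters:
# letters, digits, hyphens, or underscores (ASCII letters assumed).
_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,254}")


def is_valid_contact_group_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_contact_group_name("ops-team_1"))  # True
print(is_valid_contact_group_name("-ops"))        # False: starts with a hyphen
print(is_valid_contact_group_name("ops team"))    # False: contains a space
```

Because the name cannot be changed after the contact group is created, validating it up front avoids having to create a replacement group later.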

Viewing Alarms

You can view recent and historical alarms on the Alarms tab.

  1. Log in to the CCE console.
  2. On the cluster list page, click the name of the target cluster to go to the details page.
  3. In the navigation pane on the left, choose Alarm Center. Then, click the Alarms tab.

    By default, the list displays all alarms that have not been cleared. You can query alarms by alarm keyword, alarm severity, or alarm time. In addition, you can view how the alarms that meet the specified criteria are distributed across different periods.

    If you confirm that an alarm has been handled, click Clear in the Operation column. After the alarm is cleared, you can view it in the historical alarm list.

    Figure 1 Querying alarms
