Configuring Custom Alarms on AOM

Updated on 2024-09-30 GMT+08:00

Cloud Container Engine (CCE) reports alarms and events to Application Operations Management (AOM). By configuring alarm rules on AOM, you can be notified promptly when resources in a cluster become abnormal.

Process

  1. Creating a Topic on SMN
  2. Creating an Action Rule
  3. Adding an Alarm Rule
    1. Event alarms: Generate alarms based on the events reported by clusters to AOM. For details about the events and configurations, see Adding an Event Alarm.
    2. Metric alarms: Generate alarms based on the thresholds of monitoring metrics, such as resource utilization of servers and components. For details about the metric thresholds and configurations, see Adding a Metric Alarm.

Creating a Topic on SMN

Simple Message Notification (SMN) pushes messages to subscribers through emails, SMS messages, and HTTP/HTTPS requests.

A topic is used to publish messages and subscribe to notifications. It serves as a message transmission channel between publishers and subscribers.

You need to create a topic and add a subscription to it. For details, see Creating a Topic and Adding a Subscription to a Topic.

NOTE:

After subscribing to a topic, confirm the subscription in the email or SMS message for the notification to take effect.

Creating an Action Rule

AOM allows you to customize alarm action rules. You can create an alarm action rule to associate an SMN topic with a message template. You can also customize notification content based on a message template.

For details, see Creating an Alarm Action Rule. When creating an action rule, select the topic that you created and subscribed to in Creating a Topic on SMN.
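
The relationship between an action rule, an SMN topic, and a message template can be pictured with a small Python sketch. It is only a conceptual model: the class, field names, topic name, and template placeholders below are hypothetical and do not reflect the actual data model or API of AOM or SMN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRule:
    """Hypothetical model: an action rule ties one SMN topic to one message template."""
    name: str
    smn_topic: str         # topic created and subscribed to beforehand
    message_template: str  # placeholders are illustrative, not AOM's real variables

    def render(self, alarm: dict) -> str:
        """Fill the template with fields from an alarm (all keys are assumptions)."""
        return self.message_template.format_map(alarm)


rule = ActionRule(
    name="cce-node-alarms",
    smn_topic="cce-alarm-topic",
    message_template="[{severity}] {event_name} on {resource} in cluster {cluster}",
)

alarm = {
    "severity": "Critical",
    "event_name": "NodeNotReady",
    "resource": "node-1",
    "cluster": "demo-cluster",
}

notification = rule.render(alarm)
print(f"publish to {rule.smn_topic}: {notification}")
```

In this model, changing the template changes only the notification text, while changing the topic changes who receives it, which is why the two are configured separately and then linked by the action rule.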

Adding an Event Alarm

The following uses NodeNotReady as an example to describe how to add an event alarm. You can add other alarms by referring to Table 1.

Table 1 Event-based alarms

| Event Name | Source | Description | Solution |
| --- | --- | --- | --- |
| NodeNotReady | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| Rebooted | CCE | An alarm is triggered immediately when a node is restarted. | Log in to the cluster to check the status of the node for which the alarm is generated, check whether the node can be started properly, and locate the cause of the restart. |
| KUBELETIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. Then, restart kubelet. |
| DOCKERIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. Then, restart Docker. |
| KUBEPROXYIsDown | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| KernelOops | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| ConntrackFull | CCE | An alarm is triggered immediately when a node is abnormal. | Log in to the cluster and check the status of the node for which the alarm is generated. Set the node as unschedulable and schedule the service pods to another node. |
| NodePoolSoldOut | CCE | An alarm is triggered immediately when node pool resources are sold out. | Set auto node pool switchover or change the node pool specifications. |
| NodeCreateFailed | CCE | An alarm is triggered immediately upon a node creation failure. | Rectify the failure and create the node again. |
| ScaleUpTimedOut | CCE | An alarm is triggered immediately upon node scale-out timeout. | Rectify the failure and try scale-out again. |
| ScaleDownFailed | CCE | An alarm is triggered immediately upon node scale-in timeout. | Rectify the failure and try scale-in again. |
| BackOffPullImage | CCE | Image pull retry failed. | Log in to the cluster, locate the failure cause, and deploy the service workload again. |

  1. Log in to the AOM 2.0 console.
  2. In the navigation pane, choose Alarm Management > Alarm Rules. Then, click Create Alarm Rule.
  3. Enter basic information as prompted and configure other parameters as follows:

    For details about parameters, see Creating an Event Alarm Rule.

    • Rule Type: Select Event alarm rule.
    • Event Type: Select System.
    • Event Source: Select CCE.
    • Monitored Object: Filter monitored objects by notification type, event name, alarm severity, custom attribute, namespace, and cluster name.

      In this example, filter monitored objects by event name, select NodeNotReady, and set Trigger Mode to Immediate Trigger.

    • Alarm Mode: Select Direct alarm reporting.
    • Action Rule: Select the action rule created in Creating an Action Rule.

    Configure other parameters as required.

    In this example, the alarm settings are as follows:

    If a node in the cluster becomes abnormal, CCE reports the NodeNotReady event to AOM. AOM immediately notifies you through SMN based on the action rule.

    Figure 1 Adding an event alarm

  4. Click Confirm.

    A successfully created alarm rule will be displayed in the rule list.
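
As a rough mental model of the rule configured above, the following Python sketch filters reported events by source and event name. It is not how AOM is implemented: the class, the event dictionary keys, and the action rule name are hypothetical, and "Immediate Trigger" is interpreted here as "a single matching event raises an alarm".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventAlarmRule:
    """Hypothetical in-memory model of the event alarm rule from the steps above."""
    event_source: str       # "Event Source" in the console
    event_names: frozenset  # events selected under "Monitored Object"
    trigger_mode: str       # "Immediate Trigger" in this example
    action_rule: str        # name of the action rule used for notification

    def matches(self, event: dict) -> bool:
        """Return True if a reported event should raise an alarm under this rule."""
        return (
            event.get("source") == self.event_source
            and event.get("name") in self.event_names
        )


rule = EventAlarmRule(
    event_source="CCE",
    event_names=frozenset({"NodeNotReady"}),
    trigger_mode="Immediate Trigger",
    action_rule="cce-node-alarms",  # hypothetical action rule name
)

# Sample events with assumed keys; real CCE events carry more fields.
events = [
    {"source": "CCE", "name": "NodeNotReady", "node": "node-1"},
    {"source": "CCE", "name": "Rebooted", "node": "node-2"},
]

alarms = [e for e in events if rule.matches(e)]
for e in alarms:
    print(f"alarm: {e['name']} on {e['node']} -> notify via {rule.action_rule}")
```

To cover more events from Table 1 with one rule in this model, you would add their names to `event_names`, which mirrors selecting several event names in the Monitored Object filter.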

Adding a Metric Alarm

The following uses a PromQL statement as an example to describe how to add a metric alarm.

  1. Log in to the AOM 2.0 console.
  2. In the navigation pane, choose Alarm Management > Alarm Rules. Then, click Create Alarm Rule.
  3. Configure parameters as follows:

    For details about parameters, see Creating a Metric Alarm Rule.

    • Rule Type: Select Metric alarm rule.
    • Configuration Mode: Select PromQL. You can enter native PromQL statements or use CCE templates.
    • Prometheus Instance: Select the AOM instance to which Cloud Native Cluster Monitoring in the cluster reports metrics.
    • Default Rule:
      • Custom: Enter a PromQL statement to configure the alarm rule. For example:
        kube_persistentvolume_status_phase{phase=~"Failed|Pending",cluster="${cluster_id}"} > 0

        ${cluster_id} indicates the cluster name. If a PV in the cluster is in the Failed or Pending state, an alarm will be generated.

      • CCEFromProm: Select an alarm template provided by CCE.
        Figure 2 Adding a metric alarm
    • Alarm Mode: Select Direct alarm reporting.
    • Action Rule: Select the action rule created in Creating an Action Rule.

    Configure other parameters as required.

  4. Click Confirm.

    A successfully created alarm rule will be displayed in the rule list.
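
To make the Custom PromQL example more concrete, the sketch below fills in the ${cluster_id} placeholder and then mimics the selector and the `> 0` comparison on a few in-memory samples. The sample data, the cluster value `demo-cluster`, and the helper names are assumptions for illustration only. The sketch relies on two general facts: PromQL regex matchers are fully anchored, and kube-state-metrics exposes one series per PV and phase, with value 1 for the current phase and 0 otherwise.

```python
import re
from string import Template

# The expression from the Custom example; ${cluster_id} is filled in per cluster.
EXPR = Template(
    'kube_persistentvolume_status_phase'
    '{phase=~"Failed|Pending",cluster="${cluster_id}"} > 0'
)


def render_expr(cluster_id: str) -> str:
    """Substitute the placeholder to get the statement entered in the console."""
    return EXPR.substitute(cluster_id=cluster_id)


def firing_series(samples, cluster_id):
    """Return the samples that the rendered expression would select.

    re.fullmatch is used because PromQL regex matchers match the whole label value.
    """
    return [
        s for s in samples
        if s["cluster"] == cluster_id
        and re.fullmatch(r"Failed|Pending", s["phase"])
        and s["value"] > 0
    ]


# Hypothetical samples for three PVs in two clusters.
samples = [
    {"persistentvolume": "pv-a", "phase": "Bound", "cluster": "demo-cluster", "value": 1},
    {"persistentvolume": "pv-a", "phase": "Pending", "cluster": "demo-cluster", "value": 0},
    {"persistentvolume": "pv-b", "phase": "Failed", "cluster": "demo-cluster", "value": 1},
    {"persistentvolume": "pv-c", "phase": "Pending", "cluster": "other-cluster", "value": 1},
]

expr = render_expr("demo-cluster")
firing = firing_series(samples, "demo-cluster")
print(expr)
print([s["persistentvolume"] for s in firing])
```

Only `pv-b` is selected here: `pv-a` is Bound, so its Pending series has value 0, and `pv-c` belongs to a different cluster. Any non-empty result corresponds to the condition under which the alarm is generated.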
