Cluster Monitoring

Updated on 2024-08-05 GMT+08:00

AOM monitors clusters deployed using CCE. On the Cluster Monitoring page, you can view basic metrics (such as cluster status, CPU usage, memory usage, and node status) together with related alarms and events in real time. Use this information to track cluster status and handle risks promptly so that clusters keep running stably.

Precautions

The host status can be Normal, Abnormal, Warning, Silent, or Deleted. A host is displayed as Abnormal when it is faulty because of a network failure, a power-off, or a shutdown, or when a threshold alarm is reported on the host.
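As a rough illustration of the rule above, the following minimal sketch models the five host statuses and the conditions under which a host is shown as Abnormal. The names `HostStatus`, `HostSignals`, and `is_abnormal` are hypothetical and are not part of any AOM API; the conditions for Warning, Silent, and Deleted are not described on this page, so the sketch does not model them.

```python
from dataclasses import dataclass
from enum import Enum


class HostStatus(Enum):
    """The five host statuses listed on this page."""
    NORMAL = "Normal"
    ABNORMAL = "Abnormal"
    WARNING = "Warning"
    SILENT = "Silent"
    DELETED = "Deleted"


@dataclass
class HostSignals:
    """Hypothetical inputs; AOM derives the real status internally."""
    network_failure: bool = False
    powered_off: bool = False      # covers both power-off and shutdown
    threshold_alarm: bool = False


def is_abnormal(signals: HostSignals) -> bool:
    """Return True if any condition that this page associates with Abnormal holds."""
    return (
        signals.network_failure
        or signals.powered_off
        or signals.threshold_alarm
    )


if __name__ == "__main__":
    print(is_abnormal(HostSignals(threshold_alarm=True)))
    print(is_abnormal(HostSignals()))
```

The point of the sketch is that a threshold alarm alone is enough to mark a host Abnormal, even when the host is reachable and powered on.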

Procedure

  1. Log in to the AOM 2.0 console.
  2. In the navigation pane, choose Container Insights > Cluster Monitoring.
  3. In the upper right corner of the page, set cluster filter criteria.

    1. Set a time range to view the CCE clusters that report information. There are two methods to set a time range:

      Method 1: Select a predefined time label, such as Last hour or Last 6 hours.

      Method 2: Specify a start time and an end time to customize the time range. The range can span at most 30 days.

    2. Set the interval for refreshing information. Select a value from the drop-down list, such as Refresh manually or 1 minute auto refresh.

  4. Set search criteria (such as the creation time, CPU usage, and cluster name) to find the target cluster.
  5. Click a cluster to go to its details page. In the navigation pane on the left, monitor cluster running conditions by cluster, dashboard, or alarm.

    • View information about nodes, workloads, pods (container groups), and containers by cluster.
      • In the navigation pane on the left, choose Insights > Node to view information about all nodes in the cluster in real time, including the status, IP address, pod status, CPU usage, and memory usage.
        • In the upper part of the node list, filter nodes by node name.
        • Click the icon in the upper right corner and select or deselect options as required.
        • Click a node to view its related resources, alarms, and events, and common system devices such as GPUs and NICs.
          • On the Overview tab page, Cloud-Native Monitoring (New) is selected by default. You can view metrics such as CPU, memory, and network. Click Using ICAgent (Old) and select a target Prometheus instance from the drop-down list. You can view metrics such as CPU, physical memory, and host status.
            NOTE:

            To use cloud-native monitoring, connect your cluster to a Prometheus instance for CCE first.

            If there is no Prometheus instance for CCE, click Prometheus Monitoring to create a Prometheus instance by referring to Prometheus Instance for CCE. After the instance is created, click its name. On the instance details page, choose Integration Center and then connect the CCE cluster.

            Click the time range selector in the upper right corner and select a predefined time label or customize a time range from the drop-down list to view resource information.

            Click the refresh icon in the upper right corner to obtain the latest resource information in real time.

            Click the full-screen icon in the upper right corner of the page to view resource information in full screen.

          • On the Related Resources tab page, the pod (container group) to which the node belongs is displayed.
      • In the navigation pane on the left, choose Insights > Workload to view the status and resource usage of all workloads in the cluster.
        • In the upper part of the workload list, filter workloads by workload name.
        • Click the icon in the upper right corner and select or deselect options as required.
        • Click a workload to view its related resources, alarms, events, and dashboards.
          • On the Overview tab page, Cloud-Native Monitoring (New) is selected by default. You can view metrics such as CPU, memory, and network. Click Using ICAgent (Old) and select a target Prometheus instance from the drop-down list. You can view metrics such as CPU, physical memory, and file system.
          • On the Related Resources tab page, the pod (container group) to which the workload belongs is displayed.
      • In the navigation pane on the left, choose Insights > Pod to view the status and resource usage of all pods in the cluster.
        • In the upper part of the container group list, filter container groups by name.
        • Click the icon in the upper right corner and select or deselect options as required.
        • Click a container group to view its related resources, alarms, events, and dashboards.
          • On the Overview tab page, Cloud-Native Monitoring (New) is selected by default. You can view metrics such as CPU, memory, and network. Click Using ICAgent (Old) and select a target Prometheus instance from the drop-down list. You can view metrics such as CPU, physical memory, and file system.
          • On the Related Resources tab page, view nodes, workloads, and containers by name.
      • In the navigation pane on the left, choose Insights > Container to view the status and resource usage of all containers in the cluster.
        • In the upper part of the container list, filter containers by name.
        • Click the icon in the upper right corner and select or deselect options as required.
        • Click a container to view its related resources, alarms, events, and dashboards. On the Related Resources tab page, the container group to which the container belongs is displayed by default. View nodes, workloads, and container groups by name.
    • View the cluster running status from the alarm management perspective.
      • In the navigation pane on the left, choose Alarm Management > Alarm List to view alarm details of the cluster. For details, see Viewing Alarms.
      • In the navigation pane on the left, choose Alarm Management > Event List to view event details of the cluster. For details, see Viewing Events.
      • In the navigation pane on the left, choose Alarm Management > Alarm Rules to view the alarm rules related to the cluster. Modify the alarm rules as required. For details, see Managing Alarm Rules.
    • In the navigation pane on the left, choose Dashboard to view the running status of the current cluster.
      • A CCE Prometheus instance has been connected:

        Select Cluster View, Pod View, Host View, or Node View from the drop-down list to view key metrics such as the CPU usage and physical memory usage.

      • No CCE Prometheus instance is connected:

        Choose Prometheus Monitoring and then add a Prometheus instance. For details, see Prometheus Instance for CCE. After the instance is created, click its name. On the instance details page, choose Integration Center and then connect the CCE cluster.
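
The time range rules in step 3 (a predefined label such as Last hour or Last 6 hours, or a custom start and end time spanning at most 30 days) can be sketched as follows. This is only an illustration of the constraint, not how the console implements it; `resolve_range`, `PREDEFINED`, and `MAX_RANGE` are hypothetical names, and only the two labels named on this page are included.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Upper bound for a custom range, as stated in step 3.
MAX_RANGE = timedelta(days=30)

# Only the two predefined labels mentioned on this page.
PREDEFINED = {
    "Last hour": timedelta(hours=1),
    "Last 6 hours": timedelta(hours=6),
}


def resolve_range(
    now: datetime,
    label: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (start, end) for a predefined label or a validated custom range."""
    if label is not None:
        # Method 1: predefined label, measured back from the current time.
        return now - PREDEFINED[label], now
    if start is None or end is None:
        raise ValueError("a custom range needs both a start time and an end time")
    if end <= start:
        raise ValueError("the end time must be later than the start time")
    if end - start > MAX_RANGE:
        raise ValueError("a custom range can span at most 30 days")
    # Method 2: custom range within the 30-day limit.
    return start, end


if __name__ == "__main__":
    now = datetime(2024, 8, 5, 12, 0)
    print(resolve_range(now, label="Last 6 hours"))
    print(resolve_range(now, start=now - timedelta(days=7), end=now))
```

A custom range longer than 30 days raises `ValueError` in this sketch, which mirrors the limit the console enforces.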
