Monitoring Overview

Updated on 2024-01-26 GMT+08:00

CCE works with Application Operations Management (AOM) to monitor clusters. When a node is created, the AOM ICAgent (a DaemonSet named icagent in the kube-system namespace of the cluster) is installed by default. The ICAgent collects monitoring data for the underlying resources and for the workloads running in the cluster. It also collects the custom metrics of workloads.

  • Resource metrics

    Basic resource monitoring includes CPU, memory, and disk monitoring. For details, see Resource Metrics. You can view these metrics of clusters, nodes, and workloads on the CCE or AOM console.

  • Custom metrics

    The ICAgent collects custom metrics of applications and uploads them to AOM. For details, see Monitoring Custom Metrics on AOM.

Resource Metrics

On the CCE console, you can view cluster, node, workload, and pod metrics, as described in the following sections.

On the AOM console, you can view host metrics and container metrics.

Viewing Cluster Monitoring Data

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. CCE allows you to view the monitoring data of all nodes. Choose Clusters from the navigation pane and click the cluster name. Information such as CPU Metrics and Memory for all nodes (excluding master nodes) in the last hour, as well as Status and AZ, is displayed.

    Table 1 Cluster monitoring metrics

    | Metric | Description |
    | --- | --- |
    | CPU Allocation (%) | Percentage of CPUs allocated to workloads. CPU Allocation (%) = Sum of CPU quotas requested by running pods in the cluster / Sum of CPU quotas that can be allocated from all nodes (excluding master nodes) to workloads |
    | Memory Allocation (%) | Percentage of memory allocated to workloads. Memory Allocation (%) = Sum of memory quotas requested by running pods in the cluster / Sum of memory quotas that can be allocated from all nodes (excluding master nodes) to workloads |
    | CPU Usage (%) | CPU usage of the cluster, calculated as the average CPU usage of all nodes (excluding master nodes) in the cluster. |
    | Memory Usage (%) | Memory usage of the cluster, calculated as the average memory usage of all nodes (excluding master nodes) in the cluster. |

    NOTE:

    Allocatable node resources (CPU or memory) = Total amount – Reserved amount – Eviction thresholds. For details, see Node Resource Reservation Policy.
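
The allocation formulas in Table 1 and the allocatable-resource formula in the note can be illustrated with a minimal sketch. The function names and all figures below are hypothetical and are not part of CCE or AOM. The sketch only restates the arithmetic under the assumption that per-node totals, reservations, and eviction thresholds are already known.

```python
from typing import Sequence


def allocatable(total: float, reserved: float, eviction_threshold: float) -> float:
    """Allocatable node resource = total - reserved - eviction threshold."""
    return total - reserved - eviction_threshold


def allocation_percent(pod_requests: Sequence[float],
                       node_allocatable: Sequence[float]) -> float:
    """Sum of requests of running pods / sum of allocatable resources of worker nodes."""
    capacity = sum(node_allocatable)
    if capacity <= 0:
        raise ValueError("worker nodes have no allocatable capacity")
    return 100.0 * sum(pod_requests) / capacity


# Hypothetical cluster: two worker nodes with 4 CPU cores each (master nodes excluded),
# 0.5 cores reserved per node and no CPU eviction threshold.
worker_nodes = [allocatable(4.0, 0.5, 0.0), allocatable(4.0, 0.5, 0.0)]

# Hypothetical CPU requests (in cores) of the running pods.
cpu_requests = [0.5, 0.25, 1.0]

print(f"CPU Allocation: {allocation_percent(cpu_requests, worker_nodes):.1f}%")
```

Memory Allocation (%) follows the same pattern, with memory requests and allocatable memory in place of CPU cores.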

Viewing Monitoring Data of Worker Nodes

CCE also allows you to view monitoring data of a single node.

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. Choose Nodes in the navigation pane. On the right of the page, click Monitor of the target node to view the monitoring data.
  3. Select a statistical Dimension and a time range to view the monitoring data. The data is provided by AOM and covers the CPU, memory, disk, network, and GPU of the node.

    Table 2 Node monitoring metrics

    | Metric | Description |
    | --- | --- |
    | CPU Usage (%) | CPU usage of the node. CPU Usage (%) = Used CPU cores / Total number of CPU cores |
    | Used CPU Cores (cores) | Number of used CPU cores. |
    | Physical Memory Usage (%) | Physical memory usage of the node. Physical Memory Usage (%) = (Physical memory capacity – Available physical memory) / Physical memory capacity |
    | Available Physical Memory (GiB) | Unused physical memory of the node. |
    | Disk Usage (%) | Disk usage of the file system on the data disk of the node, calculated based on the file partition. For details, see Data Disk Space Allocation. Disk Usage (%) = (Disk capacity – Available disk space) / Disk capacity |
    | Available Disk Space (GiB) | Unused disk space. |
    | Downlink Rate (BPS) (KB/s) | Speed at which data is downloaded from the Internet to the node. |
    | Uplink Rate (BPS) (KB/s) | Speed at which data is uploaded from the node to the Internet. |
    | GPU Usage (%) | GPU usage of the node. |
    | GPU Memory Usage (%) | Percentage of used GPU memory to the GPU memory capacity. GPU Memory Usage (%) = Used GPU memory / GPU memory capacity |
    | Used GPU Memory (GiB) | Amount of used GPU memory. |
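
As a rough illustration of the node formulas in Table 2, the following sketch derives the three usage percentages from raw capacity and availability figures. The NodeSample type, its field names, and the sample values are assumptions made for this example. They do not correspond to any AOM data structure or API.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """Hypothetical snapshot of raw node figures (not an AOM type)."""
    used_cpu_cores: float
    total_cpu_cores: float
    mem_capacity_gib: float
    mem_available_gib: float
    disk_capacity_gib: float
    disk_available_gib: float


def percent(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage, guarding against zero capacity."""
    if denominator <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * numerator / denominator


def node_usage(sample: NodeSample) -> dict:
    """Apply the Table 2 formulas for CPU, physical memory, and disk usage."""
    return {
        "cpu_usage_percent": percent(sample.used_cpu_cores, sample.total_cpu_cores),
        "memory_usage_percent": percent(
            sample.mem_capacity_gib - sample.mem_available_gib, sample.mem_capacity_gib
        ),
        "disk_usage_percent": percent(
            sample.disk_capacity_gib - sample.disk_available_gib, sample.disk_capacity_gib
        ),
    }


# Hypothetical node: 8 cores with 2 in use, 32 GiB memory with 8 GiB available,
# 100 GiB data disk with 40 GiB available.
sample = NodeSample(2.0, 8.0, 32.0, 8.0, 100.0, 40.0)
for name, value in node_usage(sample).items():
    print(f"{name}: {value:.1f}")
```

Memory and disk usage are derived from the available amount rather than the used amount, which matches how the table states those two formulas.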

Viewing Workload Monitoring Data

CCE allows you to view monitoring data of a single workload.

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. Choose Workloads in the navigation pane. On the right of the page, click Monitor of the target workload. In the window that slides out from the right, the workload monitoring data is displayed.
  3. Select a statistical Dimension and a time range to view the monitoring data. The data is provided by AOM and covers the CPU, memory, network, and GPU of the workload.

    NOTE:

    If the workload has multiple pods, the monitoring data may vary with the statistical Dimension. For example, if you select Maximum or Minimum for Dimension, each metric value is the maximum or minimum across all pods of the workload. If you select Average, each metric value is the average across all pods of the workload.

    Table 3 Workload monitoring metrics

    | Metric | Description |
    | --- | --- |
    | CPU Usage (%) | CPU usage of the workload. CPU Usage (%) = Used CPU cores / Sum of CPU core limits of all running pods (If no limit is configured, the total number of the node's CPU cores is used.) |
    | Used CPU Cores (cores) | Number of used CPU cores. |
    | Physical Memory Usage (%) | Physical memory usage of the workload. Physical Memory Usage (%) = Used physical memory / Sum of physical memory limits of all running pods (If no limit is configured, the node's total physical memory is used.) |
    | Used Physical Memory (GiB) | Amount of used physical memory. |
    | Disk Read Rate | Data volume read from a disk per second. The unit is KB/s. |
    | Disk Write Rate | Data volume written to a disk per second. The unit is KB/s. |
    | Downlink Rate (BPS) (KB/s) | Speed at which data is downloaded from the Internet. |
    | Uplink Rate (BPS) (KB/s) | Speed at which data is uploaded from the node to the Internet. |
    | GPU Usage (%) | GPU usage of the workload. |
    | GPU Memory Usage (%) | Percentage of used GPU memory to the GPU memory capacity. GPU Memory Usage (%) = Used GPU memory / GPU memory capacity |
    | Used GPU Memory (GiB) | Amount of used GPU memory. |
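
The effect of the statistical Dimension on a workload with several pods can be sketched as follows. This is an illustration only: the aggregate function and the per-pod readings are hypothetical, and the sketch assumes that a per-pod value of the metric is already available for each pod.

```python
from statistics import mean
from typing import Sequence

# Maps each Dimension option named in the note to an aggregation function.
AGGREGATORS = {
    "Maximum": max,
    "Minimum": min,
    "Average": mean,
}


def aggregate(per_pod_values: Sequence[float], dimension: str) -> float:
    """Reduce per-pod readings of one metric to a single workload-level value."""
    if not per_pod_values:
        raise ValueError("the workload has no running pods")
    if dimension not in AGGREGATORS:
        raise ValueError(f"unknown dimension: {dimension}")
    return AGGREGATORS[dimension](per_pod_values)


# Hypothetical Used CPU Cores readings for three pods of one workload.
used_cpu_cores = [1.0, 2.0, 3.0]

for dimension in ("Maximum", "Minimum", "Average"):
    print(dimension, aggregate(used_cpu_cores, dimension))
```

With a single pod, all three options return the same value, which is why the Dimension only matters for workloads with multiple pods.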

Viewing Pod Monitoring Data

CCE allows you to view the monitoring data of your pods.

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. Choose Workloads in the navigation pane. Then click the workload name of the target workload to list the pods.
  3. Click Monitor of the target pod to view the monitoring data.
  4. Select a statistical Dimension and a time range to view the monitoring data. The data is provided by AOM and covers the CPU, memory, disk, network, and GPU of the pod.

    NOTE:

    If a pod has multiple containers, the monitoring data may vary with the statistical Dimension. For example, if you select Maximum or Minimum for Dimension, each metric value is the maximum or minimum across all containers in the pod. If you select Average, each metric value is the average across all containers in the pod.

    Table 4 Pod monitoring metrics

    | Metric | Description |
    | --- | --- |
    | CPU Usage (%) | CPU usage of the pod. CPU Usage (%) = Used CPU cores / Sum of CPU core limits of all running containers in the pod (If no CPU limits are specified for the running containers, the number of the node's CPU cores is used.) |
    | Used CPU Cores (cores) | Number of used CPU cores. |
    | Physical Memory Usage (%) | Physical memory usage of the pod. Physical Memory Usage (%) = Used physical memory / Sum of physical memory limits of all running containers in the pod (If not specified, the node's physical memory is used.) |
    | Used Physical Memory (GiB) | Amount of used physical memory. |
    | Disk Read Rate | Data volume read from a disk per second. The unit is KB/s. |
    | Disk Write Rate | Data volume written to a disk per second. The unit is KB/s. |
    | Downlink Rate (BPS) (KB/s) | Speed at which data is downloaded from the Internet. |
    | Uplink Rate (BPS) (KB/s) | Speed at which data is uploaded from the node to the Internet. |
    | GPU Usage (%) | GPU usage of the pod. |
    | GPU Memory Usage (%) | Percentage of used GPU memory to the GPU memory capacity. GPU Memory Usage (%) = Used GPU memory / GPU memory capacity |
    | Used GPU Memory (GiB) | Amount of used GPU memory of the pod. |
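
The denominator rule in the pod CPU Usage (%) formula, including the fallback to the node's CPU cores, can be sketched as below. The function name and figures are hypothetical. The source only describes the case where no container specifies a limit, so the handling of a pod in which only some containers have limits (summing the limits that are set) is an assumption of this sketch, not documented behavior.

```python
from typing import Optional, Sequence


def pod_cpu_usage_percent(used_cores: float,
                          container_limits: Sequence[Optional[float]],
                          node_cores: float) -> float:
    """Used CPU cores / sum of container CPU limits, falling back to node cores.

    A limit of None means the container has no CPU limit configured.
    """
    specified = [limit for limit in container_limits if limit is not None]
    # Documented fallback: no running container specifies a limit.
    # Assumption: if only some limits are set, only those are summed.
    denominator = sum(specified) if specified else node_cores
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * used_cores / denominator


# Hypothetical pod using 0.5 cores on a 4-core node.
print(pod_cpu_usage_percent(0.5, [1.0, 1.0], node_cores=4.0))    # limits set on both containers
print(pod_cpu_usage_percent(0.5, [None, None], node_cores=4.0))  # no limits: node cores are used
```

The same usage therefore yields a lower percentage when no limits are configured, because the denominator becomes the whole node rather than the pod's own limits. Physical Memory Usage (%) can be treated the same way, with memory limits and the node's physical memory.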
