Updated on 2024-06-26 GMT+08:00

Cluster Monitoring

To observe the resource usage and health of a cluster, choose Monitoring Center > Clusters. This page shows the cluster's monitoring data in four areas: Cluster Health, Health Overview, Top Resource Consumption Statistics, and Data Plane Monitoring.

Navigation Path

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. In the navigation pane on the left, choose Monitoring Center. Then, click Clusters.

Cluster Health

Cluster health is evaluated across several dimensions: the health score, the number of risk items to be handled, the risk level, and the proportion of diagnosed risk items for master nodes, the cluster, worker nodes, workloads, and external dependencies. Abnormal data is displayed in red. For more diagnosis results, go to Health Center.

Figure 1 Cluster health

Health Overview

Resource Overview

Resource Overview displays the percentage of abnormal nodes, workloads, and pods, as well as the total number of namespaces.

Control Plane Health Overview

Control Plane Health Overview displays the percentage of abnormal control plane components and master nodes, the total QPS of the API server, and the API server request error rate. The API server is the API service provider of the cluster. If it is abnormal, the cluster may become inaccessible, and workloads that depend on the API server may not run properly. The QPS and request error rate help you quickly identify and rectify such faults.

Figure 2 Health overview
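The percentages and API server indicators in this overview come down to simple ratios. The following Python sketch is not CCE's implementation; it only illustrates the arithmetic under stated assumptions. It assumes hypothetical cumulative request counters sampled from the API server (the `Snapshot` type and its fields are invented for illustration), treats QPS as the increase in requests divided by the elapsed seconds, and treats the error rate as failed requests divided by all requests in the same interval. Which responses CCE counts as errors is not stated on this page, so `error_requests` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical cumulative API server request counters at one instant."""
    timestamp_s: float    # sample time, in seconds
    total_requests: int   # cumulative number of requests served
    error_requests: int   # cumulative number of failed requests (assumed definition)


def abnormal_percentage(abnormal: int, total: int) -> float:
    """Percentage of abnormal resources, for example abnormal pods out of all pods."""
    return 0.0 if total == 0 else 100.0 * abnormal / total


def qps_and_error_rate(prev: Snapshot, curr: Snapshot) -> tuple[float, float]:
    """Derive QPS and request error rate (%) from two counter samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    requests = curr.total_requests - prev.total_requests
    errors = curr.error_requests - prev.error_requests
    qps = requests / elapsed
    error_rate = 0.0 if requests == 0 else 100.0 * errors / requests
    return qps, error_rate
```

For example, 3 abnormal pods out of 60 gives 5.0%, and two samples taken 60 seconds apart with 6,000 new requests, 30 of them failed, give a QPS of 100 and an error rate of 0.5%.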

Top Resource Consumption Statistics

CCE ranks nodes, Deployments, StatefulSets, and pods by CPU usage and by memory usage and displays the top 5 of each, helping you identify resources with high consumption. To view all data, click the Nodes, Workloads, or Pods tab.

Figure 3 Top Resource Consumption Statistics
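Selecting the top 5 is a standard top-N query. The Python sketch below only illustrates the idea and is not how CCE performs the ranking. It assumes the usage values have already been computed as percentages (using the metric definitions that follow) and are keyed by hypothetical resource names such as `node-a`.

```python
import heapq
from typing import Mapping


def top_n_by_usage(usage: Mapping[str, float], n: int = 5) -> list[tuple[str, float]]:
    """Return the n resources with the highest usage, highest first."""
    return heapq.nlargest(n, usage.items(), key=lambda item: item[1])


# Hypothetical node CPU usage percentages.
node_cpu = {
    "node-a": 35.0,
    "node-b": 82.5,
    "node-c": 12.0,
    "node-d": 67.3,
    "node-e": 90.1,
    "node-f": 48.9,
}

for name, usage in top_n_by_usage(node_cpu):
    print(f"{name}: {usage}%")
```

`heapq.nlargest` avoids sorting the whole collection, which matters little for five items but keeps the query cheap when a cluster has many pods.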

Monitoring Metrics

  • CPU Usage

    Node CPU usage = Average percentage of the non-idle CPU time of the node

    Workload CPU usage = Average CPU usage of the pods in the workload

    Pod CPU usage = CPU cores used by the pod/Sum of the CPU limits of the pod's containers (If no limit is specified, the total number of CPU cores on the node is used.)

  • Memory Usage

    Node memory usage = The used memory of the node/The total memory of the node

    Workload memory usage = Average memory usage of the pods in the workload

    Pod memory usage = Physical memory used by the pod/Sum of the memory limits of the pod's containers (If no limit is specified, the total memory of the node is used.)
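The formulas above can be expressed directly in code. The following Python sketch is illustrative only and is not CCE's collection logic. It assumes the inputs (idle and total CPU time, used and total memory, container limits) are already available from some metrics source. For a pod where only some containers set a limit, the page does not say which denominator applies; the sketch assumes the node capacity is used, which is labeled in the code as well.

```python
from typing import Optional, Sequence


def node_cpu_usage(idle_cpu_seconds: float, total_cpu_seconds: float) -> float:
    """Node CPU usage (%): average share of CPU time that was not idle.

    Both arguments are summed over all cores of the node for the same interval.
    """
    return 100.0 * (1 - idle_cpu_seconds / total_cpu_seconds)


def node_memory_usage(used_bytes: float, total_bytes: float) -> float:
    """Node memory usage (%): used memory / total memory of the node."""
    return 100.0 * used_bytes / total_bytes


def pod_usage(used: float, container_limits: Sequence[Optional[float]], node_capacity: float) -> float:
    """Pod CPU or memory usage (%).

    used: CPU cores (or memory bytes) consumed by the pod.
    container_limits: limit of each container in the pod, None where unset.
    node_capacity: node CPU cores (or memory bytes). Assumption: if any
        container has no limit, the node capacity is the denominator.
    """
    if not container_limits or any(limit is None for limit in container_limits):
        denominator = node_capacity
    else:
        denominator = sum(container_limits)
    return 100.0 * used / denominator


def workload_usage(pod_usages: Sequence[float]) -> float:
    """Workload CPU or memory usage (%): average usage of its pods."""
    if not pod_usages:
        return 0.0
    return sum(pod_usages) / len(pod_usages)
```

For example, a pod using 0.5 cores with two containers limited to 1 core each has a CPU usage of 25%. If one of those containers had no limit on a 4-core node, the same consumption would be reported as 12.5%.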

Data Plane Monitoring

By default, resource usage in each dimension can be viewed for the last hour, last 8 hours, or last 24 hours. To view more monitoring information, click View All Metrics to go to the Dashboard page. For details, see Using Dashboard.

You can hover over a chart to view the monitoring data for each minute.

  • CPU: the CPU used by a cluster in a specified period.
  • Memory: the memory used by a cluster in a specified period.
  • PVC Storage Status: the binding between PVCs and PVs.
  • Pod Status and Quantity: real-time status and number of pods in a cluster.
  • Trend of Total Pod Restarts: the total number of pod restarts in the cluster in the last 5 minutes.
  • Node Status Trend: real-time status of nodes in a cluster.
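The restart trend can be derived from cumulative restart counters. In Kubernetes, each container status reports a cumulative `restartCount`, so a pod's restart counter can be taken as the sum over its containers. The Python sketch below is a hypothetical illustration, not CCE's implementation: it compares two per-pod snapshots taken 5 minutes apart. It assumes that pods absent from the earlier snapshot started the window at 0, and that a counter which decreased (for example, a pod recreated under the same name) was reset.

```python
from typing import Mapping


def restarts_in_window(before: Mapping[str, int], now: Mapping[str, int]) -> int:
    """Total pod restarts between two snapshots of cumulative restart counters.

    before: restart count per pod at the start of the window (hypothetical snapshot).
    now: restart count per pod at the end of the window.
    """
    total = 0
    for pod, count in now.items():
        previous = before.get(pod, 0)  # assumption: new pods started at 0
        # A counter that went down is treated as reset, so count all current restarts.
        total += count - previous if count >= previous else count
    return total


before = {"web-1": 2, "web-2": 0, "db-0": 5}
now = {"web-1": 4, "web-2": 1, "db-0": 5, "job-x": 1}
print(restarts_in_window(before, now))
```

Here web-1 restarted twice, web-2 once, db-0 not at all, and the new pod job-x once, for a total of 4 restarts in the window.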