Cluster Monitoring
To observe the resource usage and health of a cluster, choose Monitoring Center > Clusters. The page displays monitoring data in four areas: Cluster Health, Health Overview, Top Resource Consumption Statistics, and Data Plane Monitoring.
Navigation Path
- Log in to the CCE console and click the cluster name to access the cluster console.
- In the navigation pane, choose Monitoring Center. Then, click Clusters.
Cluster Health
Cluster health is evaluated from several dimensions, such as the health score, number of risk items to be processed, risk level, and proportion of diagnosed risk items for master nodes, clusters, worker nodes, workloads, and external dependencies. Abnormal data is displayed in red. For more diagnosis results, go to Health Center.
Health Overview
Resource Overview
Resource Overview displays the percentage of abnormal resources in nodes, workloads, and pods and the total number of namespaces.
Control Plane Health Overview
Control Plane Health Overview displays the percentage of exceptions on control plane components and master nodes, the total QPS of the API server, and the request error rate of the API server. If the API server (the API service provider of the cluster) on the control plane is abnormal, the cluster may become inaccessible, and workloads that depend on the API server may fail to run normally. The QPS and request error rate help you quickly identify and rectify faults.
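The two API server figures can be illustrated with a minimal sketch. It assumes request counts per HTTP status code are available for a fixed time window and that only 5xx responses count as errors. This page does not define how the console classifies errors, so that rule and the function name api_server_summary are hypothetical.

```python
def api_server_summary(samples, window_seconds):
    """Return (qps, error_rate) for one observation window.

    samples: list of (status_code, request_count) pairs seen in the window.
    Assumption: a response with status >= 500 is treated as an error.
    """
    total = sum(count for _, count in samples)
    errors = sum(count for code, count in samples if code >= 500)
    qps = total / window_seconds
    # Avoid dividing by zero when no requests were observed.
    error_rate = errors / total if total else 0.0
    return qps, error_rate


# 600 requests in 60 seconds, 30 of which returned HTTP 500.
qps, error_rate = api_server_summary([(200, 540), (404, 30), (500, 30)], 60)
```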
Top Resource Consumption Statistics
CCE collects statistics on the top 5 nodes, Deployments, StatefulSets, and pods by CPU and memory usage, helping you identify the resources that consume the most. To view all data, click the nodes, workloads, or pods tab.
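A top 5 ranking of this kind can be sketched as follows. This is only an illustration of the idea, not the way CCE computes it. The function name top_n and the sample node names are hypothetical.

```python
import heapq


def top_n(usage_by_name, n=5):
    """Return the n (name, usage) pairs with the highest usage, largest first."""
    return heapq.nlargest(n, usage_by_name.items(), key=lambda item: item[1])


# Hypothetical CPU usage ratios per node.
node_cpu = {
    "node-a": 0.42, "node-b": 0.91, "node-c": 0.15,
    "node-d": 0.77, "node-e": 0.63, "node-f": 0.08,
}
top_nodes = top_n(node_cpu)
```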
Monitoring Metrics
- CPU Usage
Node CPU usage = Average percentage of the non-idle CPU time of the node
Workload CPU usage = Average CPU usage in each pod of the workload
Pod CPU usage = Used CPU cores / Sum of all CPU limits of the pod (If no limit is specified, the total number of CPU cores on the node is used instead.)
- Memory Usage
Node memory usage = The used memory of the node/The total memory of the node
Workload memory usage = Average memory usage in each pod of the workload
Pod memory usage = Used physical memory / Sum of all memory limits of the pod (If no limit is specified, the total memory of the node is used instead.)
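The pod and workload formulas above can be written out as a short sketch. It assumes the summed limit of a pod is passed as a single value, or as None when no limit is set, in which case the node capacity becomes the denominator. The function names are hypothetical and the code only restates the formulas.

```python
from statistics import mean
from typing import Optional, Sequence


def pod_usage(used: float, limit_total: Optional[float], node_capacity: float) -> float:
    """Usage ratio of one pod for CPU (cores) or memory (bytes).

    limit_total is the sum of the pod's limits, or None when no limit is set.
    With no limit, the node capacity is used as the denominator.
    """
    denominator = limit_total if limit_total is not None else node_capacity
    return used / denominator


def workload_usage(pod_usages: Sequence[float]) -> float:
    """Workload usage is the average usage across the workload's pods."""
    return mean(pod_usages)


# A pod using 0.5 cores with a 2-core limit, on an 8-core node.
limited = pod_usage(0.5, 2.0, 8.0)
# A pod using 2 cores with no limit falls back to the node's 8 cores.
unlimited = pod_usage(2.0, None, 8.0)
average = workload_usage([limited, 0.75])
```

Because the fallback denominator is the whole node, a pod without limits can show a low usage percentage even when it consumes more resources than a pod with tight limits.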
Data Plane Monitoring
By default, resource usage is displayed for each dimension over three time ranges: the last hour, the last 8 hours, and the last 24 hours. To view more monitoring information, click View All Metrics to access the Dashboard page. For details, see Using Dashboard.
You can hover over a chart to view the monitoring data for each minute.
- CPU: the CPU used by a cluster in a specified period.
- Memory: the memory used by a cluster in a specified period.
- PVC Storage Status: the binding between PVCs and PVs.
- Pod Status and Quantity: real-time status and number of pods in a cluster.
- Trend of Total Pod Restarts: the total number of pod restarts in the cluster in the last 5 minutes.
- Node Status Trend: real-time status of nodes in a cluster.
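The Trend of Total Pod Restarts item in the list above reports restarts within a 5-minute window. One way to derive such a figure is to compare cumulative per-pod restart counters taken at the start and end of the window. This is a sketch under stated assumptions: the function name total_restarts is hypothetical, pods that first appear inside the window are given a baseline of 0, and a counter that decreases is treated as contributing 0 restarts.

```python
def total_restarts(before, now):
    """Sum the restart increases across all pods present at the end of the window.

    before, now: dicts mapping pod name to its cumulative restart count
    at the start and end of the window.
    """
    return sum(
        max(0, count - before.get(pod, 0))  # new pods use a baseline of 0
        for pod, count in now.items()
    )


five_minutes_ago = {"web-0": 1, "web-1": 0}
current = {"web-0": 3, "web-1": 0, "web-2": 1}
restarts = total_restarts(five_minutes_ago, current)
```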