Updated on 2024-06-26 GMT+08:00

Node Monitoring

To monitor the resource usage of nodes, choose Monitoring Center > Nodes. This page provides information about all nodes in a cluster and monitoring data of a single node, such as the CPU/memory usage, network inbound/outbound rate, and disk I/O read/write rate.

Navigation Path

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. In the navigation pane on the left, choose Monitoring Center. Then, click Nodes.

    This page displays information about all nodes. To view the monitoring data of a node, click the node name to access its Overview tab and switch to the Pods or Monitoring tab.

Nodes

This page lists the name, status, IP address, number of pods (allocated/total), CPU request/limit/usage, and memory request/limit/usage of each node.

Figure 1 Nodes

You can search for the desired node by name, status, private IP address, or public IP address. You can click Export to export data of all nodes or selected nodes. The exported file is in .xlsx format, and the file name contains the timestamp.

If the CPU limit or memory limit of a node exceeds 100%, the node is overcommitted: the sum of the limits (maximum available values) of the workloads on the node exceeds the node specifications. If these workloads consume too many resources at the same time, they compete for resources, which may cause service exceptions or even node exceptions.
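
The overcommitment check described above can be sketched as follows. This is only an illustration, not how CCE computes the value: the function names `limit_ratio` and `is_overcommitted` and the sample figures are hypothetical, and the sketch assumes the displayed percentage is the sum of workload limits divided by the node capacity.

```python
def limit_ratio(pod_limits, node_capacity):
    """Return the sum of pod limits as a percentage of the node capacity."""
    if node_capacity <= 0:
        raise ValueError("node_capacity must be positive")
    return sum(pod_limits) * 100 / node_capacity


def is_overcommitted(pod_limits, node_capacity):
    """A node is treated as overcommitted when its limit ratio exceeds 100%."""
    return limit_ratio(pod_limits, node_capacity) > 100


# Hypothetical node with 4 CPU cores running three pods (limits in cores).
cpu_limits = [2.0, 1.5, 1.0]
print(limit_ratio(cpu_limits, 4.0))       # 112.5
print(is_overcommitted(cpu_limits, 4.0))  # True
```

The same calculation applies to memory if the limits and the node capacity use the same unit.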

Overview

You can click the node name to view the resource health overview, such as the node status, number of pods, and abnormal events. You can also view the monitoring overview of the last hour, including the CPU usage, memory usage, and network inbound/outbound rate.

Figure 2 Resource overview and monitoring overview

The Overview tab also shows the pod usage trend. You can switch the metrics in the upper right corner of the chart to view the CPU usage, used CPUs, memory usage, and used memory of each pod on the node. You can also click Top 5 (Descending) or Top 5 (Ascending) in the upper left corner to view the top 5 data in descending or ascending order.
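
As a rough illustration of the two ranking modes, the sketch below selects five pods from a set of per-pod values. It assumes that Top 5 (Descending) means the five highest values and Top 5 (Ascending) the five lowest. The function name `top_n`, the pod names, and the values are hypothetical.

```python
def top_n(usage_by_pod, n=5, descending=True):
    """Return up to n (pod, value) pairs, ranked by value."""
    ranked = sorted(
        usage_by_pod.items(),
        key=lambda item: item[1],
        reverse=descending,
    )
    return ranked[:n]


# Hypothetical CPU usage (fraction of the limit) for six pods on one node.
cpu_usage = {
    "pod-a": 0.82,
    "pod-b": 0.10,
    "pod-c": 0.55,
    "pod-d": 0.31,
    "pod-e": 0.97,
    "pod-f": 0.04,
}

highest = top_n(cpu_usage, descending=True)   # starts with pod-e, pod-a
lowest = top_n(cpu_usage, descending=False)   # starts with pod-f, pod-b
```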

Figure 3 Pod usage trend

For more metrics, go to the Monitoring tab.

Pods

This tab lists the name, status, namespace, IP address, node, number of restarts, CPU request/limit, memory request/limit, used CPU cores, CPU usage, used memory, and memory usage of each pod.

Figure 4 Pods

You can search for the desired pod by name, status, namespace, IP address, or node. You can click Export to export data of all pods or selected pods. The exported file is in .xlsx format, and the file name contains the timestamp.

You can click the name of a pod to view its monitoring data. For more information, see Pod Monitoring.

Monitoring

This tab shows the resource usage of the node in each dimension in the last 1 hour, last 8 hours, last 24 hours, or a custom period. To view more monitoring information, click View Dashboard to access the Dashboard page. For details, see Using Dashboard.

Figure 5 Node monitoring
  • CPU Metrics
    • CPU usage: the average percentage of non-idle CPU time on the node.
    • CPU allocation ratio: the CPU cores requested by all containers on the node as a percentage of the total CPU cores on the node.
    • Single-core CPU usage: the percentage of the non-idle time of each CPU core on the node.
  • Memory Metrics
    • Memory usage: the percentage of memory used by the node.
    • Memory allocation ratio: the memory requested by all containers on the node as a percentage of the total memory on the node.
  • Networking Metrics
    • Outbound rate: the number of bytes sent by the NIC on the node per second in different time periods.
    • Inbound rate: the number of bytes received by the NIC on the node per second in different time periods.
    • Packet loss rate (transmit): the percentage of packets sent from the NIC of the node that are not received by the recipient.
    • Packet loss rate (receive): the percentage of packets sent to the NIC of the node that are not received by the NIC.
  • Disk Metrics
    • Disk read rate: the number of bytes read from each file system on the node per second in different time periods.
    • Disk write rate: the number of bytes written to each file system on the node per second in different time periods.
    • Disk usage: the percentage of used disk space of each file system on the node in different time periods.
  • Pod Metrics
    • Pod CPU usage: the CPU used by each pod on the node as a percentage of the pod's CPU limit in different time periods.
    • Pod memory usage: the memory used by each pod on the node as a percentage of the pod's memory limit in different time periods.
    • Pod status and quantity: the total number of pods in the Unavailable, Unready, Running, Completed, or Other state on the node in different time periods.
    • Pod quantity trend: the number of pods on the node in different time periods.
  • Other Metrics
    • Average node load: the average number of running processes on the node in a specified period. This metric indicates whether the number of running processes is approaching the node's processing capacity. As a rule of thumb, a load that stays above the number of CPU cores on the node suggests that the node is saturated, so keep it below that level for stability and reliability.
    • iptables connections: the maximum number of entries and the number of allocated entries in the connection tracking table.
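
The percentage metrics defined above can be sketched with a few small helpers. This is a minimal illustration under stated assumptions, not the formulas used by the CCE console: all function names and sample values are hypothetical, CPU time is assumed to come from cumulative counters sampled at two points in time, and the per-core load comparison is a common rule of thumb rather than something this page specifies.

```python
def cpu_usage_percent(idle_seconds_delta, total_seconds_delta):
    """Non-idle share of CPU time over a sampling interval, in percent."""
    if total_seconds_delta <= 0:
        raise ValueError("total_seconds_delta must be positive")
    busy = total_seconds_delta - idle_seconds_delta
    return busy * 100 / total_seconds_delta


def allocation_ratio_percent(container_requests, node_total):
    """Sum of container requests as a percentage of the node total."""
    if node_total <= 0:
        raise ValueError("node_total must be positive")
    return sum(container_requests) * 100 / node_total


def packet_loss_percent(lost_packets, total_packets):
    """Lost packets as a percentage of all packets sent (or received)."""
    if total_packets == 0:
        return 0.0  # no traffic in the interval, so nothing was lost
    return lost_packets * 100 / total_packets


def load_per_core(load_average, cpu_cores):
    """Load average divided by core count; above 1.0 hints at saturation."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_average / cpu_cores


# Hypothetical samples for a 4-core node.
print(cpu_usage_percent(45.0, 60.0))                 # 25.0
print(allocation_ratio_percent([0.5, 0.25, 1.0], 4))  # 43.75
print(packet_loss_percent(5, 1000))                   # 0.5
print(load_per_core(6.0, 4))                          # 1.5
```

The memory allocation ratio follows the same pattern as the CPU allocation ratio when the requests and the node total use the same unit.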