Cluster Monitoring
Clusters deployed using CCE are monitored. On the Cluster Monitoring page, you can view basic metrics (such as cluster status, CPU usage, memory usage, and node status) and related alarms and events in real time. With this information, you can track cluster status and handle risks promptly to keep clusters running stably.
Precautions
- The host status can be Normal, Abnormal, Warning, Silent, or Deleted. The running status of a host is displayed as Abnormal when the host is faulty because of a network failure, power-off, or shutdown, or when a threshold alarm is reported on the host.
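The status rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not AOM's implementation: the `HostStatus` enum, the `is_abnormal` helper, and its parameter names are hypothetical and do not correspond to any AOM API or field.

```python
from enum import Enum


class HostStatus(Enum):
    """Host statuses listed on the Cluster Monitoring page."""
    NORMAL = "Normal"
    ABNORMAL = "Abnormal"
    WARNING = "Warning"
    SILENT = "Silent"
    DELETED = "Deleted"


def is_abnormal(network_failed: bool, powered_off: bool, threshold_alarm: bool) -> bool:
    """Return True if any condition described as causing Abnormal holds.

    Hypothetical helper: the parameters are illustrative names, not AOM fields.
    """
    return network_failed or powered_off or threshold_alarm
```

For example, `is_abnormal(False, False, True)` is true because a threshold alarm alone is enough for a host to be shown as Abnormal.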
Procedure
- Log in to the AOM 2.0 console.
- In the navigation pane, choose Infrastructure Monitoring > Cluster Monitoring.
- In the upper right corner of the page, set cluster filter criteria.
- Set a time range to view the CCE clusters that report information. There are two methods to set a time range:
Method 1: Use a predefined time label, such as Last hour or Last 6 hours. You can select a time range as required.
Method 2: Specify the start time and end time to customize a time range. You can specify 30 days at most.
- Set the interval for refreshing information. Click the refresh drop-down list and select a value, such as Refresh manually or 1 minute auto refresh.
- Set search criteria (such as the creation time, CPU usage, and cluster name) to find the target cluster.
- Click a cluster to go to its details page.
- Choose Monitoring Center, Logging, or Alarm Center in the navigation pane on the left to implement cloud native observability for clusters. (This function is not available in AF-Johannesburg.)
- Health Center
Health diagnosis monitors cluster health by leveraging container O&M experts' experience to quickly detect cluster faults and identify risks. It also provides rectification suggestions. For details, see Health Center.
- Monitoring Center
Monitoring Center provides container insights, health diagnosis, and dashboards. Container insights provides monitoring views by cluster, node, workload, and pod, and supports multi-level drill-down and association analysis. Dashboards provide monitoring graphs for items such as kube-apiserver, CoreDNS, and PVC. For details, see Monitoring Center.
- Logging
CCE works with Log Tank Service (LTS) to collect logs of control plane components (kube-apiserver, kube-controller-manager, and kube-scheduler), Kubernetes audit logs, Kubernetes events, and container logs (standard output logs, text logs, and node logs). For details, see Logging.
- Alarm Center
Alarm Center works with AOM 2.0 to allow you to create alarm rules and check alarms of clusters and containers. For details, see Alarm Center.
- In the navigation pane on the left, monitor cluster running conditions by cluster, on dashboards, or through Alarm Management. For details, see the AF-Johannesburg step later in this procedure. (This function is available in AF-Johannesburg only.)
- Click a cluster to go to its details page. In the navigation pane on the left, monitor cluster running conditions by cluster, on dashboards, or through Alarm Management.
This function is available in AF-Johannesburg only.
- View information about nodes, workloads, pods (container groups), and containers by cluster.
- In the navigation pane on the left, choose Insights > Node to view information about all nodes in the cluster in real time, including the status, IP address, pod status, CPU usage, and memory usage.
- In the upper part of the node list, filter nodes by node name.
- Click the icon in the upper right corner and select or deselect options as required.
- Click a node to view its related resources, alarms, and events, and common system devices such as GPUs and NICs.
- On the Overview tab page, Cloud-Native Monitoring (New) is selected by default. You can view metrics such as CPU, memory, and network. Click Using ICAgent (Old) and select a target Prometheus instance from the drop-down list. You can view metrics such as CPU, physical memory, and host status.
To use cloud-native monitoring, connect your cluster to a Prometheus instance for CCE first.
If there is no Prometheus instance for CCE, click Prometheus Monitoring to create a Prometheus instance by referring to Prometheus Instance for CCE. After the instance is created, click its name. On the instance details page, choose Integration Center and then connect the CCE cluster.
Use the time selector in the upper right corner to choose a predefined time label or customize a time range for viewing resource information.
Click the refresh icon in the upper right corner to obtain the latest resource information in real time.
Click the full-screen icon in the upper right corner of the page to view resource information in full screen.
- On the Related Resources tab page, the pod (container group) to which the node belongs is displayed.
- In the navigation pane on the left, choose Insights > Workload to view the status and resource usage of all workloads in the cluster.
- In the upper part of the workload list, filter workloads by workload type or name.
- Click the icon in the upper right corner and select or deselect options as required.
- Click a workload to view its related resources, alarms, events, and dashboards.
- On the Overview tab page, Cloud-Native Monitoring (New) is selected by default. You can view metrics such as CPU, memory, and network. Click Using ICAgent (Old) and select a target Prometheus instance from the drop-down list. You can view metrics such as CPU, physical memory, and file system.
- On the Related Resources tab page, the pod (container group) to which the workload belongs is displayed.
- In the navigation pane on the left, choose Insights > Pod to view the status and resource usage of all pods in the cluster.
- In the upper part of the container group list, filter container groups by name.
- Click the icon in the upper right corner and select or deselect options as required.
- Click a container group to view its related resources, alarms, events, and dashboards.
- On the Overview tab page, Cloud-Native Monitoring (New) is selected by default. You can view metrics such as CPU, memory, and network. Click Using ICAgent (Old) and select a target Prometheus instance from the drop-down list. You can view metrics such as CPU, physical memory, and file system.
- On the Related Resources tab page, view nodes, workloads, and containers by name.
- In the navigation pane on the left, choose Insights > Container to view the status and resource usage of all containers in the cluster.
- In the upper part of the container list, filter containers by name.
- Click the icon in the upper right corner and select or deselect options as required.
- Click a container to view its related resources, alarms, events, and dashboards. On the Related Resources tab page, the container group to which the container belongs is displayed by default. Check nodes, workloads, and container groups by name.
- Check the cluster running status through Alarm Management.
- In the navigation pane on the left, choose Alarm Management > Alarm List to view alarm details of the cluster. For details, see Checking Alarms.
- In the navigation pane on the left, choose Alarm Management > Event List to view event details of the cluster. For details, see Viewing Events.
- In the navigation pane on the left, choose Alarm Management > Alarm Rules to view the alarm rules related to the cluster. Modify the alarm rules as required. For details, see Managing Alarm Rules.
- In the navigation pane on the left, choose Dashboard to view the running status of the current cluster.
- A CCE Prometheus instance has been connected:
Select Cluster View, Pod View, Host View, or Node View from the drop-down list to view key metrics such as the CPU usage and physical memory usage.
- No CCE Prometheus instance is connected:
Choose Prometheus Monitoring and then add a Prometheus instance. For details, see Prometheus Instance for CCE. After the instance is created, click its name. On the instance details page, choose Integration Center and then connect the CCE cluster.
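As a rough illustration of the time-range rule in the filter step of the procedure (a predefined label such as Last hour or Last 6 hours, or a custom start and end time spanning at most 30 days), the following Python sketch resolves either form to a start and end time. It is not part of AOM: `resolve_time_range`, `PREDEFINED_RANGES`, and `MAX_CUSTOM_RANGE` are hypothetical names, and the console may offer more predefined labels than the two named in this page.

```python
from datetime import datetime, timedelta

# Predefined labels named in the procedure; the console may offer others.
PREDEFINED_RANGES = {
    "Last hour": timedelta(hours=1),
    "Last 6 hours": timedelta(hours=6),
}

# The procedure states that a custom range can cover 30 days at most.
MAX_CUSTOM_RANGE = timedelta(days=30)


def resolve_time_range(now, label=None, start=None, end=None):
    """Return (start, end) for a predefined label or a custom range."""
    if label is not None:
        # Method 1: a predefined label counts back from the current time.
        return now - PREDEFINED_RANGES[label], now
    # Method 2: a custom range needs both bounds and must fit the limit.
    if start is None or end is None:
        raise ValueError("provide a label, or both start and end")
    if end <= start:
        raise ValueError("end time must be later than start time")
    if end - start > MAX_CUSTOM_RANGE:
        raise ValueError("a custom time range can span 30 days at most")
    return start, end
```

Calling `resolve_time_range(datetime.now(), label="Last 6 hours")` returns a window ending at the current time, while a custom range longer than 30 days raises `ValueError`.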