Using AOM to Monitor Clusters

AOM monitors clusters deployed using Cloud Container Engine (CCE). On the cluster monitoring page, you can view basic metrics (such as cluster status, CPU usage, memory usage, and node status) along with related alarms and events in real time. This information helps you track cluster status and handle risks promptly to keep clusters running stably.

Constraints

The host status can be Normal, Abnormal, Warning, Silent, or Deleted. A host is displayed as Abnormal when it is faulty because of a network failure, power-off, or shutdown, or when a threshold alarm is reported on the host.
To use CCE functions on the AOM console, you need to obtain CCE permissions in advance. For details, see Permissions.
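The Abnormal rule in the first constraint can be read as a simple predicate. The following is a minimal Python sketch for illustration only: `HostSignals` and `is_abnormal` are hypothetical names, not part of any AOM API, and the conditions for the other statuses are omitted because this section does not define them.

```python
from dataclasses import dataclass
from enum import Enum


class HostStatus(Enum):
    """Host statuses listed in the constraints."""
    NORMAL = "Normal"
    ABNORMAL = "Abnormal"
    WARNING = "Warning"
    SILENT = "Silent"
    DELETED = "Deleted"


@dataclass
class HostSignals:
    """Hypothetical inputs; the field names are assumptions for this sketch."""
    network_failure: bool = False
    powered_off: bool = False      # covers both power-off and shutdown
    threshold_alarm: bool = False


def is_abnormal(signals: HostSignals) -> bool:
    """Return True if any condition described for the Abnormal status holds."""
    return (
        signals.network_failure
        or signals.powered_off
        or signals.threshold_alarm
    )
```

For example, `is_abnormal(HostSignals(threshold_alarm=True))` evaluates to `True`, while a host with no signals set does not match the Abnormal rule.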

Procedure

Log in to the AOM 2.0 console.
In the navigation pane, choose Infrastructure Monitoring > Container Insights > Cluster Monitoring.
In the upper right corner of the page, set cluster filter criteria.
1. Set a time range for viewing the reported CCE clusters. You can use a predefined time label, such as Last hour or Last 6 hours, or customize a time range of up to 30 days.
2. Set the interval for refreshing information. Select a value from the drop-down list, such as Refresh manually or 1 minute auto refresh.
Set search criteria such as the cluster name to filter the target cluster. You can also sort clusters by creation time, CPU usage, or memory usage.

If the nodes or pods of a cluster are in the normal state, their numbers are displayed in green.
Click a cluster to go to its details page.
- Choose Health Center, Monitoring Center, Logging, or Alarm Center in the navigation pane on the left to implement cloud native observability for clusters.
  - Health Center
    Health diagnosis monitors cluster health, drawing on the experience of container O&M experts to quickly detect cluster faults and identify risks. It also provides rectification suggestions. For details, see Health Center.
  - Monitoring Center
    Monitoring Center provides container insights, health diagnosis, and dashboards. Container insights offers monitoring views by cluster, node, workload, and pod, and supports multi-level drill-down and association analysis. Dashboards show monitoring graphs for items such as kube-apiserver, CoreDNS, and PVC. For details, see Monitoring Center.
  - Logging
    CCE works with Log Tank Service (LTS) to collect logs of control plane components (kube-apiserver, kube-controller-manager, and kube-scheduler), Kubernetes audit logs, Kubernetes events, and container logs (standard output logs, text logs, and node logs). For details, see Logging.
  - Alarm Center
    Alarm Center works with AOM 2.0 so that you can create alarm rules and check the alarms of clusters and containers. For details, see Alarm Center.
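The filter criteria in the procedure (a time range of up to 30 days, a search by cluster name, and sorting by creation time, CPU usage, or memory usage) can be mirrored in a script, for example when working with cluster data outside the console. The following is a minimal Python sketch under stated assumptions: `ClusterRow`, `validate_time_range`, and `filter_and_sort` are hypothetical names, not an AOM API, and the case-insensitive substring match on the cluster name is an assumption about how the search behaves.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Maximum time range stated in the procedure.
MAX_RANGE = timedelta(days=30)

# Sort fields named in the procedure.
SORT_KEYS = {"created_at", "cpu_usage", "memory_usage"}


@dataclass
class ClusterRow:
    """Hypothetical record for one cluster; field names are assumptions."""
    name: str
    created_at: datetime
    cpu_usage: float      # percent
    memory_usage: float   # percent


def validate_time_range(start: datetime, end: datetime) -> None:
    """Raise ValueError if the range is empty or longer than 30 days."""
    if end <= start:
        raise ValueError("end must be later than start")
    if end - start > MAX_RANGE:
        raise ValueError("time range must not exceed 30 days")


def filter_and_sort(rows, name_contains="", sort_key="created_at", descending=True):
    """Keep clusters whose name contains the search text, then sort them."""
    if sort_key not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {sort_key}")
    needle = name_contains.lower()
    matched = [row for row in rows if needle in row.name.lower()]
    return sorted(matched, key=lambda row: getattr(row, sort_key), reverse=descending)
```

Calling `filter_and_sort(rows, name_contains="prod", sort_key="cpu_usage")` returns the matching clusters with the highest CPU usage first.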