Updated on 2024-11-12 GMT+08:00

Enabling Cluster Monitoring

To monitor a cluster, install the Cloud Native Cluster Monitoring add-on, which collects cluster metrics and reports them to an AOM instance. This section describes how to enable cluster monitoring.

  • After cluster monitoring is enabled, cluster metrics are reported to the selected AOM instance. Basic metrics are free, but custom metrics are billed by AOM. For details, see Pricing Details.
  • Running the Cloud Native Cluster Monitoring add-on consumes cluster resources. Ensure that the cluster has enough resources for the add-on before installing it. To view resource consumption, go to the add-on details page.

Prerequisites

You have an account in the admin user group, which is required to authorize CCE and its dependent services.

The authorization dialog box is displayed automatically on the Monitoring Center page. After you confirm, the system completes the authorization. For details about permission types, see Resource Permissions.

Constraints

  • The cluster version must be v1.17 or later.
  • Before using Monitoring Center, use an account in the admin user group to authorize CCE and its dependent services. After the authorization is complete, users with the CCE Administrator role or CCE FullAccess permission can perform all operations on Monitoring Center. Users with the CCE ReadOnlyAccess permission can view all resource information but cannot perform any operations.
  • Neither self-built Prometheus nor the Prometheus add-on (Prometheus (EOM)) is installed in the cluster.
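
The version constraint above can be checked programmatically before any installation is attempted. The following is a minimal sketch, assuming the cluster version is available as a string such as "v1.17" or "v1.28.3-r0". The function names are hypothetical and are not part of any CCE API. Versions are compared as integer tuples because plain string comparison would rank "v1.9" above "v1.17".

```python
import re

# Minimum cluster version stated in the constraints (v1.17 or later).
MIN_VERSION = (1, 17)


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as 'v1.17' or '1.28.3-r0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_monitoring(version: str) -> bool:
    """Return True if the version meets the v1.17 minimum."""
    return parse_version(version) >= MIN_VERSION
```

For example, supports_monitoring("v1.15.11") evaluates to False, while supports_monitoring("v1.17") evaluates to True.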

Enabling Cluster Monitoring

  • Enabling cluster monitoring during cluster purchase
    1. Log in to the CCE console and purchase a cluster.
    2. On the Select Add-on page, select the Cloud Native Cluster Monitoring add-on.
    3. On the Add-on Configuration page, select the AOM instance to be interconnected with the add-on. If there is no access code, create one first.
      Figure 1 Enabling cluster monitoring
    4. After the cluster is created, create a node on the Nodes tab. After the node is created, the Cloud Native Cluster Monitoring add-on will be automatically deployed on the node.
  • Enabling cluster monitoring on the Monitoring Center page
    1. Click the cluster name to access the cluster console. In the navigation pane, choose Monitoring Center.
    2. Check whether self-built Prometheus exists in the cluster. The Cloud Native Cluster Monitoring add-on installed when Monitoring Center is enabled may conflict with self-built Prometheus.

      If your cluster already has self-built Prometheus, select Compatibility Mode. The Cloud Native Cluster Monitoring add-on is then installed in the cce-monitoring namespace and runs alongside self-built Prometheus. Compatibility mode has some restrictions. For details, see Cloud Native Cluster Monitoring Is Compatible with Self-Built Prometheus.

    3. Click Enable and select the AOM instance to report metrics to.
      Figure 2 Enabling cluster monitoring
    4. Wait 3 to 5 minutes for the monitoring data to be reported to the AOM instance.

      Monitoring Center functions are then available.

  • Enabling cluster monitoring on the Add-ons page
    1. Click the cluster name to access the cluster console. In the navigation pane, choose Add-ons.
    2. Select the Cloud Native Cluster Monitoring add-on and click Install.
    3. Select Reporting Monitoring Data to AOM. The other two data storage configuration items can be selected as needed.
      Figure 3 Installing the Cloud Native Cluster Monitoring add-on
    4. Wait 3 to 5 minutes for the monitoring data to be reported to the AOM instance.

      Monitoring Center functions are then available.
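
Whichever method is used, the state of the add-on pods can also be checked from the command line while waiting for data to appear. The following is a minimal sketch, assuming the pod list has been exported with kubectl get pods -n <namespace> -o json. The namespace depends on how the add-on was installed (this section only names cce-monitoring for compatibility mode), and the function name is hypothetical. A pod counts as ready here only if its phase is Running and every container reports ready.

```python
import json


def summarize_pods(pod_list_json: str) -> dict[str, list[str]]:
    """Split pods from a 'kubectl get pods -o json' payload into ready and not_ready."""
    pods = json.loads(pod_list_json).get("items", [])
    summary: dict[str, list[str]] = {"ready": [], "not_ready": []}
    for pod in pods:
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        # A pod with no container statuses yet (for example, still Pending)
        # is treated as not ready.
        is_ready = (
            status.get("phase") == "Running"
            and bool(containers)
            and all(c.get("ready", False) for c in containers)
        )
        summary["ready" if is_ready else "not_ready"].append(name)
    return summary
```

Pods listed under not_ready are the ones whose events are worth inspecting, as described in the FAQ.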

To disable cluster monitoring, uninstall the Cloud Native Cluster Monitoring add-on on the Add-ons page or disable the option for interconnecting with AOM.

FAQ

  • Failed to enable cluster monitoring because the add-on is abnormal.

    Solution: Go to the Add-ons page to view the list of installed add-ons. Click the name of the Cloud Native Cluster Monitoring add-on to expand the instance list. Check the events of abnormal pods and locate the fault based on the error information.

    Figure 4 Abnormal add-on
  • There is no data on the Monitoring Center page.

    Solution:

    1. Go to the Add-ons page to view the list of installed add-ons. Click the name of the Cloud Native Cluster Monitoring add-on to expand the instance list and check whether the Prometheus instance is running normally. If it is not, check the pod events for the exception details.

      For example, if "0/6 nodes are available: 1 Insufficient cpu, 2 node(s) had taint {cie.manage: proxy}, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate" is displayed, one node has insufficient CPU and the other five nodes have taints that the pod does not tolerate. As a result, the pod cannot be scheduled on any node.

    2. If the add-on is running normally, check the logs of the Prometheus instance for error information. Errors related to remote_write indicate that metrics failed to be reported. In this case, check whether the network used for reporting the metrics is working.
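
The scheduling event quoted in step 1 can be broken down mechanically when many pods need to be triaged. The following is a minimal sketch, assuming the event message follows the "N/M nodes are available: count reason, count reason, ..." layout shown in the example. The class and function names are hypothetical. Reasons are split at ", <number> " boundaries rather than at every comma, because taint reasons contain commas themselves.

```python
import re
from dataclasses import dataclass


@dataclass
class SchedulingFailure:
    available: int                   # nodes the pod could be scheduled on
    total: int                       # nodes evaluated by the scheduler
    reasons: list[tuple[int, str]]   # (node count, reason text) pairs


HEADER = re.compile(r"(\d+)/(\d+) nodes are available: (.*)", re.DOTALL)
# A reason runs from its leading count up to the next ", <count> " or the end.
REASON = re.compile(r"(\d+) (.+?)(?=, \d+ |\.?$)", re.DOTALL)


def parse_scheduling_failure(message: str) -> SchedulingFailure:
    """Parse a scheduler event message into per-reason node counts."""
    header = HEADER.search(message.strip())
    if header is None:
        raise ValueError("message does not look like a scheduling failure")
    available, total, body = header.groups()
    reasons = [(int(count), text.strip()) for count, text in REASON.findall(body)]
    return SchedulingFailure(int(available), int(total), reasons)
```

Applied to the message in step 1, this yields three reasons with node counts 1, 2, and 3, which together account for all 6 nodes.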