Updated on 2024-06-26 GMT+08:00

Enabling Cluster Monitoring

To enable monitoring for a cluster, install the cloud native cluster monitoring add-on, which collects metrics. After cluster monitoring is enabled, Monitoring Center collects cluster metrics and reports them to the AOM instance. This section describes how to enable cluster monitoring.

  • After cluster monitoring is enabled, cluster metrics are reported to AOM instances. Basic metrics are free of charge, but custom metrics are billed by AOM. For details, see Pricing Details.
  • The cloud native cluster monitoring add-on consumes cluster resources while it runs. Ensure that the cluster has sufficient resources before installing the add-on. To view its resource consumption, go to the details page of the cloud native cluster monitoring add-on.

Prerequisites

Before enabling cluster monitoring, use an account in the admin user group to authorize (delegate permissions to) CCE and its dependent services.

An authorization dialog box is automatically displayed on the Monitoring Center page. After you confirm, the system completes the authorization automatically. For details about permission types, see Resource Permissions.

Constraints

  • The cluster version must be v1.17 or later.
  • Before using Monitoring Center, use an account in the admin user group to authorize (delegate permissions to) CCE and its dependent services. After the authorization is complete, users with the CCE Administrator role or the CCE FullAccess permission can perform all operations in Monitoring Center. Users with the CCE ReadOnlyAccess permission can view all resource information but cannot perform any operations.
  • Neither an on-premises Prometheus nor the prometheus add-on (Prometheus (EOM)) can be installed in the cluster.
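Version strings do not compare correctly as plain text (lexically, "v1.9" sorts after "v1.17"), so a scripted check of the v1.17 requirement has to compare the major and minor numbers numerically. The following is a minimal Python sketch, assuming cluster versions are written as `v<major>.<minor>` with an optional suffix such as `.5-r0` (illustrative only); `meets_min_version` is a hypothetical helper, not a CCE API.

```python
import re


def meets_min_version(version: str, minimum: tuple = (1, 17)) -> bool:
    """Return True if a cluster version string such as 'v1.25' is at least `minimum`.

    Assumes the format v<major>.<minor>[suffix]; the suffix (patch, build) is ignored.
    """
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized cluster version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders (major, minor) numerically, so (1, 9) < (1, 17).
    return (major, minor) >= tuple(minimum)


if __name__ == "__main__":
    for v in ("v1.15", "v1.17", "v1.9", "v1.25.5-r0"):
        print(v, meets_min_version(v))
```

Parsing into integers rather than comparing strings is the point of the sketch: "v1.9" is correctly rejected and "v1.25.5-r0" is correctly accepted.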

Enabling Cluster Monitoring

  • Enabling cluster monitoring during cluster purchase
    1. Log in to the CCE console and purchase a cluster.
    2. On the Select Add-on page, select the cloud native cluster monitoring add-on.
    3. On the Add-on Configuration page, select the AOM instance to be interconnected with the cloud native cluster monitoring add-on. If there is no access code, create one first.
      Figure 1 Enabling cluster monitoring
    4. After the cluster is created, create a node on the Nodes tab. After the node is created, the cloud native cluster monitoring add-on will be automatically deployed on the node.
  • Enabling cluster monitoring on the Monitoring Center page
    1. Click the cluster name and choose Monitoring Center in the navigation pane.
    2. Click Enable and select the AOM instance that metrics are reported to.
      Figure 2 Enabling cluster monitoring
    3. Wait 3 to 5 minutes for the monitoring data to be reported to the AOM instance.

      The Monitoring Center functions are then available.

  • Enabling cluster monitoring on the Add-ons page
    1. Click the cluster name and choose Add-ons in the navigation pane.
    2. Select the cloud native cluster monitoring add-on and click Install.
    3. Select Agent or Server mode as required and enable interconnection with AOM so that metrics can be reported to AOM instances.
      Figure 3 Installing the cloud native cluster monitoring add-on
    4. Wait 3 to 5 minutes for the monitoring data to be reported to the AOM instance.

      The Monitoring Center functions are then available.

To disable Monitoring Center, uninstall the cloud native cluster monitoring add-on on the Add-ons page or disable the interconnection with AOM.

FAQ

  • Failed to enable cluster monitoring because the add-on is abnormal.

    Solution: Go to the Add-ons page to view the list of installed add-ons. Click the name of the cloud native cluster monitoring add-on to expand the instance list. Check the events of abnormal pods and locate the fault based on the error information.

    Figure 4 Abnormal add-on
  • There is no data on the Monitoring Center page.

    Solution:

    1. Go to the Add-ons page to view the list of installed add-ons. Click the name of the cloud native cluster monitoring add-on to expand the instance list and check whether the Prometheus instance is running normally. If it is not, view the events of its pods to obtain the error information.

      For example, if "0/6 nodes are available: 1 Insufficient cpu, 2 node(s) had taint {cie.manage: proxy}, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate" is displayed, one node has insufficient CPU and the remaining five nodes have taints that the pod does not tolerate (two nodes with cie.manage: proxy and three with node.kubernetes.io/unreachable). As a result, the pod cannot be scheduled on any node.

    2. If the add-on is running normally, check the logs of the Prometheus instance for error information. Errors related to remote_write indicate that metrics failed to be reported. In this case, check whether the network used to report the metrics is working normally.
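The two checks in the solution above can be sketched in Python: one function splits a scheduling failure message like the example into per-reason node counts, and the other filters Prometheus log lines for remote-write warnings and errors. This is a sketch under stated assumptions: the message follows the format of the example shown, log lines are assumed to be in logfmt style with `level=` and `component=remote` keys, and the function names and sample values are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Each reason starts with a node count; a reason ends just before the next ", <count> "
# or at the end of the message (optionally followed by a period).
_REASON = re.compile(r"(\d+) (.+?)(?=, \d+ |\.?$)")


def parse_unschedulable(message: str) -> Tuple[int, int, List[Tuple[int, str]]]:
    """Split '0/6 nodes are available: ...' into (available, total, [(count, reason), ...])."""
    header = re.match(r"^(\d+)/(\d+) nodes are available: (.*)$", message.strip())
    if header is None:
        raise ValueError("not a node-availability scheduling message")
    available, total, body = int(header.group(1)), int(header.group(2)), header.group(3)
    # Reason text can itself contain ", " (", that the pod didn't tolerate"),
    # so splitting on commas would break it apart; the lookahead avoids that.
    reasons = [(int(count), text) for count, text in _REASON.findall(body)]
    return available, total, reasons


def remote_write_errors(log_lines: Iterable[str]) -> List[str]:
    """Return warn/error lines that appear to come from remote write (assumed logfmt keys)."""
    hits = []
    for line in log_lines:
        lowered = line.lower()
        is_problem = "level=error" in lowered or "level=warn" in lowered
        is_remote = "component=remote" in lowered or "remote_write" in lowered
        if is_problem and is_remote:
            hits.append(line)
    return hits


if __name__ == "__main__":
    msg = ("0/6 nodes are available: 1 Insufficient cpu, "
           "2 node(s) had taint {cie.manage: proxy}, that the pod didn't tolerate, "
           "3 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate")
    available, total, reasons = parse_unschedulable(msg)
    print(f"{available}/{total} nodes available")
    for count, reason in reasons:
        print(f"  {count} node(s): {reason}")

    # Illustrative log lines, not captured from a real Prometheus instance.
    sample_logs = [
        'ts=2024-06-26T08:00:00Z component=remote level=warn url=https://example.invalid/push '
        'msg="Failed to send batch, retrying" err="context deadline exceeded"',
        'ts=2024-06-26T08:00:01Z level=info msg="Server is ready to receive web requests."',
    ]
    for line in remote_write_errors(sample_logs):
        print(line)
```

Splitting on the leading node count rather than on commas keeps each taint description intact. For the example message, the counts 1 + 2 + 3 account for all 6 nodes, which matches the explanation above.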