Enabling Cluster Monitoring
To enable monitoring for a cluster, you need to install the Cloud Native Cluster Monitoring add-on for metric collection. After cluster monitoring is enabled, cluster metrics are collected and reported to AOM instances. This section describes how to enable cluster monitoring.
- After cluster monitoring is enabled, cluster metrics are reported to the selected AOM instance. Basic metrics are free but custom metrics are charged by AOM. For details, see Pricing Details.
- Running the Cloud Native Cluster Monitoring add-on in a cluster consumes cluster resources. Ensure that the cluster has enough resources to install the add-on. To view resource consumption, go to the add-on details page.
Prerequisites
You have an account in the admin user group. This account is used to delegate CCE and its dependent services.
When you open the Monitoring Center page, the authorization dialog box is displayed automatically. After you confirm, the system completes the authorization. For details about permission types, see Resource Permissions.
Notes and Constraints
- The cluster version must be v1.17 or later.
- Before using Monitoring Center, you need to use an account in the admin user group to delegate CCE and its dependent services. After the authorization is complete, users with the CCE Administrator role or CCE FullAccess permission can perform all operations on Monitoring Center. Users with the CCE ReadOnlyAccess permission can view all resource information but cannot perform any operations.
- Neither a self-built Prometheus nor the Prometheus add-on (Prometheus (EOM)) is installed in the cluster.
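The version constraint above can be checked before you try to enable monitoring. The following is a minimal sketch, not part of CCE: the function names `parse_version` and `supports_monitoring` are hypothetical, and it assumes the cluster version is available as a string such as `v1.17` or `v1.19.10-r0`. It compares the major and minor numbers numerically, because a plain string comparison would wrongly rank `v1.9` above `v1.17`.

```python
import re
from typing import Tuple

# Minimum cluster version taken from the constraint above (v1.17 or later).
MIN_VERSION = (1, 17)


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'v1.17' or 'v1.19.10-r0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized cluster version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_monitoring(version: str) -> bool:
    """Return True if the version meets the v1.17 minimum (hypothetical helper)."""
    return parse_version(version) >= MIN_VERSION
```

For example, `supports_monitoring("v1.15.11")` returns `False`, while `supports_monitoring("v1.17")` returns `True`.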
Enabling Cluster Monitoring
- Enabling cluster monitoring during cluster purchase
  - Log in to the CCE console and purchase a cluster.
  - On the Select Add-on page, select the Cloud Native Cluster Monitoring add-on.
  - On the Add-on Configuration page, select the AOM instance to be interconnected with the add-on. If there is no access code, create one first.
    Figure 1 Enabling cluster monitoring
  - After the cluster is created, create a node on the Nodes tab. After the node is created, the Cloud Native Cluster Monitoring add-on is automatically deployed on the node.
- Enabling cluster monitoring on the Monitoring Center page
  - Click the cluster name to access the cluster console. In the navigation pane, choose Monitoring Center.
  - Click Enable and select the AOM instance that metrics are reported to.
    Figure 2 Enabling cluster monitoring
  - Wait for 3 to 5 minutes until the monitoring data is reported to the AOM instance. After that, the functions of Monitoring Center are available.
- Enabling cluster monitoring on the Add-ons page
To disable cluster monitoring, uninstall the Cloud Native Cluster Monitoring add-on on the Add-ons page or disable the option for interconnecting with AOM.
FAQ
- Failed to enable cluster monitoring because the add-on is abnormal.
Solution: Go to the Add-ons page to view the list of installed add-ons. Click the name of the Cloud Native Cluster Monitoring add-on to expand the instance list. Check the events of abnormal pods and locate the fault based on the error information.
Figure 3 Abnormal add-on
- There is no data on the Monitoring Center page.
Solution:
- Go to the Add-ons page to view the list of installed add-ons. Click the name of the Cloud Native Cluster Monitoring add-on to expand the instance list and check whether the Prometheus instance is running normally. If the Prometheus instance is not running normally, query the events of pods to obtain the exception information.
For example, if "0/6 nodes are available: 1 Insufficient cpu, 2 node(s) had taint {cie.manage: proxy}, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate" is displayed, one node has insufficient CPU and the other five nodes have taints that the pod does not tolerate. As a result, the pod cannot be scheduled on any of the six nodes.
- If the add-on is normal, you can query the logs of the Prometheus instance and check whether the logs contain error information. Error information related to remote_write indicates that metrics fail to be reported. In this case, check whether the network for reporting the metrics is normal.
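The scheduling message quoted in the FAQ above packs several reasons into one line, so splitting it per reason can make it easier to read. The following is a minimal sketch, assuming the event text follows the same shape as that example ("available/total nodes are available:" followed by comma-separated reasons that each start with a node count). The function name `parse_scheduling_message` is hypothetical and is not part of CCE or Kubernetes.

```python
import re
from typing import Dict, Tuple


def parse_scheduling_message(message: str) -> Tuple[int, int, Dict[str, int]]:
    """Return (available, total, {reason: node_count}) for a scheduling failure message."""
    header = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.DOTALL)
    if header is None:
        raise ValueError("not a scheduling failure message")
    available, total = int(header.group(1)), int(header.group(2))

    reasons: Dict[str, int] = {}
    body = header.group(3).strip().rstrip(".")
    # A new reason begins after a comma that is followed by a node count.
    # Commas inside a reason (", that the pod didn't tolerate") are left alone.
    for part in re.split(r",\s*(?=\d+\s)", body):
        if not part:
            continue
        count, _, reason = part.partition(" ")
        reasons[reason.strip()] = int(count)
    return available, total, reasons


message = (
    "0/6 nodes are available: 1 Insufficient cpu, "
    "2 node(s) had taint {cie.manage: proxy}, that the pod didn't tolerate, "
    "3 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate"
)
available, total, reasons = parse_scheduling_message(message)
# Count the nodes that were rejected because of a taint.
tainted = sum(n for reason, n in reasons.items() if "taint" in reason)
```

With the example message, `reasons` holds three entries whose counts add up to the six nodes in the cluster, and `tainted` is 5, which matches the reading given in the FAQ.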