
Monitoring Center FAQ

Why Is There No Data on Monitoring Center?

  • Possible cause 1: Cloud Native Cluster Monitoring is abnormal.

    Access the Add-ons page of the cluster console and check whether Cloud Native Cluster Monitoring is in the Running state.

    Figure 1 Checking the add-on status

    If the add-on is not running normally, locate the fault based on the events. (For a kubectl-based check, see the sketch after this list.)

    Figure 2 Viewing add-on events
  • Possible cause 2: The AOM instance interconnected with Cloud Native Cluster Monitoring has been deleted.

    On the Add-ons page of the cluster console, check the configuration of Cloud Native Cluster Monitoring.

    Figure 3 Editing add-on configuration

    Ensure that AOM Instance is not left empty.

    Figure 4 Viewing the AOM instance
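
If you have kubectl access to the cluster, you can also run these checks from the CLI. The following is a minimal sketch; the monitoring namespace is an assumption and may differ in your cluster:

    # List the add-on pods and confirm they are Running (namespace is an assumption).
    kubectl get pods -n monitoring

    # Inspect the events of a pod that is not running normally (pod name is a placeholder).
    kubectl describe pod <pod-name> -n monitoring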

How Do I Disable Monitoring Center?

To disable Monitoring Center, uninstall Cloud Native Cluster Monitoring on the Add-ons page of the CCE console or disable the interconnection with AOM.

Why Are Custom Metrics Not Displayed on Monitoring Center?

Monitoring Center currently does not display custom metrics. To view custom metrics, create a dashboard for them on the AOM dashboard page. For details, see Creating a Dashboard.

Why Is the Resource Information in the Node List Not Displayed for a Short Time (1 to 2 Minutes) After the prometheus-server Instance Is Restarted When Cloud Native Cluster Monitoring Is in Server Mode?

After the prometheus-server instance is restarted, the UID label values of its metrics change. Because data is stored locally in server mode, the metrics of the old and new prometheus-server instances overlap during the rolling restart, and Cloud Native Cluster Monitoring reports both sets. The resource information in the node list would therefore be inaccurate, so it is hidden while the metrics overlap. Unless otherwise specified, you are advised to use Cloud Native Cluster Monitoring in agent mode to interconnect with AOM.
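
While the metrics overlap, you can confirm the duplicate series with a PromQL query. This is a hedged sketch: kube_node_info is a standard kube-state-metrics metric, and which volatile label churns (for example, uid) depends on your setup:

    # More than one series per node during the rolling restart means that
    # both the old and new prometheus-server instances are still reporting.
    count by (node) (kube_node_info) > 1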

Why Is Some Data Doubled After the kube-state-metrics Instance Is Restarted When Cloud Native Cluster Monitoring Is in Server Mode?

When the kube-state-metrics instance is scheduled to a new node, the instance label values of the metrics it collects change. Because data is stored locally in server mode, the metrics of the old and new kube-state-metrics instances overlap during the rolling restart, and Cloud Native Cluster Monitoring reports both sets to AOM. Since their instance label values differ, all of these metrics are considered valid, so the numbers of nodes, workloads, pods, namespaces, and control plane components displayed on the Monitoring Center > Clusters page are doubled. Unless otherwise specified, you are advised to use Cloud Native Cluster Monitoring in agent mode to interconnect with AOM.
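
Until the stale series expire, any aggregation that counts series directly is doubled. As a sketch using the standard kube-state-metrics metric kube_node_info, you can de-duplicate by counting distinct values of a stable label instead:

    # Doubled during the overlap: old and new instance labels both match.
    count(kube_node_info)

    # Counts each node once, no matter how many instances report it.
    count(count by (node) (kube_node_info))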

Why Are Metrics Not Reported When Cloud Native Cluster Monitoring Is in Server Mode?

The add-on instance in server mode has run out of storage space on its PV, so metrics cannot be written.

Go to the Add-ons page, select the prometheus-server-x instance, and view its logs. If the logs contain information similar to "no space left on device", the disk mounted to the add-on instance has insufficient space. (A CLI alternative is sketched below the figure.)

Figure 5 Viewing the add-on instance log
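
The same check can be run from the CLI. A minimal sketch, assuming the pod is named prometheus-server-0 and runs in the monitoring namespace:

    # Search the pod logs for write failures caused by a full disk.
    kubectl logs prometheus-server-0 -n monitoring --all-containers | grep "no space left on device"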

Solutions

  • Solution 1: Use the add-on instance in agent mode to interconnect with the AOM instance. Because AOM then manages the metrics, no storage management is required.
  • Solution 2: In the navigation pane on the left, choose Storage. On the displayed page, switch to the monitoring namespace, select the pvc-prometheus-server-0 disk, and choose More > Scale-out in the Operation column. After the scale-out is complete, go to the StatefulSets tab and restart the prometheus-server-0 instance. (For a kubectl alternative, see the sketch after this list.)
    Figure 6 Expanding the PVC capacity

    Insufficient disk space prevents Prometheus metrics from being written, so no data can be collected. Any monitoring data generated during the scale-out and subsequent restart will be lost.
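
If you manage the cluster with kubectl, the scale-out can be sketched as follows. The PVC name and the monitoring namespace match the console steps above; the StatefulSet name prometheus-server is inferred from the instance name, the 30Gi target size is only an example, and the underlying StorageClass must support volume expansion:

    # Request a larger volume (example size; allowVolumeExpansion must be enabled).
    kubectl patch pvc pvc-prometheus-server-0 -n monitoring \
      -p '{"spec":{"resources":{"requests":{"storage":"30Gi"}}}}'

    # Restart the StatefulSet so Prometheus runs with the expanded volume.
    kubectl rollout restart statefulset prometheus-server -n monitoring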