kube-prometheus-stack
Introduction
kube-prometheus-stack provides easy-to-use, end-to-end Kubernetes cluster monitoring by using Prometheus Operator and Prometheus. It also supports customized add-on specifications, interconnection with Grafana, high availability (HA), and node affinity.
The core components of kube-prometheus-stack include prometheusOperator, prometheus, alertmanager, thanosSidecar, thanosQuery, adapter, kube-state-metrics, nodeExporter, grafana, and clusterProblemDetector.
- prometheusOperator: deploys and manages the Prometheus Server based on Custom Resource Definitions (CRDs), and monitors and processes the events related to these CRDs. It is the control center of the entire system.
- prometheus (Server): a Prometheus Server cluster that the operator deploys as StatefulSets based on the Prometheus CRDs.
- alertmanager: the alarm center of the add-on. It receives alarms sent by Prometheus and manages them by deduplicating, grouping, and distributing them.
- thanosSidecar: in HA scenarios, runs with Prometheus in the same pod to implement persistent storage of Prometheus metric data.
- thanosQuery: the entry point for PromQL queries when Prometheus runs in HA scenarios. It deduplicates data of the same metrics returned by Store or Prometheus.
- adapter (custom-metrics-apiserver): aggregates custom metrics to the native Kubernetes API Server.
- kube-state-metrics: watches the Kubernetes API server and exposes the state of Kubernetes objects (such as Deployments, nodes, and pods) as Prometheus metrics. By default, kube-state-metrics does not collect all labels and annotations of Kubernetes resources. If these labels and annotations need to be collected, see How Do I Modify the Collection Configuration of the kube-state-metrics Component?.
- nodeExporter: deployed on each node to collect node monitoring data.
- grafana: visualizes monitoring data. grafana creates a 5 GiB storage volume by default. Uninstalling the add-on will not delete this volume.
- clusterProblemDetector: monitors cluster exceptions.
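The deduplication that thanosQuery performs in HA scenarios can be illustrated with a simplified sketch. It assumes that each Prometheus replica tags its series with a replica label (the name `prometheus_replica` is an assumption here) and that two samples are duplicates when they match on every other label and on the timestamp. The actual Thanos logic is more involved, so treat this only as an illustration of the idea.

```python
from typing import Dict, List, Tuple

# A sample is (labels, timestamp, value). This shape is chosen for illustration only.
Sample = Tuple[Dict[str, str], float, float]


def deduplicate(samples: List[Sample],
                replica_label: str = "prometheus_replica") -> List[Sample]:
    """Keep one sample per (series, timestamp), ignoring the replica label.

    The replica label name is an assumption; use whatever label
    distinguishes the HA replicas in your setup.
    """
    seen = {}
    for labels, timestamp, value in samples:
        # Identify the series by all labels except the replica label.
        series = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        key = (series, timestamp)
        if key not in seen:
            # First replica to report this sample wins.
            seen[key] = (dict(series), timestamp, value)
    return list(seen.values())
```

With two replicas reporting the same `up` sample at the same timestamp, the function returns a single sample whose labels no longer include the replica label.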
Constraints
kube-prometheus-stack cannot be installed in UCS on-premises clusters.
Add-on Deployment Modes
The kube-prometheus-stack add-on can be deployed in Agent or Server mode.
- Deployed in Agent mode, the add-on occupies fewer cluster resources and provides the Prometheus metric collection capability for the cluster. However, the HPA and health diagnosis functions based on custom Prometheus statements are not supported.
- Deployed in Server mode, the add-on supports HPA and health diagnosis based on custom Prometheus statements. This mode depends on PersistentVolumeClaims (PVCs) and consumes a large amount of memory.
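The trade-off between the two modes can be written down as a small decision helper. This is a sketch of the rule stated above, not an API of the add-on; the function name and parameters are hypothetical.

```python
def choose_deployment_mode(needs_hpa: bool, needs_health_diagnosis: bool) -> str:
    """Return the deployment mode implied by the required features.

    HPA and health diagnosis based on custom Prometheus statements are
    only available in Server mode; otherwise Agent mode uses fewer resources.
    """
    if needs_hpa or needs_health_diagnosis:
        return "Server"
    return "Agent"
```

For example, `choose_deployment_mode(needs_hpa=True, needs_health_diagnosis=False)` returns `"Server"`, in which case PVC availability and memory headroom should be checked before installation.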
Precaution
kube-prometheus-stack is a system monitoring add-on. When cluster resources are insufficient, Kubernetes prioritizes scheduling resources to the pods where the add-on runs.
Permissions
nodeExporter monitors the disk space of Docker and reads the info data of Docker from the /var/run/dockersock directory of the host.
The following privilege is required by nodeExporter:
- cap_dac_override: reads the info data of Docker.
Upgrading the Add-on
- Select a fleet or a cluster that has not been added to any fleet.
Figure 1 Selecting a fleet or a cluster not in the fleet
- Choose Container Insights > Clusters to view the clusters with monitoring enabled. Locate the cluster for which the add-on is to be upgraded and click View Details in the Operation column to access its overview page.
- The version of kube-prometheus-stack is displayed in the upper right corner. If it is not the latest version, upgrade the add-on to use the latest functions.
Resource Quota Requirements of Different Specifications
Before installing the kube-prometheus-stack add-on, ensure that the cluster has sufficient schedulable resources such as CPUs and memory. For details about the resource quota requirements of default specifications in Agent mode, see Table 1. For details about the resource quota requirements of different add-on specifications in Server mode, see Table 2.
Table 1 Resource quota requirements of the default specifications in Agent mode
Add-on Specification | Container | CPU Request | CPU Limit | Memory Request | Memory Limit
---|---|---|---|---|---
Default | prometheusOperator | 100m | 500m | 100 MiB | 500 MiB
Default | prometheus | 500m | 4 | 1 GiB | 8 GiB
Default | kube-state-metrics | 200m | 500m | 200 MiB | 500 MiB
Default | nodeExporter | 200m | 500m | 200 MiB | 1 GiB
Default | grafana | 100m | 500m | 200 MiB | 2 GiB
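To check whether a cluster has enough schedulable resources for Agent mode, the request values listed above can be summed. The sketch below assumes one replica of each container, except nodeExporter, which is counted once per node because it is deployed on each node. Actual replica counts are not stated in this document, so adjust them as needed.

```python
from typing import Tuple

# Requests for the Agent-mode default specification.
# CPU is in millicores (m), memory in MiB (1 GiB = 1024 MiB).
AGENT_DEFAULT_REQUESTS = {
    "prometheusOperator": (100, 100),
    "prometheus": (500, 1024),
    "kube-state-metrics": (200, 200),
    "nodeExporter": (200, 200),
    "grafana": (100, 200),
}


def total_requests(node_count: int = 1) -> Tuple[int, int]:
    """Return total (CPU in m, memory in MiB) requested by the add-on.

    Assumption: one replica per container, nodeExporter once per node.
    """
    cpu_total = 0
    mem_total = 0
    for name, (cpu_m, mem_mib) in AGENT_DEFAULT_REQUESTS.items():
        replicas = node_count if name == "nodeExporter" else 1
        cpu_total += cpu_m * replicas
        mem_total += mem_mib * replicas
    return cpu_total, mem_total
```

Under these assumptions, `total_requests(3)` returns `(1500, 2124)`, that is, 1.5 CPU cores and about 2.1 GiB of memory requested across a three-node cluster.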
Table 2 Resource quota requirements of different add-on specifications in Server mode
Add-on Specification | Container | CPU Request | CPU Limit | Memory Request | Memory Limit
---|---|---|---|---|---
Demo (≤ 100 containers) | prometheusOperator | 200m | 500m | 200 MiB | 500 MiB
Demo (≤ 100 containers) | prometheus | 500m | 2 | 2 GiB | 8 GiB
Demo (≤ 100 containers) | alertmanager | 200m | 1 | 200 MiB | 1 GiB
Demo (≤ 100 containers) | thanosSidecar | 100m | 1 | 100 MiB | 2 GiB
Demo (≤ 100 containers) | thanosQuery | 500m | 2 | 500 MiB | 4 GiB
Demo (≤ 100 containers) | adapter | 400m | 2 | 400 MiB | 1 GiB
Demo (≤ 100 containers) | kube-state-metrics | 200m | 500m | 200 MiB | 500 MiB
Demo (≤ 100 containers) | nodeExporter | 200m | 500m | 200 MiB | 1 GiB
Demo (≤ 100 containers) | grafana | 200m | 500m | 200 MiB | 2 GiB
Demo (≤ 100 containers) | clusterProblemDetector | 100m | 200m | 200 MiB | 400 MiB
Small (≤ 2,000 containers) | prometheusOperator | 200m | 500m | 200 MiB | 500 MiB
Small (≤ 2,000 containers) | prometheus | 4 | 8 | 16 GiB | 32 GiB
Small (≤ 2,000 containers) | alertmanager | 500m | 1 | 500 MiB | 1 GiB
Small (≤ 2,000 containers) | thanosSidecar | 500m | 1 | 500 MiB | 2 GiB
Small (≤ 2,000 containers) | thanosQuery | 2 | 4 | 2 GiB | 16 GiB
Small (≤ 2,000 containers) | adapter | 2 | 4 | 4 GiB | 16 GiB
Small (≤ 2,000 containers) | kube-state-metrics | 500m | 1 | 500 MiB | 1 GiB
Small (≤ 2,000 containers) | nodeExporter | 200m | 500m | 200 MiB | 1 GiB
Small (≤ 2,000 containers) | grafana | 200m | 500m | 200 MiB | 2 GiB
Small (≤ 2,000 containers) | clusterProblemDetector | 200m | 500m | 300 MiB | 1 GiB
Medium (≤ 5,000 containers) | prometheusOperator | 500m | 1 | 500 MiB | 1 GiB
Medium (≤ 5,000 containers) | prometheus | 8 | 16 | 32 GiB | 64 GiB
Medium (≤ 5,000 containers) | alertmanager | 500m | 1 | 500 MiB | 2 GiB
Medium (≤ 5,000 containers) | thanosSidecar | 1 | 2 | 1 GiB | 4 GiB
Medium (≤ 5,000 containers) | thanosQuery | 2 | 4 | 2 GiB | 16 GiB
Medium (≤ 5,000 containers) | adapter | 2 | 4 | 16 GiB | 32 GiB
Medium (≤ 5,000 containers) | kube-state-metrics | 1 | 2 | 1 GiB | 2 GiB
Medium (≤ 5,000 containers) | nodeExporter | 200m | 500m | 200 MiB | 1 GiB
Medium (≤ 5,000 containers) | grafana | 200m | 500m | 200 MiB | 2 GiB
Medium (≤ 5,000 containers) | clusterProblemDetector | 200m | 1 | 400 MiB | 2 GiB
Large (> 5,000 containers) | prometheusOperator | 500m | 1 | 500 MiB | 2 GiB
Large (> 5,000 containers) | prometheus | 8 | 32 | 64 GiB | 128 GiB
Large (> 5,000 containers) | alertmanager | 1 | 2 | 1 GiB | 4 GiB
Large (> 5,000 containers) | thanosSidecar | 2 | 4 | 2 GiB | 8 GiB
Large (> 5,000 containers) | thanosQuery | 2 | 4 | 2 GiB | 32 GiB
Large (> 5,000 containers) | adapter | 2 | 4 | 32 GiB | 64 GiB
Large (> 5,000 containers) | kube-state-metrics | 1 | 3 | 1 GiB | 3 GiB
Large (> 5,000 containers) | nodeExporter | 200m | 500m | 200 MiB | 1 GiB
Large (> 5,000 containers) | grafana | 200m | 500m | 200 MiB | 2 GiB
Large (> 5,000 containers) | clusterProblemDetector | 200m | 1 | 400 MiB | 2 GiB
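The container-count thresholds of the Server mode specifications can be expressed as a simple lookup. The sketch below is a hypothetical helper, not part of the add-on. It maps a container count to the specification name and includes the prometheus container's requests for each specification as a quick reference.

```python
# prometheus container requests per Server-mode specification: (CPU, memory).
PROMETHEUS_REQUESTS = {
    "Demo": ("500m", "2 GiB"),
    "Small": ("4", "16 GiB"),
    "Medium": ("8", "32 GiB"),
    "Large": ("8", "64 GiB"),
}


def select_server_spec(container_count: int) -> str:
    """Map the number of containers in the cluster to a specification name."""
    if container_count < 0:
        raise ValueError("container_count must be non-negative")
    if container_count <= 100:
        return "Demo"
    if container_count <= 2000:
        return "Small"
    if container_count <= 5000:
        return "Medium"
    return "Large"
```

For a cluster running 1,500 containers, `select_server_spec(1500)` returns `"Small"`, and `PROMETHEUS_REQUESTS["Small"]` shows that the prometheus container alone requests 4 CPU cores and 16 GiB of memory.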