kube-prometheus-stack

Introduction

kube-prometheus-stack provides easy-to-use, end-to-end Kubernetes cluster monitoring by using the Prometheus Operator and Prometheus. It also supports customized add-on specifications, interconnection with Grafana, high availability (HA), and node affinity.

The core components of kube-prometheus-stack are prometheusOperator, prometheus, alertmanager, thanosSidecar, thanosQuery, adapter, kubeStateMetrics, nodeExporter, grafana, and clusterProblemDetector.

prometheusOperator: deploys and manages Prometheus Server based on Custom Resource Definitions (CRDs), and watches and processes events related to these CRDs. It is the control center of the entire system.
prometheus (Server): a Prometheus Server cluster that the operator deploys based on the Prometheus CRD. It runs as a StatefulSet.
alertmanager: the alarm center of the add-on. It receives alarms sent by Prometheus, then deduplicates, groups, and distributes them.
thanosSidecar: in HA scenarios, runs in the same pod as Prometheus to persistently store Prometheus metric data.
thanosQuery: the entry point for PromQL queries when Prometheus runs in HA scenarios. It removes duplicate data for the same metrics returned from Store or Prometheus.
adapter (custom-metrics-apiserver): aggregates custom metrics into the native Kubernetes API Server.
kube-state-metrics: listens to the Kubernetes API Server and converts the state of Kubernetes objects into Prometheus metric data. By default, kube-state-metrics does not collect all labels and annotations of Kubernetes resources. To collect them, see How Do I Modify the Collection Configuration of the kube-state-metrics Component?
nodeExporter: deployed on each node to collect node monitoring data.
grafana: visualizes monitoring data. By default, grafana creates a 5 GiB storage volume, which is not deleted when the add-on is uninstalled.
clusterProblemDetector: monitors cluster exceptions.

Constraints

kube-prometheus-stack cannot be installed in UCS on-premises clusters.

Add-on Deployment Modes

The kube-prometheus-stack add-on can be deployed in Agent or Server mode.

In Agent mode, the add-on uses fewer cluster resources and provides Prometheus metric collection for the cluster. However, HPA and health diagnosis based on custom Prometheus statements are not supported.
In Server mode, the add-on supports HPA and health diagnosis based on custom Prometheus statements. This mode requires PVCs and consumes a large amount of memory.

Precaution

kube-prometheus-stack is a system monitoring add-on. When cluster resources are insufficient, Kubernetes gives priority to scheduling the pods of this add-on.

Permissions

nodeExporter monitors Docker disk space and reads Docker info data through the /var/run/docker.sock socket on the host.

nodeExporter requires the following privilege:

cap_dac_override: allows nodeExporter to read Docker info data.

Upgrading the Add-on

Select a fleet, or a cluster that has not been added to a fleet.

Figure 1 Selecting a fleet or a cluster not in the fleet
Choose Container Insights > Clusters to view the clusters with monitoring enabled. Locate the cluster for which the add-on is to be upgraded and click View Details in the Operation column to access its overview page.
The kube-prometheus-stack version is displayed in the upper right corner. If it is not the latest version, upgrade the add-on to use the latest features.

Resource Quota Requirements of Different Specifications

Before installing the kube-prometheus-stack add-on, ensure that the cluster has sufficient schedulable resources, such as CPU and memory. For details about the resource quota requirements of the default specification in Agent mode, see Table 1. For details about the resource quota requirements of different add-on specifications in Server mode, see Table 2.

**Table 1** Resource quota requirements of default specifications in Agent mode
Add-on Specification	Container	CPU Request	CPU Limit	Memory Request	Memory Limit
Default	prometheusOperator	Request: 100m	Limit: 500m	Request: 100 MiB	Limit: 500 MiB
	prometheus	Request: 500m	Limit: 4	Request: 1 GiB	Limit: 8 GiB
	kube-state-metrics	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 500 MiB
	nodeExporter	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 1 GiB
	grafana	Request: 100m	Limit: 500m	Request: 200 MiB	Limit: 2 GiB
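To check whether a cluster has enough schedulable capacity before installing in Agent mode, you can total the requests in Table 1. The following is a minimal sketch, not part of the add-on. Table 1 does not state replica counts, so the sketch assumes one pod per component, except nodeExporter, which runs on every node. The helper names (`cpu_millicores`, `memory_mib`, `total_requests`) are hypothetical and used only for illustration.

```python
"""Rough estimate of total resource requests for the Agent-mode default spec.

Assumption (not stated in Table 1): each component runs one pod,
except nodeExporter, which runs one pod per node.
"""

# (container, CPU request, memory request), copied from Table 1
AGENT_DEFAULT = [
    ("prometheusOperator", "100m", "100 MiB"),
    ("prometheus", "500m", "1 GiB"),
    ("kube-state-metrics", "200m", "200 MiB"),
    ("nodeExporter", "200m", "200 MiB"),
    ("grafana", "100m", "200 MiB"),
]

PER_NODE = {"nodeExporter"}  # assumed to run one pod on every node


def cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as '500m' or '4' (cores) to millicores."""
    value = value.strip()
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def memory_mib(value: str) -> int:
    """Convert the table's '200 MiB' or '1 GiB' notation to MiB."""
    number, unit = value.split()
    factor = {"MiB": 1, "GiB": 1024}[unit]
    return int(float(number) * factor)


def total_requests(rows, nodes: int):
    """Sum CPU (millicores) and memory (MiB) requests across all pods."""
    cpu = mem = 0
    for name, cpu_req, mem_req in rows:
        pods = nodes if name in PER_NODE else 1
        cpu += cpu_millicores(cpu_req) * pods
        mem += memory_mib(mem_req) * pods
    return cpu, mem


if __name__ == "__main__":
    cpu, mem = total_requests(AGENT_DEFAULT, nodes=3)
    # prints: CPU requests: 1500m, memory requests: 2124 MiB
    print(f"CPU requests: {cpu}m, memory requests: {mem} MiB")
```

Because nodeExporter scales with the node count, the total grows by 200m CPU and 200 MiB memory for each node added. Limits can be totaled the same way by copying the Limit columns instead.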

**Table 2** Resource quota requirements of different specifications in Server mode
Add-on Specification	Container	CPU Request	CPU Limit	Memory Request	Memory Limit
Demo (≤ 100 containers)	prometheusOperator	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 500 MiB
	prometheus	Request: 500m	Limit: 2	Request: 2 GiB	Limit: 8 GiB
	alertmanager	Request: 200m	Limit: 1	Request: 200 MiB	Limit: 1 GiB
	thanosSidecar	Request: 100m	Limit: 1	Request: 100 MiB	Limit: 2 GiB
	thanosQuery	Request: 500m	Limit: 2	Request: 500 MiB	Limit: 4 GiB
	adapter	Request: 400m	Limit: 2	Request: 400 MiB	Limit: 1 GiB
	kube-state-metrics	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 500 MiB
	nodeExporter	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 1 GiB
	grafana	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 2 GiB
	clusterProblemDetector	Request: 100m	Limit: 200m	Request: 200 MiB	Limit: 400 MiB
Small (≤ 2,000 containers)	prometheusOperator	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 500 MiB
	prometheus	Request: 4	Limit: 8	Request: 16 GiB	Limit: 32 GiB
	alertmanager	Request: 500m	Limit: 1	Request: 500 MiB	Limit: 1 GiB
	thanosSidecar	Request: 500m	Limit: 1	Request: 500 MiB	Limit: 2 GiB
	thanosQuery	Request: 2	Limit: 4	Request: 2 GiB	Limit: 16 GiB
	adapter	Request: 2	Limit: 4	Request: 4 GiB	Limit: 16 GiB
	kube-state-metrics	Request: 500m	Limit: 1	Request: 500 MiB	Limit: 1 GiB
	nodeExporter	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 1 GiB
	grafana	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 2 GiB
	clusterProblemDetector	Request: 200m	Limit: 500m	Request: 300 MiB	Limit: 1 GiB
Medium (≤ 5,000 containers)	prometheusOperator	Request: 500m	Limit: 1	Request: 500 MiB	Limit: 1 GiB
	prometheus	Request: 8	Limit: 16	Request: 32 GiB	Limit: 64 GiB
	alertmanager	Request: 500m	Limit: 1	Request: 500 MiB	Limit: 2 GiB
	thanosSidecar	Request: 1	Limit: 2	Request: 1 GiB	Limit: 4 GiB
	thanosQuery	Request: 2	Limit: 4	Request: 2 GiB	Limit: 16 GiB
	adapter	Request: 2	Limit: 4	Request: 16 GiB	Limit: 32 GiB
	kube-state-metrics	Request: 1	Limit: 2	Request: 1 GiB	Limit: 2 GiB
	nodeExporter	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 1 GiB
	grafana	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 2 GiB
	clusterProblemDetector	Request: 200m	Limit: 1	Request: 400 MiB	Limit: 2 GiB
Large (> 5,000 containers)	prometheusOperator	Request: 500m	Limit: 1	Request: 500 MiB	Limit: 2 GiB
	prometheus	Request: 8	Limit: 32	Request: 64 GiB	Limit: 128 GiB
	alertmanager	Request: 1	Limit: 2	Request: 1 GiB	Limit: 4 GiB
	thanosSidecar	Request: 2	Limit: 4	Request: 2 GiB	Limit: 8 GiB
	thanosQuery	Request: 2	Limit: 4	Request: 2 GiB	Limit: 32 GiB
	adapter	Request: 2	Limit: 4	Request: 32 GiB	Limit: 64 GiB
	kube-state-metrics	Request: 1	Limit: 3	Request: 1 GiB	Limit: 3 GiB
	nodeExporter	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 1 GiB
	grafana	Request: 200m	Limit: 500m	Request: 200 MiB	Limit: 2 GiB
	clusterProblemDetector	Request: 200m	Limit: 1	Request: 400 MiB	Limit: 2 GiB
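Table 2 selects a specification by the number of containers in the cluster. Because Kubernetes schedules each pod onto a single node, the largest pod must also fit within one node's free capacity (allocatable resources minus the requests of pods already on that node). The following minimal sketch maps a container count to a specification and checks the Prometheus pod. It assumes that, in HA scenarios, thanosSidecar runs in the same pod as prometheus (as described in the component list), so their requests are added together. Any other containers in that pod that the table does not list are not counted. The names `select_spec`, `prometheus_pod_requests`, and `fits_on_node` are hypothetical.

```python
"""Pick a Server-mode specification from Table 2 and check whether the
Prometheus pod can fit on one node. A sketch, not part of the add-on."""

from typing import NamedTuple


class Quota(NamedTuple):
    cpu_millicores: int  # CPU request in millicores
    memory_mib: int      # memory request in MiB


# prometheus container requests per specification (Table 2)
PROMETHEUS = {
    "Demo": Quota(500, 2 * 1024),
    "Small": Quota(4000, 16 * 1024),
    "Medium": Quota(8000, 32 * 1024),
    "Large": Quota(8000, 64 * 1024),
}

# thanosSidecar container requests per specification (Table 2)
THANOS_SIDECAR = {
    "Demo": Quota(100, 100),
    "Small": Quota(500, 500),
    "Medium": Quota(1000, 1024),
    "Large": Quota(2000, 2 * 1024),
}


def select_spec(container_count: int) -> str:
    """Map the cluster's container count to a Table 2 specification."""
    if container_count < 0:
        raise ValueError("container_count must be non-negative")
    if container_count <= 100:
        return "Demo"
    if container_count <= 2000:
        return "Small"
    if container_count <= 5000:
        return "Medium"
    return "Large"


def prometheus_pod_requests(spec: str, with_sidecar: bool = True) -> Quota:
    """Requests of the Prometheus pod; adds thanosSidecar if it shares the pod."""
    base = PROMETHEUS[spec]
    if not with_sidecar:
        return base
    side = THANOS_SIDECAR[spec]
    return Quota(base.cpu_millicores + side.cpu_millicores,
                 base.memory_mib + side.memory_mib)


def fits_on_node(req: Quota, free_cpu_millicores: int, free_memory_mib: int) -> bool:
    """A pod fits only if both requests fit in a single node's free capacity."""
    return (req.cpu_millicores <= free_cpu_millicores
            and req.memory_mib <= free_memory_mib)


if __name__ == "__main__":
    spec = select_spec(3000)
    req = prometheus_pod_requests(spec)
    print(spec, req, fits_on_node(req, 16000, 30 * 1024))
```

For example, 3,000 containers selects Medium, whose Prometheus pod requests 9 cores and 33 GiB. A node with 30 GiB of free memory therefore cannot host it, even if the cluster as a whole has enough memory.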