Updated on 2024-06-17 GMT+08:00

kube-prometheus-stack

Introduction

kube-prometheus-stack provides easy-to-use, end-to-end Kubernetes cluster monitoring by using the Prometheus Operator and Prometheus. It also supports custom add-on specifications, integration with Grafana, high availability, and node affinity.

The core components of kube-prometheus-stack include prometheusOperator, prometheus, alertmanager, thanosSidecar, thanosQuery, adapter, kube-state-metrics, nodeExporter, grafana, and clusterProblemDetector.

  • prometheusOperator: deploys and manages the Prometheus Server based on CustomResourceDefinitions (CRDs), and watches and handles events related to these CRDs. It is the control center of the entire system.
  • prometheus (Server): a Prometheus Server cluster that the operator deploys based on the Prometheus CRDs. It can be regarded as a StatefulSet.
  • alertmanager: the alarm center of the add-on. It receives alarms sent by Prometheus and manages them by deduplicating, grouping, and distributing them.
  • thanosSidecar: in HA scenarios, runs in the same pod as Prometheus to implement persistent storage of Prometheus metric data.
  • thanosQuery: the entry point for PromQL queries when Prometheus runs in HA scenarios. It can deduplicate data of the same metrics from Store or Prometheus.
  • adapter (custom-metrics-apiserver): aggregates custom metrics to the native Kubernetes API Server.
  • kube-state-metrics: watches the Kubernetes API and exposes the state of Kubernetes resources as Prometheus metrics. By default, kube-state-metrics does not collect all labels and annotations of Kubernetes resources. To collect these labels and annotations, see How Do I Modify the Collection Configuration of the kube-state-metrics Component?.
  • nodeExporter: deployed on each node to collect node monitoring data.
  • grafana: visualizes monitoring data. grafana creates a 5 GiB storage volume by default. Uninstalling the add-on will not delete this volume.
  • clusterProblemDetector: monitors cluster exceptions.
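
The deduplication that thanosQuery performs in HA scenarios can be illustrated with a minimal Python sketch. It is a simplification for illustration only, not the algorithm Thanos actually uses: it assumes each HA replica tags its series with a replica label (the name prometheus_replica here is hypothetical) and keeps one sample per series once that label is ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical label that distinguishes HA replicas; the real label name
# depends on how Prometheus and Thanos are configured in the cluster.
REPLICA_LABEL = "prometheus_replica"

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def deduplicate(samples: List[Sample], replica_label: str = REPLICA_LABEL) -> List[Sample]:
    """Keep one sample per series after ignoring the replica label."""
    seen = set()
    result = []
    for labels, value in samples:
        # Identity of the series once the replica label is dropped.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        if key in seen:
            continue  # The same series was already reported by another replica.
        seen.add(key)
        result.append((dict(key), value))
    return result


samples = [
    ({"__name__": "up", "job": "node", "prometheus_replica": "prometheus-0"}, 1.0),
    ({"__name__": "up", "job": "node", "prometheus_replica": "prometheus-1"}, 1.0),
    ({"__name__": "up", "job": "kubelet", "prometheus_replica": "prometheus-0"}, 1.0),
]
deduped = deduplicate(samples)
print(len(deduped))  # 2
```

The two "node" samples differ only in the replica label, so they collapse into one series, while the "kubelet" series is kept as is.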

Add-on Deployment Modes

The kube-prometheus-stack add-on can be deployed in Agent or Server mode.

  • Deployed in Agent mode, the add-on occupies fewer cluster resources and provides the Prometheus metric collection capability for the cluster. However, the HPA and health diagnosis functions based on custom Prometheus statements are not supported.
  • Deployed in Server mode, the add-on supports HPA and health diagnosis based on custom Prometheus statements. This mode depends on PersistentVolumeClaims (PVCs) and consumes a large amount of memory.
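
The trade-off between the two modes can be summarized as a small decision helper. This is only a sketch of the rule stated above; the function name and parameters are hypothetical and are not part of the add-on.

```python
def choose_mode(needs_custom_metric_hpa: bool, needs_health_diagnosis: bool) -> str:
    """Pick a deployment mode from the two features that only Server mode supports."""
    if needs_custom_metric_hpa or needs_health_diagnosis:
        # Server mode also needs a PVC and considerably more memory.
        return "Server"
    # Agent mode uses fewer cluster resources and only collects metrics.
    return "Agent"


print(choose_mode(needs_custom_metric_hpa=False, needs_health_diagnosis=False))  # Agent
print(choose_mode(needs_custom_metric_hpa=True, needs_health_diagnosis=False))   # Server
```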

Precaution

kube-prometheus-stack is a system monitoring add-on. When cluster resources are insufficient, Kubernetes gives scheduling priority to the pods where the add-on runs.

Permissions

nodeExporter monitors the disk space used by Docker and reads the Docker info data from /var/run/docker.sock on the host.

The following privilege is required by nodeExporter:

  • cap_dac_override: reads the info data of Docker.

Upgrading the Add-on

  1. Select a fleet or a cluster that is not added to the fleet.

    Figure 1 Selecting a fleet or a cluster not in the fleet

  2. Choose Container Insights > Clusters to view the clusters with monitoring enabled. Locate the cluster for which the add-on is to be upgraded and click View Details in the Operation column to access its overview page.
  3. The version of kube-prometheus-stack is displayed in the upper right corner. If it is not the latest version, upgrade the add-on to use the latest features.

Resource Quota Requirements of Different Specifications

Before installing the kube-prometheus-stack add-on, ensure that the cluster has sufficient schedulable resources such as CPUs and memory. For details about the resource quota requirements of default specifications in Agent mode, see Table 1. For details about the resource quota requirements of different add-on specifications in Server mode, see Table 2.

Table 1 Resource quota requirements of default specifications in Agent mode

| Add-on Specification | Container | CPU Quota | Memory Quota |
|---|---|---|---|
| Default | prometheusOperator | Request: 100m; Limit: 500m | Request: 100 MiB; Limit: 500 MiB |
| Default | prometheus | Request: 500m; Limit: 4 | Request: 1 GiB; Limit: 8 GiB |
| Default | kube-state-metrics | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 500 MiB |
| Default | nodeExporter | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 1 GiB |
| Default | grafana | Request: 100m; Limit: 500m | Request: 200 MiB; Limit: 2 GiB |
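
To check whether a cluster has enough schedulable resources for Agent mode, the requests in Table 1 can be added up. The sketch below is a rough estimate under stated assumptions: it counts one instance of each container, whereas nodeExporter runs on every node and the source does not give replica counts. The helper names are hypothetical.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity to millicores: '500m' -> 500, '4' -> 4000."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity to MiB: '200 MiB' -> 200, '1 GiB' -> 1024."""
    value, unit = quantity.split()
    return int(value) * {"MiB": 1, "GiB": 1024}[unit]


# Requests copied from Table 1 (Agent mode, default specification).
AGENT_DEFAULT_REQUESTS = {
    "prometheusOperator": ("100m", "100 MiB"),
    "prometheus": ("500m", "1 GiB"),
    "kube-state-metrics": ("200m", "200 MiB"),
    "nodeExporter": ("200m", "200 MiB"),  # per node in practice
    "grafana": ("100m", "200 MiB"),
}


def total_requests(requests: dict) -> tuple:
    """Sum CPU (millicores) and memory (MiB) over one instance of each container."""
    cpu = sum(parse_cpu(c) for c, _ in requests.values())
    memory = sum(parse_memory(m) for _, m in requests.values())
    return cpu, memory


cpu_m, mem_mib = total_requests(AGENT_DEFAULT_REQUESTS)
print(f"{cpu_m}m CPU, {mem_mib} MiB memory")  # 1100m CPU, 1724 MiB memory
```

For a multi-node cluster, the nodeExporter request would need to be multiplied by the number of nodes before comparing the total with the available capacity.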

Table 2 Resource quota requirements of different specifications in Server mode

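| Add-on Specification | Container | CPU Quota | Memory Quota |
|---|---|---|---|
| Demo (≤ 100 containers) | prometheusOperator | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 500 MiB |
| Demo (≤ 100 containers) | prometheus | Request: 500m; Limit: 2 | Request: 2 GiB; Limit: 8 GiB |
| Demo (≤ 100 containers) | alertmanager | Request: 200m; Limit: 1 | Request: 200 MiB; Limit: 1 GiB |
| Demo (≤ 100 containers) | thanosSidecar | Request: 100m; Limit: 1 | Request: 100 MiB; Limit: 2 GiB |
| Demo (≤ 100 containers) | thanosQuery | Request: 500m; Limit: 2 | Request: 500 MiB; Limit: 4 GiB |
| Demo (≤ 100 containers) | adapter | Request: 400m; Limit: 2 | Request: 400 MiB; Limit: 1 GiB |
| Demo (≤ 100 containers) | kube-state-metrics | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 500 MiB |
| Demo (≤ 100 containers) | nodeExporter | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 1 GiB |
| Demo (≤ 100 containers) | grafana | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 2 GiB |
| Demo (≤ 100 containers) | clusterProblemDetector | Request: 100m; Limit: 200m | Request: 200 MiB; Limit: 400 MiB |
| Small (≤ 2,000 containers) | prometheusOperator | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 500 MiB |
| Small (≤ 2,000 containers) | prometheus | Request: 4; Limit: 8 | Request: 16 GiB; Limit: 32 GiB |
| Small (≤ 2,000 containers) | alertmanager | Request: 500m; Limit: 1 | Request: 500 MiB; Limit: 1 GiB |
| Small (≤ 2,000 containers) | thanosSidecar | Request: 500m; Limit: 1 | Request: 500 MiB; Limit: 2 GiB |
| Small (≤ 2,000 containers) | thanosQuery | Request: 2; Limit: 4 | Request: 2 GiB; Limit: 16 GiB |
| Small (≤ 2,000 containers) | adapter | Request: 2; Limit: 4 | Request: 4 GiB; Limit: 16 GiB |
| Small (≤ 2,000 containers) | kube-state-metrics | Request: 500m; Limit: 1 | Request: 500 MiB; Limit: 1 GiB |
| Small (≤ 2,000 containers) | nodeExporter | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 1 GiB |
| Small (≤ 2,000 containers) | grafana | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 2 GiB |
| Small (≤ 2,000 containers) | clusterProblemDetector | Request: 200m; Limit: 500m | Request: 300 MiB; Limit: 1 GiB |
| Medium (≤ 5,000 containers) | prometheusOperator | Request: 500m; Limit: 1 | Request: 500 MiB; Limit: 1 GiB |
| Medium (≤ 5,000 containers) | prometheus | Request: 8; Limit: 16 | Request: 32 GiB; Limit: 64 GiB |
| Medium (≤ 5,000 containers) | alertmanager | Request: 500m; Limit: 1 | Request: 500 MiB; Limit: 2 GiB |
| Medium (≤ 5,000 containers) | thanosSidecar | Request: 1; Limit: 2 | Request: 1 GiB; Limit: 4 GiB |
| Medium (≤ 5,000 containers) | thanosQuery | Request: 2; Limit: 4 | Request: 2 GiB; Limit: 16 GiB |
| Medium (≤ 5,000 containers) | adapter | Request: 2; Limit: 4 | Request: 16 GiB; Limit: 32 GiB |
| Medium (≤ 5,000 containers) | kube-state-metrics | Request: 1; Limit: 2 | Request: 1 GiB; Limit: 2 GiB |
| Medium (≤ 5,000 containers) | nodeExporter | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 1 GiB |
| Medium (≤ 5,000 containers) | grafana | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 2 GiB |
| Medium (≤ 5,000 containers) | clusterProblemDetector | Request: 200m; Limit: 1 | Request: 400 MiB; Limit: 2 GiB |
| Large (> 5,000 containers) | prometheusOperator | Request: 500m; Limit: 1 | Request: 500 MiB; Limit: 2 GiB |
| Large (> 5,000 containers) | prometheus | Request: 8; Limit: 32 | Request: 64 GiB; Limit: 128 GiB |
| Large (> 5,000 containers) | alertmanager | Request: 1; Limit: 2 | Request: 1 GiB; Limit: 4 GiB |
| Large (> 5,000 containers) | thanosSidecar | Request: 2; Limit: 4 | Request: 2 GiB; Limit: 8 GiB |
| Large (> 5,000 containers) | thanosQuery | Request: 2; Limit: 4 | Request: 2 GiB; Limit: 32 GiB |
| Large (> 5,000 containers) | adapter | Request: 2; Limit: 4 | Request: 32 GiB; Limit: 64 GiB |
| Large (> 5,000 containers) | kube-state-metrics | Request: 1; Limit: 3 | Request: 1 GiB; Limit: 3 GiB |
| Large (> 5,000 containers) | nodeExporter | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 1 GiB |
| Large (> 5,000 containers) | grafana | Request: 200m; Limit: 500m | Request: 200 MiB; Limit: 2 GiB |
| Large (> 5,000 containers) | clusterProblemDetector | Request: 200m; Limit: 1 | Request: 400 MiB; Limit: 2 GiB |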
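
The container-count thresholds in Table 2 map directly to a Server-mode specification. The following Python sketch encodes those thresholds; the function name is hypothetical, and the result is only a starting point for sizing, not a guarantee that the chosen specification fits a particular workload.

```python
def pick_specification(container_count: int) -> str:
    """Return the Server-mode specification whose container range covers the count."""
    if container_count < 0:
        raise ValueError("container_count must not be negative")
    if container_count <= 100:
        return "Demo"
    if container_count <= 2000:
        return "Small"
    if container_count <= 5000:
        return "Medium"
    return "Large"


for count in (80, 1500, 5000, 12000):
    print(count, pick_specification(count))
# 80 Demo
# 1500 Small
# 5000 Medium
# 12000 Large
```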