
Monitoring Master Node Components Using Prometheus

This section describes how to use Prometheus to monitor the kube-apiserver, kube-controller, kube-scheduler, and etcd-server components on the master nodes.

Collecting the Metrics of Master Node Components Using Self-Built Prometheus

This section describes how to collect the metrics of master node components using self-built Prometheus.

  • The cluster version must be 1.19 or later.
  • Install self-built Prometheus using Helm by referring to Prometheus, and use prometheus-operator to manage the installed Prometheus by referring to Prometheus Operator. A minimal installation sketch is provided after this list.

    Because the Prometheus add-on (Prometheus) has reached end of maintenance and does not support this function, you are advised not to use this add-on.
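
    For reference, the following is a minimal installation sketch using the community kube-prometheus-stack Helm chart, which bundles Prometheus and prometheus-operator. The repository, release name, and namespace shown here are examples; adjust them to your environment.

    # Add the community Helm chart repository and refresh the index.
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    # Install Prometheus together with prometheus-operator into the monitoring namespace (example release name and namespace).
    helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace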

  1. Use kubectl to connect to the cluster.
  2. Modify the ClusterRole of Prometheus.

    kubectl edit ClusterRole prometheus -n {namespace}
    Add the following content under the rules field:
    rules:
    ...
    - apiGroups:
      - proxy.exporter.k8s.io
      resources:
      - "*"
      verbs: ["get", "list", "watch"]
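
    To confirm that the rule was added, you can check the ClusterRole again (assuming it is named prometheus, as above):
    kubectl get clusterrole prometheus -o yaml | grep -B 2 -A 4 proxy.exporter.k8s.io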

  3. Create a file named kube-apiserver.yaml and edit it.

    vi kube-apiserver.yaml
    Example file content:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app.kubernetes.io/name: apiserver
      name: kube-apiserver
      namespace: monitoring    # Change it to the namespace where Prometheus will be installed.
    spec:
      endpoints:
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        interval: 30s
        metricRelabelings:
        - action: keep
          regex: (aggregator_unavailable_apiservice|apiserver_admission_controller_admission_duration_seconds_bucket|apiserver_admission_webhook_admission_duration_seconds_bucket|apiserver_admission_webhook_admission_duration_seconds_count|apiserver_client_certificate_expiration_seconds_bucket|apiserver_client_certificate_expiration_seconds_count|apiserver_current_inflight_requests|apiserver_request_duration_seconds_bucket|apiserver_request_total|go_goroutines|kubernetes_build_info|process_cpu_seconds_total|process_resident_memory_bytes|rest_client_requests_total|workqueue_adds_total|workqueue_depth|workqueue_queue_duration_seconds_bucket|aggregator_unavailable_apiservice_total|rest_client_request_duration_seconds_bucket)
          sourceLabels:
          - __name__
        - action: drop
          regex: apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)
          sourceLabels:
          - __name__
          - le
        port: https
        scheme: https
        tlsConfig:
          caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          serverName: kubernetes
      jobLabel: component
      namespaceSelector:
        matchNames:
        - default
      selector:
        matchLabels:
          component: apiserver
          provider: kubernetes

    Create a ServiceMonitor:

    kubectl apply -f kube-apiserver.yaml

  4. Create a file named kube-controller.yaml and edit it.

    vi kube-controller.yaml
    Example file content:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app.kubernetes.io/name: kube-controller
      name: kube-controller-manager
      namespace: monitoring    # Change it to the namespace where Prometheus will be installed.
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          interval: 15s
          honorLabels: true
          port: https
          relabelings:
            - regex: (.+)
              replacement: /apis/proxy.exporter.k8s.io/v1beta1/kube-controller-proxy/${1}/metrics
              sourceLabels:
                - __address__
              targetLabel: __metrics_path__
            - regex: (.+)
              replacement: ${1}
              sourceLabels:
                - __address__
              targetLabel: instance
            - replacement: kubernetes.default.svc.cluster.local:443
              targetLabel: __address__
          scheme: https
          tlsConfig:
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      jobLabel: app
      namespaceSelector:
        matchNames:
          - kube-system
      selector:
        matchLabels:
          app: kube-controller-proxy
          version: v1

    Create a ServiceMonitor:

    kubectl apply -f kube-controller.yaml

  5. Create a file named kube-scheduler.yaml and edit it.

    vi kube-scheduler.yaml
    Example file content:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app.kubernetes.io/name: kube-scheduler
      name: kube-scheduler
      namespace: monitoring    # Change it to the namespace where Prometheus will be installed.
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          interval: 15s
          honorLabels: true
          port: https
          relabelings:
            - regex: (.+)
              replacement: /apis/proxy.exporter.k8s.io/v1beta1/kube-scheduler-proxy/${1}/metrics
              sourceLabels:
                - __address__
              targetLabel: __metrics_path__
            - regex: (.+)
              replacement: ${1}
              sourceLabels:
                - __address__
              targetLabel: instance
            - replacement: kubernetes.default.svc.cluster.local:443
              targetLabel: __address__
          scheme: https
          tlsConfig:
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      jobLabel: app
      namespaceSelector:
        matchNames:
          - kube-system
      selector:
        matchLabels:
          app: kube-scheduler-proxy
          version: v1

    Create a ServiceMonitor:

    kubectl apply -f kube-scheduler.yaml

  6. Create a file named etcd-server.yaml and edit it.

    vi etcd-server.yaml
    Example file content:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app.kubernetes.io/name: etcd-server
      name: etcd-server
      namespace: monitoring    # Change it to the namespace where Prometheus will be installed.
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          interval: 15s
          honorLabels: true
          port: https
          relabelings:
            - regex: (.+)
              replacement: /apis/proxy.exporter.k8s.io/v1beta1/etcd-server-proxy/${1}/metrics
              sourceLabels:
                - __address__
              targetLabel: __metrics_path__
            - regex: (.+)
              replacement: ${1}
              sourceLabels:
                - __address__
              targetLabel: instance
            - replacement: kubernetes.default.svc.cluster.local:443
              targetLabel: __address__
          scheme: https
          tlsConfig:
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      jobLabel: app
      namespaceSelector:
        matchNames:
          - kube-system
      selector:
        matchLabels:
          app: etcd-server-proxy
          version: v1

    Create a ServiceMonitor:

    kubectl apply -f etcd-server.yaml

  7. Access Prometheus and choose Status > Targets.

    The preceding master node components are displayed.
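
    You can also confirm from the Prometheus expression browser that metrics are being scraped, for example by counting the healthy targets per job. The job names depend on the jobLabel configured in each ServiceMonitor, so the output varies by environment:

    count by (job) (up == 1)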

kube-apiserver Metrics

Each entry in the following table lists the metric category, metric name, type, description, and example PromQL statements.

Request metrics

apiserver_request_total

Counter

The total number of API requests received by kube-apiserver, broken down by labels.

Example labels:

  • verb: HTTP request method, such as GET, POST, PUT, DELETE, PATCH, LIST, or WATCH.
  • group: Kubernetes API group, such as apps/v1 or networking.k8s.io/v1.
  • version: API version, such as v1 or v1beta1.
  • resource: Kubernetes resource type, such as pods, Deployments, Services, or nodes.
  • subresource: sub-resource of a resource (some operations are only available for sub-attributes of the resource), such as logs (viewing logs), exec (executing commands), or status (updating status).
  • scope: application scope of a request, such as cluster (cluster-level), namespace (namespace-level), or resource (single resource).
  • component: source component of a request, such as kube-controller-manager or kube-scheduler.
  • client: the client that initiates a request. It may be an internal component or an external service.
  • code: HTTP response status code, such as 200 (OK), 404 (Not Found), 500 (Internal Server Error), or 429 (Too Many Requests).
  • To query the total request rate (measured by QPS):
    sum(rate(apiserver_request_total[5m]))
  • To query the request errors (5xx returned):
    sum(rate(apiserver_request_total{code=~"5.."}[5m])) by (resource, verb)
  • To query the requests subject to rate limiting (status code 429):
    sum(rate(apiserver_request_total{code="429"}[5m])) by (resource)
  • To query the clients that frequently send requests:
    topk(5, sum(rate(apiserver_request_total[5m])) by (client))

apiserver_request_duration_seconds_bucket

Histogram

Response latency distribution, broken down by labels (such as request type, resource, and status code). This metric can be used to analyze P50, P90, and P99 latencies, identify slow requests and high-latency resources, and monitor kube-apiserver performance.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds). For example, le="0.005" indicates the number of requests that took less than or equal to 5 ms.
  • verb: HTTP request method, such as GET, POST, PUT, DELETE, PATCH, LIST, or WATCH.
  • group: Kubernetes API group, such as apps/v1 or networking.k8s.io/v1.
  • version: API version, such as v1 or v1beta1.
  • resource: Kubernetes resource type, such as pods, Deployments, Services, or nodes.
  • subresource: sub-resource of a resource (some operations are only available for sub-attributes of the resource), such as logs (viewing logs), exec (executing commands), or status (updating status).
  • scope: application scope of a request, such as cluster (cluster-level), namespace (namespace-level), or resource (single resource).
  • component: source component of a request, such as kube-controller-manager or kube-scheduler.
  • client: the client that initiates a request. It may be an internal component or an external service.
  • To query the P99 latency (99% of the requests completed within given latency):
    histogram_quantile(0.99,sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le, resource, verb))

    Example output:

    {resource="pods", verb="GET"}    0.8  # 99% of GET requests for pods were completed within 0.8s or less.
    {resource="deployments", verb="POST"} 1.2 # 99% of POST requests for Deployments were completed within 1.2s or less.
  • To query the P90 latency by resource type:
    histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le, resource))

    Example output:

    {resource="pods"}      0.5  # 90% of pod requests were completed within 0.5s or less.
    {resource="services"}  0.3  # 90% of Service requests were completed within 0.3s or less.
  • To monitor high-latency requests (> 1s):
    sum(rate(apiserver_request_duration_seconds_bucket{le="+Inf"}[5m])) by (resource, verb)- sum(rate(apiserver_request_duration_seconds_bucket{le="1.0"}[5m])) by (resource, verb)

    Example output:

    {resource="pods", verb="LIST"}  50  # There were 50 LIST requests for pods that took more than 1s in 5 minutes.

apiserver_current_inflight_requests

Gauge

The number of API requests that are being executed. This metric is typically broken down by the request_kind label, which has the following values:

  • readOnly: read requests, which do not change the cluster status. Read requests are usually used to read resources, for example, obtaining the pod list or querying the node status.
  • mutating: write requests, which change the cluster status. Write requests are usually used to create, update, or delete resources, for example, creating a pod or updating a Service.
  • To view the number of write requests that are being executed:
    apiserver_current_inflight_requests{request_kind="mutating"}
  • To view the number of read requests that are being executed:
    apiserver_current_inflight_requests{request_kind="readOnly"}
  • To view the total number of requests that are being executed:
    sum(apiserver_current_inflight_requests)

etcd_request_duration_seconds_bucket

Histogram

etcd request latency.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds). For example, le="0.005" indicates the number of requests that took less than or equal to 5 ms.
  • type: type of the operation object.
  • operation: operation type.
  • To query P99 latency:
    histogram_quantile(0.99,sum(rate(etcd_request_duration_seconds_bucket[5m])) by (le, type))
  • To compare read and write latencies:
    histogram_quantile(0.95,rate(etcd_request_duration_seconds_bucket{type=~"range|put"}[5m]))
  • To detect slow requests (> 1s):
    sum(rate(etcd_request_duration_seconds_bucket{le="+Inf"}[5m]))  - sum(rate(etcd_request_duration_seconds_bucket{le="1"}[5m]))

Error and rate limiting metrics

apiserver_flowcontrol_current_executing_requests

Gauge

One of the core metrics in API Priority and Fairness (APF). It reflects the number of requests being executed by the API server in real time. For details, see API Priority and Fairness.

Example labels:

  • priority_level: request priority level. Options:
    • exempt: used for requests that are not subject to flow control (such as requests for key system operations). Requests of this priority level do not occupy the concurrency quota.
    • system: used for requests from Kubernetes control plane components (such as kube-controller-manager and kube-scheduler).
    • node-high: used for health status updates from nodes.
    • leader-election: used for leader election requests (such as requests from kube-controller-manager).
    • workload-high/low: used for workload requests with high and low priorities, respectively.
    • global-default: used for default requests that do not match any FlowSchemas.
    • catch-all: (default) used for all requests that are not explicitly classified. Requests of this priority have a very low concurrency quota.
  • flow_schema: the FlowSchema that matches requests. A FlowSchema can identify the request source (such as kube-controller-manager or kube-scheduler).
  • To view the number of requests executed by priority level:

    apiserver_flowcontrol_current_executing_requests
  • To query resource usages by priority level:
    sum(apiserver_flowcontrol_current_executing_requests{priority_level=~"system|leader-election"}) / sum(apiserver_flowcontrol_nominal_limit_seats{priority_level=~"system|leader-election"})

apiserver_flowcontrol_current_inqueue_requests

Gauge

The number of requests waiting to be executed in the flow control queue. These requests have been received but are not executed because the number of concurrent requests has reached the configured limit. For details, see API Priority and Fairness.

Example labels:

  • priority_level: request priority level (such as system or workload-high).
  • flow_schema: the FlowSchema that matches requests. A FlowSchema can identify the request source (such as kube-controller-manager or kube-scheduler).
  • To identify queued requests at the system priority level:

    apiserver_flowcontrol_current_inqueue_requests{priority_level="system"}
  • To identify requests that have accumulated in the queues over the past 5 minutes:
    delta(apiserver_flowcontrol_current_inqueue_requests[5m])

apiserver_flowcontrol_nominal_limit_seats

Gauge

The nominal request concurrency limit per priority level. For details, see API Priority and Fairness.

This metric is classified by the priority_level label, which indicates the request priority level (such as system or workload-high).

  • To view the nominal request concurrency limits for all priority levels:
    apiserver_flowcontrol_nominal_limit_seats
  • To calculate resource usages based on executing requests:
    sum(apiserver_flowcontrol_current_executing_requests) by (priority_level) / apiserver_flowcontrol_nominal_limit_seats

apiserver_flowcontrol_current_limit_seats

Gauge

The request concurrency limit per priority level. This metric allows you to learn the load of the API server and determine whether to adjust the traffic control policy in high-load scenarios. For details, see API Priority and Fairness.

Unlike nominal_limit_seats, the value of this metric may be affected by the global traffic control policy.

This metric is classified by the priority_level label, which indicates the request priority level (such as system or workload-high).

To view the request concurrency limit at a priority level:
apiserver_flowcontrol_current_limit_seats{priority_level="system"}

apiserver_flowcontrol_current_executing_seats

Gauge

The number of seats corresponding to the requests currently being executed in a priority queue. This metric reflects the concurrent resources being consumed in the queue and helps you understand the actual load of the queue. For details, see API Priority and Fairness. This metric is classified by the priority_level label, which indicates the request priority level (such as system or workload-high).

If the value of current_executing_seats is close to that of current_limit_seats, the concurrent resources of the queue may be about to be used up. You can increase the values of max-mutating-requests-inflight and max-requests-inflight to optimize the configuration. For details, see Modifying Cluster Configurations.

  • To view the number of concurrent seats that have been occupied for a specific priority level:
    apiserver_flowcontrol_current_executing_seats{priority_level="system"}
  • To calculate the seat usage (current usage/current limit):
    sum(apiserver_flowcontrol_current_executing_seats) by (priority_level) / sum(apiserver_flowcontrol_current_limit_seats) by (priority_level)

apiserver_flowcontrol_current_inqueue_seats

Gauge

The concurrent resources consumed by the requests waiting in queues for all priority levels. For details, see API Priority and Fairness.

This metric is classified by the priority_level label, which indicates the request priority level (such as system or workload-high).

  • To view the seats occupied by requests waiting in the queue for a specific priority level:
    apiserver_flowcontrol_current_inqueue_seats{priority_level="system"}
  • To calculate the percentage of queued requests (the number of queued seats/total concurrency quota):
    sum(apiserver_flowcontrol_current_inqueue_seats) by (priority_level)/sum(apiserver_flowcontrol_nominal_limit_seats) by (priority_level)

apiserver_flowcontrol_request_execution_seconds_bucket

Histogram

The execution time of API requests. For details, see API Priority and Fairness.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds).
  • priority_level: request priority level (such as system or workload-high).
  • flow_schema: the FlowSchema that matches requests. A FlowSchema can identify the request source (such as kube-controller-manager or kube-scheduler).
  • To calculate the execution time for 99% of requests:
    histogram_quantile(0.99,  sum(rate(apiserver_flowcontrol_request_execution_seconds_bucket[5m])) by (le, priority_level))
  • To detect slow requests (> 1s):
    sum(rate(apiserver_flowcontrol_request_execution_seconds_bucket{le="+Inf"}[5m]))- sum(rate(apiserver_flowcontrol_request_execution_seconds_bucket{le="1"}[5m]))

apiserver_flowcontrol_request_wait_duration_seconds_bucket

Histogram

The waiting time of API requests in a queue. For details, see API Priority and Fairness.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds). For example, le="0.005" indicates the number of requests that took less than or equal to 5 ms.
  • priority_level: request priority level (such as system or workload-high).
  • flow_schema: the FlowSchema that matches requests. A FlowSchema can identify the request source (such as kube-controller-manager or kube-scheduler).
  • To calculate the waiting time for 95% of requests:
    histogram_quantile(0.95,  sum(rate(apiserver_flowcontrol_request_wait_duration_seconds_bucket[5m])) by (le, priority_level))
  • To detect long-lasting requests (> 5s):
    sum(rate(apiserver_flowcontrol_request_wait_duration_seconds_bucket{le="+Inf"}[5m]))- sum(rate(apiserver_flowcontrol_request_wait_duration_seconds_bucket{le="5"}[5m]))

apiserver_flowcontrol_dispatched_requests_total

Counter

The total number of API requests that have been scheduled (started to be executed). For details, see API Priority and Fairness.

  • To calculate the request rate at each priority level:
    sum(rate(apiserver_flowcontrol_dispatched_requests_total[5m])) by (priority_level)
  • To compare the number of requests in different FlowSchemas:
    sum(rate(apiserver_flowcontrol_dispatched_requests_total[5m])) by (flow_schema)

apiserver_flowcontrol_rejected_requests_total

Counter

The total number of rejected API requests. Requests are often rejected due to traffic control or insufficient resources. For details, see API Priority and Fairness.

Example labels:

  • priority_level: request priority level.
  • flow_schema: the FlowSchema that matches requests. A FlowSchema can identify the request source (such as kube-controller-manager or kube-scheduler).
  • reason: the reason why a request is rejected. Options:
    • queue-full: Too many requests were already queued.
    • concurrency-limit: The number of requests exceeded the concurrency limit, and the excess requests were immediately rejected with HTTP 429 (Too Many Requests).
    • time-out: The request was still in the queue when its queuing time expired.
    • cancelled: The request was canceled while waiting and has been ejected from the queue.
  • To calculate the request rejection rate:
    sum(rate(apiserver_flowcontrol_rejected_requests_total[5m])) by (priority_level, reason)
  • To calculate the rejection rate ratio:
    sum(rate(apiserver_flowcontrol_rejected_requests_total[5m])) by (priority_level)/sum(rate(apiserver_flowcontrol_dispatched_requests_total[5m])) by (priority_level)

apiserver_flowcontrol_request_concurrency_limit

Gauge

The maximum number of concurrent requests for a priority queue.

This metric is deprecated in Kubernetes 1.30 and removed from Kubernetes 1.31. You are advised to use apiserver_flowcontrol_nominal_limit_seats in clusters of v1.31 or later.

To view the current global concurrency limit:

apiserver_flowcontrol_request_concurrency_limit

Authentication and authorization metrics

apiserver_admission_controller_admission_duration_seconds_bucket

Histogram

The time that the admission controller takes to process API requests.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds). For example, le="0.005" indicates the number of requests that took less than or equal to 5 ms.
  • name: name of the admission controller that processes requests, such as MutatingAdmissionWebhook or ValidatingAdmissionWebhook.
  • operation: operation, such as CREATE, UPDATE, or DELETE.
  • type: operation type.
    • validate: checks whether a request is valid.
    • admit: determines whether a valid request is allowed.
  • rejected: whether the request was rejected. The value can be true or false.
  • To sort the results by controller name:
    sort_desc(histogram_quantile(0.99,rate(apiserver_admission_controller_admission_duration_seconds_bucket[5m])))
  • To calculate the processing time for 99% of requests:
    histogram_quantile(0.99,sum(rate(apiserver_admission_controller_admission_duration_seconds_bucket[5m])) by (le, name))

apiserver_admission_webhook_admission_duration_seconds_bucket

Histogram

The time that the admission webhook takes to process requests.

To calculate the processing time for 99% of requests:

histogram_quantile(0.99,sum(rate(apiserver_admission_webhook_admission_duration_seconds_bucket[5m])) by (le, name))

Service availability metrics

up

Gauge

Service availability. Options:

  • 1: A service is available.
  • 0: A service is unavailable.

To check the availability of the current service:

up
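
The request and error-rate queries in this table can also be turned into alerting rules. The following is a minimal sketch of a PrometheusRule managed by prometheus-operator that fires when more than 5% of kube-apiserver requests return 5xx status codes. The resource name, namespace, threshold, and severity label are assumptions; align them with your environment and with how your Prometheus instance selects rules.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-apiserver-alerts    # Example name.
  namespace: monitoring          # Change it to the namespace where Prometheus is installed.
spec:
  groups:
  - name: kube-apiserver.rules
    rules:
    - alert: KubeApiserverHighErrorRate
      expr: |
        sum(rate(apiserver_request_total{code=~"5.."}[5m]))
          / sum(rate(apiserver_request_total[5m])) > 0.05
      for: 10m
      labels:
        severity: warning    # Example severity label.
      annotations:
        description: More than 5% of kube-apiserver requests returned 5xx status codes over the last 5 minutes.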

kube-controller Metrics

Each entry in the following table lists the metric name, type, description, and example PromQL statements.

workqueue_adds_total

Counter

The number of adds processed by the workqueue.

  • To calculate the task addition rate of each queue:
    rate(workqueue_adds_total[5m])
  • To detect the abnormal high addition rate (> 1,000/minute):
    rate(workqueue_adds_total[1m]) > 1000/60
  • To sort the results by controller:
    topk(3, sum by(name) (rate(workqueue_adds_total[5m])))

workqueue_depth

Gauge

How big the workqueue is. If the queue depth remains high for a long time, the controller cannot process tasks in the queue in a timely manner, causing task stacking.

To view the depths of all queues:
workqueue_depth

workqueue_queue_duration_seconds_bucket

Histogram

How long in seconds a task stays in the workqueue before being executed.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds). For example, le="0.005" indicates the number of requests that took less than or equal to 5 ms.
  • name: task name.
  • To calculate the queuing time for 99% of requests:
    histogram_quantile(0.99,  rate(workqueue_queue_duration_seconds_bucket[5m]))
  • To detect long-lasting tasks (> 10s):
    sum(rate(workqueue_queue_duration_seconds_bucket{le="+Inf"}[5m])) - sum(rate(workqueue_queue_duration_seconds_bucket{le="10"}[5m]))

rest_client_requests_total

Counter

The number of HTTP requests initiated by the Kubernetes client.

Example labels:

  • code: HTTP response status code, such as 200 (OK), 404 (Not Found), 500 (Internal Server Error), or 429 (Too Many Requests).
  • host: host address.
  • method: HTTP request method, such as GET, POST, PUT, DELETE, PATCH, LIST, or WATCH.
  • To calculate the request rate (by status code):
    sum by(code) (rate(rest_client_requests_total[5m]))
  • To detect the 5xx error rate:
    rate(rest_client_requests_total{code=~"5.."}[5m]) / rate(rest_client_requests_total[5m])
  • To collect request statistics by target service:
    sum by(host) (rate(rest_client_requests_total[5m]))

rest_client_request_duration_seconds_bucket

Histogram

The latency of HTTP requests from the client.

Example labels:

  • le: core label of the histogram. It indicates the number of requests that took less than or equal to an interval (measured in seconds). For example, le="0.005" indicates the number of requests that took less than or equal to 5 ms.
  • host: host address.
  • verb: HTTP request method, such as GET, POST, PUT, DELETE, PATCH, LIST, or WATCH.
  • To calculate the P95 latency:
    histogram_quantile(0.95,  rate(rest_client_request_duration_seconds_bucket[5m]))
  • To detect slow requests (> 2s):
    sum(rate(rest_client_request_duration_seconds_bucket{le="+Inf"}[5m]))- sum(rate(rest_client_request_duration_seconds_bucket{le="2"}[5m]))
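
These queries are also commonly used as alert conditions. A minimal sketch of two expressions follows; the thresholds are assumptions to tune for your cluster, and the expressions can be wired into alerting rules in the same way as for kube-apiserver.

# Fires when a controller workqueue keeps a large backlog (threshold is an assumption).
workqueue_depth > 100

# Fires when the controller's client requests are being rate limited by kube-apiserver.
rate(rest_client_requests_total{code="429"}[5m]) > 0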

kube-scheduler Metrics

Each entry in the following table lists the metric name, type, description, and example PromQL statements.

scheduler_scheduler_cache_size

Gauge

The number of nodes, pods, and assumed pods (pods to be scheduled) in the scheduler cache.

To view the number of cached pods:

scheduler_scheduler_cache_size{type="pod"}

scheduler_pending_pods

Gauge

The number of pending pods. This metric can be used to identify scheduling bottlenecks.

This metric is usually classified by the following queue labels:

  • active: the number of pods that are ready and waiting to be scheduled.
  • backoff: the number of pods that failed to be scheduled and are waiting in the backoff queue before being retried.
  • gated: the number of pods that are declared unschedulable or have scheduling gates and are not yet ready for scheduling.
  • unschedulable: the number of pods that were determined to be unschedulable.

To view the number of pods in a specific queue:

scheduler_pending_pods{queue="backoff"}

scheduler_pod_scheduling_attempts_bucket

Histogram

The number of attempts to schedule a pod.

Generally, this metric is labeled by le. The value can be 1, 2, 4, 8, 16, or +Inf.

To detect high-frequency retries (more than eight attempts):
sum(rate(scheduler_pod_scheduling_attempts_bucket{le="+Inf"}[5m]))- sum(rate(scheduler_pod_scheduling_attempts_bucket{le="8"}[5m]))
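
A pending-pod backlog is a common alert condition for the scheduler. A minimal sketch follows; the queue label value comes from the scheduler_pending_pods description above, and the expression should be paired with a suitable for duration in an alerting rule.

# Fires when pods stay in the unschedulable queue.
scheduler_pending_pods{queue="unschedulable"} > 0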

etcd-server Metrics

Each entry in the following table lists the metric category, metric name, type, description, and example PromQL statements.

etcd leader status metrics

etcd_server_has_leader

Gauge

etcd elects one member of a cluster as the leader and the other members as followers. The leader periodically sends heartbeats to all members to keep the cluster stable.

This metric indicates whether there is a leader among the etcd servers. Options:

  • 1: There is a leader among the etcd servers.
  • 0: There is no leader among the etcd servers.

To view the leader status:

etcd_server_has_leader

etcd_server_is_leader

Gauge

Whether an etcd member is the leader. Options:

  • 1: The etcd member is the leader.
  • 0: The etcd member is not the leader.

To check whether an etcd member is the leader:

etcd_server_is_leader

etcd_server_leader_changes_seen_total

Counter

The number of leader changes within a specific period of time.

To monitor the leader change frequency within 1 hour:

rate(etcd_server_leader_changes_seen_total[1h])

etcd storage metrics

etcd_mvcc_db_total_size_in_bytes

Gauge

The total size of the etcd backend database, in bytes.

To calculate the storage space usage:

etcd_mvcc_db_total_size_in_use_in_bytes  / etcd_mvcc_db_total_size_in_bytes

etcd_mvcc_db_total_size_in_use_in_bytes

Gauge

The size of the etcd backend database that is actually in use, in bytes.
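
To view the in-use size of the backend database for each member, query the metric directly:

etcd_mvcc_db_total_size_in_use_in_bytes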

etcd_debugging_mvcc_keys_total

Gauge

The total number of keys in the etcd.

To monitor the increase of keys:

rate(etcd_debugging_mvcc_keys_total[5m])

etcd write performance metrics

etcd_disk_backend_commit_duration_seconds_bucket

Histogram

The time that etcd takes to commit a data change to its backend storage, that is, to persist the change to disk.

To calculate the P99 latency for writes:

histogram_quantile(0.99,rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))

etcd_server_proposals_committed_total

Gauge

The total number of consensus proposals committed by etcd.

To calculate the write failure rate:

rate(etcd_server_proposals_failed_total[5m]) / rate(etcd_server_proposals_committed_total[5m])

etcd_server_proposals_applied_total

Gauge

The number of applied or executed proposals.

To calculate the write success rate:

rate(etcd_server_proposals_applied_total[5m]) / rate(etcd_server_proposals_committed_total[5m])

etcd_server_proposals_pending

Gauge

The number of pending proposals.

To detect the stacked writes:

etcd_server_proposals_pending

etcd_server_proposals_failed_total

Counter

The number of failed proposals.

To calculate the write failure rate:

rate(etcd_server_proposals_failed_total[5m]) / rate(etcd_server_proposals_committed_total[5m])
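
Several of the queries above are also useful as alert conditions. A minimal sketch of two expressions follows; the leader-change threshold is an assumption, and the expressions can be wired into alerting rules in the same way as for kube-apiserver.

# Fires when an etcd member reports that the cluster has no leader.
etcd_server_has_leader == 0

# Fires when the leader changes frequently within an hour (threshold is an assumption).
increase(etcd_server_leader_changes_seen_total[1h]) > 3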

Helpful Link

For more information about Kubernetes system component metrics, see Kubernetes Metrics Reference.