Creating a VPA Policy

Kubernetes Vertical Pod Autoscaler (VPA) scales pods vertically. It does this by analyzing the historical usage of container resources and automatically adjusting the CPU and memory resources requested by pods. VPA can adjust container resource requests within a specific range to match the service load. It increases the resource requests when the service load increases and reduces them when the service load decreases to conserve computing resources. Additionally, VPA can provide recommendations on CPU and memory requests to optimize container resource utilization while ensuring that the containers have enough resources to function properly.

Overview

VPA collects and analyzes resource metrics for each container, adjusts the requested resources based on actual usage, and maintains the ratio of resource limit to request before and after the adjustment. VPA can increase or decrease CPU and memory resources as needed.

The rules are as follows:

VPA generates the CPU and memory resource recommendations using the data collected by the Metrics API.
In theory, VPA recommends at least 250 MiB of memory for each pod, which means 250 MiB divided by the number of containers in the pod for each container. Similarly, it recommends at least 25m vCPUs for each pod, which means 25m divided by the number of containers in the pod for each container.
When creating a VPA, you can set the minimum and maximum amounts of resources that it may recommend for containers by configuring the containerPolicies field.
If a container has both a resource request and a resource limit configured, VPA adjusts the container's request to match its recommendation and derives a new limit from the request-to-limit ratio that was set when the container was created.
Assume that the requested vCPUs of a container are 100m and the limit is 200m (with a ratio of 1:2). If VPA recommends a requested vCPU of 80m, the container's vCPU limit will be 160m.
VPA does not adjust its recommendations to fit other resource constraints. If a recommendation conflicts with such a constraint, it is applied as is, so the resource configuration suggested by VPA may exceed the constraint.
Assume that the total memory requested in a namespace cannot exceed 2 GiB. If VPA recommends a high memory configuration for a pod in that namespace, the total memory requested in the namespace may exceed 2 GiB after the pod's resource configuration is updated. In that case, the pod will not be scheduled.
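
The arithmetic behind the minimum recommendations and the limit adjustment can be sketched in a few lines of Python. This is only an illustration of the rules above, not the VPA's actual implementation. The function names are hypothetical, and the even split of the pod-level minimum across containers follows the description in this section rather than the VPA source code.

```python
from fractions import Fraction

# Pod-level minimums as stated in the rules above.
POD_MIN_CPU_MILLI = 25                  # 25m per pod
POD_MIN_MEMORY_BYTES = 250 * 1024 ** 2  # 250 MiB per pod


def per_container_minimum(num_containers):
    """Split the pod-level minimums evenly across the containers of a pod.

    Returns (cpu in millicores, memory in bytes) as exact fractions.
    """
    if num_containers < 1:
        raise ValueError("a pod has at least one container")
    return (Fraction(POD_MIN_CPU_MILLI, num_containers),
            Fraction(POD_MIN_MEMORY_BYTES, num_containers))


def scaled_limit(original_request, original_limit, recommended_request):
    """Keep the original limit-to-request ratio when the request changes."""
    ratio = Fraction(original_limit, original_request)
    return recommended_request * ratio


# Request 100m, limit 200m, recommended request 80m -> limit 160m.
print(scaled_limit(100, 200, 80))

# A pod with two containers: 12.5m CPU and 125 MiB of memory per container.
cpu, memory = per_container_minimum(2)
print(float(cpu), int(memory))
```

With a single container, the memory minimum is 250 MiB, that is, 262,144,000 bytes, which Kubernetes can display as 262144k.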

Prerequisites

The cluster version must be 1.25 or later.
An add-on that provides the Metrics API has been installed in the cluster. You can select one of the following add-ons based on your service requirements:
- Kubernetes Metrics Server: provides basic resource usage metrics, such as container CPU and memory usage.
- Cloud Native Cluster Monitoring: provides basic resource usage metrics using Prometheus. You need to register Prometheus as a Metrics API service. For details, see Providing Resource Metrics Through the Metrics API.

The Vertical Pod Autoscaler add-on has been installed in the cluster.

Precautions

The VPA feature is being tested. Exercise caution when using this feature.

When VPA adjusts a pod's resource configuration dynamically, the pod will be recreated and could be scheduled to a different node. However, VPA cannot ensure that the scheduling will be successful.
VPA can dynamically adjust the resource configuration of pods that are managed by a controller (such as a Deployment or StatefulSet), but not of pods that are not managed by any controller.
VPA cannot be used together with an HPA that scales on CPU or memory metrics.
VPA admission webhooks update the pod resource configuration. If there are other admission webhooks present in the cluster, make sure they do not conflict with the VPA admission webhooks.
VPA handles the majority of OOM events, but it may not handle all of them.
VPA performance has not yet been tested in large clusters.
VPA may recommend more resources than are currently available, for example, more than the nodes can provide or more than a resource quota allows. This can leave recreated pods stuck in the Pending state because they cannot be scheduled.
Configuring multiple VPAs for the same workload can lead to inconsistent behavior.
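
Before switching a VPA from recommendation-only mode to automatic updates, it can help to estimate whether the recommended requests still fit a namespace quota. The following is a rough pre-check under stated assumptions, not something the VPA does for you: the function names and all numbers are hypothetical, and Kubernetes evaluates quotas pod by pod at creation time, so the real outcome can differ.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def projected_requests(used, current_request, recommended_request, replicas):
    """Total requested amount in the namespace once every replica is updated."""
    return used + (recommended_request - current_request) * replicas


def fits_quota(quota, used, current_request, recommended_request, replicas):
    """Return True if the projected total stays within the quota."""
    total = projected_requests(used, current_request,
                               recommended_request, replicas)
    return total <= quota


# Hypothetical namespace: 2 GiB memory quota, 1.5 GiB already requested,
# two replicas that currently request 50 MiB each.
print(fits_quota(2 * GIB, 1536 * MIB, 50 * MIB, 250 * MIB, replicas=2))
print(fits_quota(2 * GIB, 1536 * MIB, 50 * MIB, 400 * MIB, replicas=2))
```

In this made-up scenario, a recommendation of 250 MiB per pod fits the quota, while 400 MiB per pod does not. In the second case, setting maxAllowed in the VPA would keep the recommendation within a range that the namespace can accommodate.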

Procedure

Use kubectl to access the cluster. For details, see Connecting to a Cluster Using kubectl.

Deploy a sample workload. If the cluster already has a workload that you want to scale, skip this step.

kubectl create -f hamster.yaml

Example configuration of hamster.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hamster
spec:
  selector:
    matchLabels:
      app: hamster
  replicas: 2
  template:
    metadata:
      labels:
        app: hamster
    spec:
      containers:
        - name: hamster
          image: registry.k8s.io/ubuntu-slim:0.1
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
          command: ["/bin/sh"]
          args:
            - "-c"
            - "while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done"

Create a VPA.

kubectl create -f hamster-vpa.yaml

Example configuration of hamster-vpa.yaml:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: hamster
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 50Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
        controlledResources: ["cpu", "memory"]

**Table 1** Key fields of a VPA
Field	Mandatory	Description
spec.targetRef	Yes	The target workload of the VPA. Workload types such as Deployment, StatefulSet, and DaemonSet are supported.
spec.updatePolicy.updateMode	No	How the VPA applies its recommendations. The default value is Auto. Options: Off (the VPA only generates recommendations and does not change the requested resources of pods); Initial (the VPA generates recommendations and applies them when a pod is created, but leaves the requested resources of existing pods unchanged); Recreate (the VPA generates recommendations and automatically changes the requested resources of pods); Auto (currently behaves the same as Recreate).
spec.resourcePolicy.containerPolicies	No	VPA policies for container resources, including the minimum and maximum amounts of resources allowed for containers. For details, see Table 2.
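
The effect of the updateMode options can be summarized as a small decision function. This is a sketch of the behavior described in Table 1, not VPA code; the function name and its boolean parameter are hypothetical.

```python
def vpa_changes_requests(update_mode, pod_is_being_created):
    """Return True if the VPA changes a pod's requested resources.

    update_mode: one of "Off", "Initial", "Recreate", "Auto".
    pod_is_being_created: True for a new pod, False for an existing pod.
    """
    if update_mode == "Off":
        # Recommendations are generated but never applied.
        return False
    if update_mode == "Initial":
        # Applied only at pod creation; existing pods are left alone.
        return pod_is_being_created
    if update_mode in ("Recreate", "Auto"):
        # Applied automatically; existing pods are recreated to pick up
        # the new requests.
        return True
    raise ValueError(f"unknown updateMode: {update_mode!r}")


for mode in ("Off", "Initial", "Recreate", "Auto"):
    print(mode, vpa_changes_requests(mode, pod_is_being_created=False))
```

Starting with Off, as the sample hamster-vpa.yaml does, lets you inspect the recommendations before allowing the VPA to recreate pods.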

**Table 2** Key fields in containerPolicy
Field	Mandatory	Description
containerName	Yes	Container name
minAllowed	No	The minimum amount of resources allowed for a container. The VPA does not recommend less than this value. Supported resources: CPU and memory.
maxAllowed	No	The maximum amount of resources allowed for a container. The VPA does not recommend more than this value. Supported resources: CPU and memory.
controlledResources	No	The types of container resources controlled by the VPA. The default value is ["cpu", "memory"]. Supported resources: CPU and memory.
mode	No	Whether the VPA policy takes effect for the container. The default value is Auto. Options: Auto (the VPA policy for the container is enabled); Off (the VPA policy for the container is disabled).

Wait for the VPA to generate the resource recommendations and run the following command to check the VPA recommendations:

kubectl get vpa hamster-vpa -oyaml

The command output is as follows:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
  namespace: default
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources:
      - cpu
      - memory
      maxAllowed:
        cpu: 1
        memory: 500Mi
      minAllowed:
        cpu: 100m
        memory: 50Mi
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hamster
  updatePolicy:
    updateMode: "Off"
status:
  conditions:
  - lastTransitionTime: "2024-06-27T07:37:01Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: hamster
      lowerBound:
        cpu: 475m
        memory: 262144k
      target:
        cpu: 587m
        memory: 262144k
      uncappedTarget:
        cpu: 587m
        memory: 262144k
      upperBound:
        cpu: 673m
        memory: 262144k

The status.recommendation field specifies the resource recommendations generated by the VPA.
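
If you want to read the recommendations from a script, the same object can be requested as JSON and parsed with the Python standard library. The helper below is a minimal sketch: the function name is hypothetical, the sample object is a trimmed copy of the output above, and the only assumption about the structure is the status.recommendation.containerRecommendations path shown there.

```python
import json


def container_targets(vpa):
    """Map each container name to its target recommendation.

    Returns an empty dict if the VPA has not produced a recommendation yet.
    """
    recommendations = (vpa.get("status", {})
                          .get("recommendation", {})
                          .get("containerRecommendations", []))
    return {rec["containerName"]: rec.get("target", {})
            for rec in recommendations}


# Trimmed-down JSON equivalent of the sample output above.
sample = json.loads("""
{
  "kind": "VerticalPodAutoscaler",
  "metadata": {"name": "hamster-vpa", "namespace": "default"},
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "hamster",
          "lowerBound": {"cpu": "475m", "memory": "262144k"},
          "target": {"cpu": "587m", "memory": "262144k"},
          "uncappedTarget": {"cpu": "587m", "memory": "262144k"},
          "upperBound": {"cpu": "673m", "memory": "262144k"}
        }
      ]
    }
  }
}
""")

print(container_targets(sample))
```

To use it against a live cluster, the output of kubectl get vpa hamster-vpa -o json could be piped into a script that calls json.load(sys.stdin) and passes the result to the helper.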

If updateMode is set to Auto, the VPA changes the requested resources of the pod to match the recommendations. This change causes the pod to be recreated.

**Table 3** Key fields in containerRecommendation
Field	Description
containerName	Name of the container for which the VPA policy takes effect
target	Recommended resources generated by the VPA, after the minimum and maximum amounts configured in the containerPolicies field have been applied. The VPA uses these values when it adjusts the requested resources of pods.
lowerBound	The lower bound of the recommended resources
upperBound	The upper bound of the recommended resources
uncappedTarget	The recommendation generated by the VPA before the minimum and maximum amounts configured in the containerPolicies field are applied.
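
The relationship between uncappedTarget and target can be illustrated by clamping a value between minAllowed and maxAllowed. The sketch below is not the VPA's implementation, which may take further constraints into account. Both function names are hypothetical, and the quantity parser only handles the suffixes used in this section plus a few common ones.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": Decimal(1024), "Mi": Decimal(1024) ** 2, "Gi": Decimal(1024) ** 3,
    "k": Decimal(1000), "M": Decimal(1000) ** 2, "G": Decimal(1000) ** 3,
    "m": Decimal("0.001"),
}


def parse_quantity(text):
    """Convert a quantity such as "587m", "1", "50Mi", or "262144k"
    into a Decimal (cores for CPU, bytes for memory)."""
    text = str(text)
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[:-len(suffix)]) * factor
    return Decimal(text)


def cap(uncapped, min_allowed, max_allowed):
    """Clamp an uncapped recommendation to [min_allowed, max_allowed]."""
    low = parse_quantity(min_allowed)
    high = parse_quantity(max_allowed)
    return max(low, min(parse_quantity(uncapped), high))


# Values from the sample VPA: both recommendations already lie within
# the allowed range, so target equals uncappedTarget.
print(cap("587m", "100m", "1"))          # CPU in cores
print(cap("262144k", "50Mi", "500Mi"))   # memory in bytes

# A hypothetical uncapped CPU recommendation of 2 cores would be cut to 1.
print(cap("2", "100m", "1"))
```

In the sample output, target and uncappedTarget are identical because 587m and 262144k (250 MiB) both fall between the configured minAllowed and maxAllowed values.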