Auto Scaling
Pod Orchestration and Scheduling describes how to control the number of pods by using controllers such as Deployments. You can scale an application in or out manually by adjusting its pod count, but manual scaling is slow and cumbersome, which is a problem when fast scaling is required to handle traffic surges.
To solve this, Kubernetes supports auto scaling for both pods and nodes. By defining auto scaling rules, Kubernetes can dynamically scale pods and nodes based on metrics like CPU usage.
Prometheus and Metrics Server
To enable auto scaling in Kubernetes, the system must first be able to monitor key performance metrics, such as CPU and memory usage for nodes, pods, and containers. However, Kubernetes does not include built-in monitoring capabilities. It instead relies on external projects to extend its functionality.
- Prometheus is an open-source monitoring and alerting framework that collects a wide range of metrics, making it the standard monitoring solution for Kubernetes.
- Metrics Server functions as a resource usage aggregator in Kubernetes clusters. It pulls data from the Summary API exposed by each node's kubelet and serves it through the standard Resource Metrics API (metrics.k8s.io), reporting CPU and memory usage for nodes and pods (including their containers); you can query it with kubectl top, as shown after this list.
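For example, once Metrics Server is deployed in a cluster, you can verify that resource metrics are available by querying current usage (these commands return an error if Metrics Server is not installed):
$ kubectl top nodes    # CPU and memory usage of each node
$ kubectl top pods     # CPU and memory usage of each pod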
Horizontal Pod Autoscaler (HPA) integrates with Metrics Server to implement auto scaling based on CPU and memory usage. Additionally, HPA can work with Prometheus to enable auto scaling using custom monitoring metrics.
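As a rough sketch of what a custom-metric configuration might look like, assuming a Prometheus adapter already exposes a hypothetical pod-level metric named http_requests_per_second through the custom metrics API, an autoscaling/v2 HPA could target it as follows:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-scale             # Hypothetical name used for this sketch
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods                          # Pod-level metric served by the custom metrics API
    pods:
      metric:
        name: http_requests_per_second  # Assumed to be exposed by the adapter
      target:
        type: AverageValue              # Scale on the average value across pods
        averageValue: "100"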
How HPA Works
An HPA controls horizontal scaling of pods. It periodically checks pod metrics, calculates how many pods are needed to meet target values, and updates the replicas field of the associated workload such as a Deployment.

You can configure one or more metrics for an HPA. When only one metric is used, the HPA totals the metric values from the current pods, divides that total by the expected value, and rounds up the result to determine the required number of pods. For example, if a Deployment has three pods with the CPU usage of each pod at 70%, 50%, and 90%, respectively, and the expected CPU usage configured for HPA is 50%, the expected number of pods is calculated as follows: (70 + 50 + 90)/50 = 4.2. The required number of pods is rounded up to 5.
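This matches the formula used by the HPA controller: desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]. Here the average usage across the three pods is 70%, so desiredReplicas = ceil[3 × (70 / 50)] = ceil(4.2) = 5.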
If multiple metrics are configured, the expected number of pods is calculated for each metric separately, and the largest value is used, as in the sketch below.
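For example, the metrics section of an HPA spec that tracks both CPU and memory might look like this; the controller computes a desired replica count for each entry independently and applies the larger one:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70    # Desired average CPU usage across pods
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80    # Desired average memory usage across pods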
Using an HPA
The following example demonstrates how to use an HPA. First, create a Deployment with four pods using an Nginx image.
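A minimal manifest for such a Deployment might look like the following sketch. Note that the cpu request matters: an HPA target of type Utilization is computed as a percentage of the pods' resource requests, so pods without a cpu request cannot be autoscaled on CPU utilization.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            cpu: 100m           # Required for Utilization-based CPU targets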
$ kubectl get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   4/4     4            4           77s
$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-7cc6fd654c-5xzlt   1/1     Running   0          82s
nginx-deployment-7cc6fd654c-cwjzg   1/1     Running   0          82s
nginx-deployment-7cc6fd654c-dffkp   1/1     Running   0          82s
nginx-deployment-7cc6fd654c-j7mp8   1/1     Running   0          82s
Create an HPA. The expected CPU usage is 70%, and the number of pods ranges from 1 to 10.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scale
  namespace: default
spec:
  scaleTargetRef:       # Target resource
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1        # The minimum number of pods for the target resource
  maxReplicas: 10       # The maximum number of pods for the target resource
  metrics:              # Metric. The expected CPU usage is 70%.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
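Alternatively, an equivalent HPA can be created imperatively with kubectl autoscale (by default, the resulting HPA is named after the Deployment rather than scale):
$ kubectl autoscale deployment nginx-deployment --cpu-percent=70 --min=1 --max=10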
Create the HPA and check its details.
$ kubectl create -f hpa.yaml
horizontalpodautoscaler.autoscaling/scale created
$ kubectl get hpa
NAME    REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
scale   Deployment/nginx-deployment   0%/70%   1         10        4          18s
In the command output, the TARGETS column compares the actual CPU usage (0%) with the expected value (70%). Because actual usage is well below the target, the HPA will scale in. The expected number of pods is (0 + 0 + 0 + 0)/70 = 0, but because the minimum number of pods was set to 1, the Deployment is scaled to one pod. After a while, you can see that only one pod remains.
$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-7cc6fd654c-5xzlt   1/1     Running   0          7m41s
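You can also watch the HPA converge on the new replica count in real time:
$ kubectl get hpa scale --watch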
Check the HPA again. You can see a record similar to the following under Events. It shows that 21 seconds ago, the HPA scaled in the Deployment to one pod because all configured metrics were below their targets.
$ kubectl describe hpa scale
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  21s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
If you check the Deployment details again, you can see that there is a record similar to the following under Events. This record shows that the number of Deployment pods has been adjusted to 1, aligning with the HPA configuration.
$ kubectl describe deploy nginx-deployment
...
Events:
  Type    Reason             Age  From                   Message
  ----    ------             ---  ----                   -------
  Normal  ScalingReplicaSet  7m   deployment-controller  Scaled up replica set nginx-deployment-7cc6fd654c to 4
  Normal  ScalingReplicaSet  1m   deployment-controller  Scaled down replica set nginx-deployment-7cc6fd654c to 1
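To see the HPA scale back out, you could generate load against the pods. A rough sketch, assuming a Service named nginx-deployment exposes the Deployment inside the cluster:
$ kubectl run load-generator --rm -i --tty --image=busybox --restart=Never \
    -- /bin/sh -c "while true; do wget -q -O- http://nginx-deployment; done"
In practice, serving static Nginx pages consumes little CPU, so a more CPU-intensive workload may be needed to push average usage past the 70% target.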
Cluster Autoscaler
HPAs focus on scaling pods, but when cluster resources become insufficient, the only option is to add nodes. Scaling cluster nodes can be complex, but in cloud-based environments, nodes can be dynamically added or removed using APIs, making the process much more convenient.
Kubernetes offers Cluster Autoscaler, a component designed to automatically scale cluster nodes based on pod scheduling demands and resource usage. However, because this relies on cloud provider APIs, the implementation and usage vary across different environments.
For details about the implementation in CCE, see Creating a Node Scaling Policy.