
Auto Scaling Overview

As applications increasingly run on Kubernetes, the ability to rapidly scale out during peak times and scale in during off-peak hours becomes crucial for efficiently managing resources and reducing costs.

Auto scaling is widely used in CCE. Typical use cases are as follows:

  • Online service scaling: Pods and nodes are automatically scaled out during peak hours (for example, holidays and promotions) to handle increased user requests, and scaled in during off-peak hours to cut costs.
  • Large-scale computing and training: The number of pods and nodes is dynamically adjusted to match the demands of computing tasks, accelerating their execution.
  • Deep learning GPU training and inference: GPU resources are dynamically allocated to optimize the efficiency of training and inference tasks. GPU nodes are automatically added or removed as needed to enhance resource utilization.
  • Scheduled or periodic resource adjustment: Pods and nodes are automatically scaled at specific times to accommodate scheduled tasks. Resources are dynamically adjusted based on task requirements to ensure smooth execution.

Auto Scaling in CCE

CCE supports auto scaling for workloads and nodes.

  • Workload scaling involves adjusting the number or specifications of pods at the scheduling layer to adapt to changes in workload demands. For example, the number of pods can be automatically increased during peak hours to handle more user requests and then scaled down during off-peak hours to reduce costs.
  • Node scaling involves dynamically adding or reducing compute resources (such as ECSs) at the resource layer based on the scheduling status of pods. This approach ensures that clusters are well-resourced for high loads and minimizes waste during low demand.

Workload scaling and node scaling can work separately or together. For details, see Using HPA and CA for Auto Scaling of Workloads and Nodes.
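For reference, workload scaling with HPA is configured declaratively in Kubernetes. The following is a minimal sketch using the standard autoscaling/v2 API; the Deployment name and thresholds are illustrative, and CCE-specific enhancements such as cooldown windows are configured through HPA policies rather than in this manifest:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa               # illustrative name
spec:
  scaleTargetRef:             # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # assumed Deployment name
  minReplicas: 2              # lower bound kept during off-peak hours
  maxReplicas: 10             # upper bound for peak traffic
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU usage exceeds 70%
```

With this policy, the HPA controller keeps average CPU utilization near the target by adding or removing replicas within the configured bounds.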

Components

Workload Scaling Types

Table 1 Workload scaling types

| Type | Component | Description | Reference |
|------|-----------|-------------|-----------|
| HPA | HorizontalPodAutoscaler (built-in Kubernetes component) | The built-in Kubernetes component for Horizontal Pod Autoscaling (HPA). CCE extends Kubernetes HPA with application-level cooldown windows and scaling thresholds. | Creating an HPA Policy |
| CustomedHPA | CCE Advanced HPA | An enhanced auto scaling feature that scales Deployments based on metrics (CPU and memory usage) or on a periodic schedule (a specific time every day, week, month, or year). | Creating a CustomedHPA Policy |
| CronHPA | CCE Advanced HPA | Scales workloads in or out at fixed times. It can work with HPA policies to periodically adjust the HPA scaling range, enabling workload scaling in complex scenarios. | Creating a Scheduled CronHPA Policy |
| VPA | VPA | The Kubernetes Vertical Pod Autoscaler, which adjusts the CPU and memory requests of pods based on actual usage rather than changing the number of pods. | Creating a VPA Policy |
| AHPA | CCE Advanced HPA | Advanced Horizontal Pod Autoscaler, which scales workloads in advance based on historical metric data. | Creating an AHPA Policy |

Figure 1 Workload scaling
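In contrast to the horizontal scalers above, VPA changes the resource requests of existing pods. The following is a minimal sketch using the open-source VerticalPodAutoscaler custom resource (autoscaling.k8s.io/v1 API); the target name is illustrative, and the CCE workflow described in Creating a VPA Policy may differ:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa               # illustrative name
spec:
  targetRef:                  # the workload whose requests are adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # assumed Deployment name
  updatePolicy:
    updateMode: "Auto"        # apply recommendations by evicting and recreating pods
```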

Node Scaling Types

Table 2 Node scaling types

| Component | Description | Application Scenario | Reference |
|-----------|-------------|----------------------|-----------|
| CCE Cluster Autoscaler | An open-source Kubernetes component for horizontal node scaling, optimized by CCE for scheduling, auto scaling, and cost. | Online services, deep learning, and large-scale computing with limited resource budgets | Creating a Node Scaling Policy |
| CCE Cloud Bursting Engine for CCI | Extends Kubernetes to serverless container platforms (such as CCI), so workloads can burst beyond the cluster without the need to manage node resources. | Online traffic surges, CI/CD, big data, and more | CCI Scaling Policies |

Figure 2 Node scaling
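Cluster Autoscaler reacts to pods that cannot be scheduled because no node has enough free resources. As a minimal sketch, the Deployment below declares explicit CPU and memory requests (names, image, and sizes are illustrative); when the replicas exceed the cluster's spare capacity, the pending pods trigger a node scale-out, and under-used nodes are later removed once their pods can be rescheduled elsewhere:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker          # illustrative name
spec:
  replicas: 20                # enough replicas to exhaust current node capacity
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      containers:
        - name: worker
          image: busybox      # placeholder image
          command: ["sh", "-c", "sleep 3600"]
          resources:
            requests:         # explicit requests let the scheduler and autoscaler reason about capacity
              cpu: "500m"
              memory: 512Mi
```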