Updated on 2024-03-11 GMT+08:00

Overview

Auto scaling is a service that automatically and economically adjusts service resources based on your service requirements and configured policies.

Context

As more and more applications are built on Kubernetes, it is increasingly important to quickly scale out applications to cope with service peaks and to scale them in during off-peak hours to save resources and reduce costs.

In a Kubernetes cluster, auto scaling involves pods and nodes. A pod is an application instance. Each pod contains one or more containers and runs on a node (a VM or bare-metal server). If a cluster does not have enough nodes to run new pods, nodes must be added to the cluster to keep services running.
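
The relationship between pending pods and node capacity can be illustrated with a minimal sketch. It assumes that only CPU requests (in millicores) matter and that pods are placed first-fit. A real scheduler also weighs memory, affinity, taints, and other constraints, and the function name `nodes_to_add` and its parameters are hypothetical, not part of any CCE or Kubernetes API.

```python
def nodes_to_add(pending_pod_cpu, free_cpu_per_node, new_node_cpu):
    """Estimate how many new nodes are needed for pods that do not fit.

    All values are CPU requests in millicores (an assumption of this sketch).
    """
    free = list(free_cpu_per_node)
    unplaced = []
    # Try to place each pending pod on an existing node, largest pods first.
    for cpu in sorted(pending_pod_cpu, reverse=True):
        for i, room in enumerate(free):
            if cpu <= room:
                free[i] -= cpu
                break
        else:
            unplaced.append(cpu)

    # Pack the pods that did not fit onto hypothetical new nodes, first-fit.
    new_nodes = []
    for cpu in unplaced:
        if cpu > new_node_cpu:
            raise ValueError("pod request exceeds the capacity of a new node")
        for i, room in enumerate(new_nodes):
            if cpu <= room:
                new_nodes[i] -= cpu
                break
        else:
            new_nodes.append(new_node_cpu - cpu)
    return len(new_nodes)


# Two existing nodes with 600m and 400m free; three pending pods.
print(nodes_to_add([500, 500, 1500], [600, 400], 2000))
```

In this example, one 500m pod fits on an existing node, and the remaining two pods share a single new 2000m node.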

In CCE, auto scaling is used for online services, large-scale computing and training, deep learning GPU or shared GPU training and inference, periodic load changes, and many other scenarios.

Auto Scaling in CCE

CCE supports auto scaling for workloads and nodes.

  • Workload scaling: Auto scaling at the scheduling layer to change the scheduling capacity of workloads. For example, you can use the HPA, a scaling component at the scheduling layer, to adjust the number of replicas of an application. Adjusting the number of replicas changes the scheduling capacity occupied by the current workload, thereby enabling scaling at the scheduling layer.
  • Node scaling: Auto scaling at the resource layer. When the planned cluster nodes cannot accommodate the workloads to be scheduled, ECS resources are added to support scheduling.
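
How the HPA adjusts the number of replicas can be sketched with the replica calculation described in the upstream Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the current metric value to the target value, rounded up, with no change when the ratio is within a tolerance (0.1 by default upstream). The function name and the `min_replicas` and `max_replicas` defaults below are illustrative assumptions, and the sketch omits stabilization windows and CCE-specific behavior.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return the replica count suggested by the upstream HPA formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU usage with a 60% target.
print(desired_replicas(4, 90, 60))
```

With 4 replicas at 90% usage against a 60% target, the ratio is 1.5, so the sketch suggests 6 replicas.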

Workload scaling and node scaling can work separately or together. For details, see Using HPA and CA for Auto Scaling of Workloads and Nodes.

Components

Workload scaling components are described as follows:

Table 1 Workload scaling components

| Type | Component Name | Component Description | Reference |
|---|---|---|---|
| HPA | Kubernetes Metrics Server | A built-in component of Kubernetes, which enables horizontal scaling of pods. It adds the application-level cooldown time window and scaling threshold functions based on the HPA. | HPA Policies |
| CustomedHPA | CCE Advanced HPA | An enhanced auto scaling feature, used for auto scaling of Deployments based on metrics (CPU usage and memory usage) or at a periodic interval (a specific time point every day, every week, every month, or every year). | CustomedHPA Policies |
| Prometheus (EOM) | Cloud Native Cluster Monitoring | An open-source system monitoring and alarm framework, which collects public metrics (CPU usage and memory usage) of kubelet in the Kubernetes cluster. | |
| CronHPA | CCE Advanced HPA | CronHPA can scale in or out a cluster at a fixed time. It can work with HPA policies to periodically adjust the HPA scaling scope, implementing workload scaling in complex scenarios. | CronHPA Policies |
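
The idea of periodically adjusting the HPA scaling scope can be illustrated with a minimal sketch: a time-based rule selects the minimum and maximum replica counts in effect, and the metric-driven replica count is then limited to that range. The `ScheduleRule` class, the function names, and the hour-based schedule are hypothetical and do not reflect the actual CronHPA policy format in CCE.

```python
from dataclasses import dataclass


@dataclass
class ScheduleRule:
    start_hour: int    # inclusive, 0-23
    end_hour: int      # exclusive
    min_replicas: int
    max_replicas: int


def active_bounds(hour, rules, default=(1, 5)):
    """Return the (min, max) replica range in effect at the given hour."""
    for rule in rules:
        if rule.start_hour <= hour < rule.end_hour:
            return (rule.min_replicas, rule.max_replicas)
    return default


def clamp_replicas(desired, bounds):
    """Limit a metric-driven replica count to the active range."""
    low, high = bounds
    return max(low, min(high, desired))


# Assumed schedule: a wider range during business hours.
rules = [ScheduleRule(start_hour=9, end_hour=18, min_replicas=5, max_replicas=20)]
print(clamp_replicas(2, active_bounds(10, rules)))  # within business hours
print(clamp_replicas(8, active_bounds(22, rules)))  # outside business hours
```

In this sketch, the time-based rule only moves the bounds, while the replica count inside those bounds is still driven by metrics.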

Node scaling components are described as follows:

Table 2 Node scaling components

| Component Name | Component Description | Application Scenario | Reference |
|---|---|---|---|
| CCE Cluster Autoscaler | An open source Kubernetes component for horizontal scaling of nodes, which is optimized by CCE in scheduling, auto scaling, and costs. | Online services, deep learning, and large-scale computing with limited resource budgets | Creating a Node Scaling Policy |
| CCE Cloud Bursting Engine for CCI | Used to extend Kubernetes APIs to serverless container platforms (such as CCI), which means you no longer have to worry about node resources. | Online traffic surge, CI/CD, big data, and more | Elastic Scaling of CCE Pods to CCI |