Overview

Why Workload Scaling?

Application traffic changes constantly, and so do the resource requirements of container workloads. If you reserve resources for a workload based on peak-hour demand, a large share of those resources is wasted during off-peak hours. If you instead set a fixed resource threshold for the workload, applications may become abnormal when resource usage exceeds that threshold. In Kubernetes, a Horizontal Pod Autoscaler (HPA) automatically scales the pods of a workload in or out in response to metric changes, but only within a single cluster. The HPA does not apply to multi-cluster scenarios.

UCS provides automatic workload scaling for multi-cluster scenarios. Scaling can be triggered by metric changes or at regular intervals, which makes scaling more flexible and stable.

Advantages

UCS workload scaling has the following advantages:

Multi-cluster: You can configure the same scaling policy for multiple clusters in the federation.
High availability: Pods in your workload can be quickly scaled out at peak hours to ensure workload availability, or scaled in at off-peak hours to save resources.
Multi-function: Pods in your workload can be scaled in or out based on metric changes, at regular intervals, or both, which covers complex scenarios.
Multi-scenario: You can configure scaling policies for online services, large-scale computing and training, and deep learning training and inference on GPUs or shared GPUs.

Working Principles

UCS workload scaling is implemented by FederatedHPA and CronFederatedHPA, as shown in Figure 1.

FederatedHPA automatically scales the pods of a workload in or out in response to system metrics or custom metrics. Scaling is triggered when a metric reaches the value configured in the policy.
CronFederatedHPA automatically scales the pods of a workload in or out at regular intervals. Scaling is triggered when the configured time arrives.
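
The two triggering modes can be illustrated with a minimal Python sketch. It is not the UCS implementation. The metric-based function assumes that FederatedHPA follows the standard Kubernetes HPA formula (desired replicas = ceil(current replicas x current metric / target metric)), and it omits the tolerance and stabilization logic that a real autoscaler applies. The schedule format, the function names, and all parameter names are hypothetical and chosen only for illustration.

```python
import math
from datetime import time


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Metric-based scaling (assumed: standard Kubernetes HPA formula).

    The raw result is clamped to the [min_replicas, max_replicas] range.
    """
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


def scheduled_replicas(now, schedule, default):
    """Time-based scaling (hypothetical rule format).

    `schedule` is a list of (trigger_time, replicas) pairs. The function
    returns the replica count of the latest rule whose trigger time has
    already passed today, or `default` if no rule has fired yet.
    """
    result = default
    for trigger, replicas in sorted(schedule):
        if now >= trigger:
            result = replicas
    return result


# Hypothetical schedule: scale out at 08:00, scale in at 20:00.
SCHEDULE = [(time(8, 0), 10), (time(20, 0), 2)]
```

For example, with 4 pods running at 90% average CPU usage against a 60% target, `desired_replicas(4, 90, 60, 1, 10)` yields 6 pods, while `scheduled_replicas(time(9, 0), SCHEDULE, 3)` yields the 10 pods set by the 08:00 rule.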

Figure 1 Working principles of workload scaling

Constraints

UCS scaling policies apply only to Deployments. For a comparison of workload types, see Workloads.
UCS scaling policies are used to scale in or out pods for workloads. To schedule the pods to specific clusters, you need to configure scheduling policies.
