Managing Costs for a Cluster
The key to optimizing cluster costs is to maximize the utilization of cluster resources and minimize unnecessary expenses. It is important to note that cost optimization goes beyond just reducing resources; it also involves finding a balance between cost optimization and cluster reliability. This section provides a summary of the best practices for cluster cost optimization, which will help you efficiently manage cluster expenses and enhance overall efficiency.
Using Appropriate Cluster Configurations
Before setting up a cluster, it is important to assess the resource needs of your applications. This will help you choose the appropriate cluster type, node type, and cluster billing mode, all of which can contribute to building a cost-effective cluster.
Selecting a Cluster Type
CCE provides multiple cluster types. Select the one that best fits your service characteristics. The table below compares the cluster types.
| Category | CCE Standard | CCE Turbo |
|---|---|---|
| Description | An enterprise-level container service on Kubernetes | Hardware-software synergy for extra performance |
| Managed object | Clusters, nodes, and workloads | Clusters, nodes, and workloads |
| Cluster scale | On-demand adjustment | On-demand adjustment |
| Node | Multiple flavors available, custom node creation or deletion | Multiple flavors available, custom node creation or deletion |
| Compute | Heterogeneous compute including x86, Arm, and NPUs | Heterogeneous compute including x86, Arm, and NPUs |
| Scheduling | Proprietary Volcano for various scheduling policies and improved task execution efficiency | Hybrid scheduling for improved cluster resource utilization |
| Network | VPC network overlaid with container network | VPC network and container network combined into a single layer for zero performance loss |
| Security | Container network access control based on network policies | Kata Containers that allow containers to run inside lightweight VMs |
For details, see Comparison Between Cluster Types.
Selecting a Node Flavor
ECSs come in various flavors, each offering different compute and storage capabilities. Typically, flavors with more vCPUs and memory, or with specialized hardware such as GPUs and NPUs, cost more per node. Configure stable, cost-effective ECSs that align with the specific needs of your services.
Selecting a Billing Mode for a Node
Different services have varying resource usage periods and stability requirements. To achieve cost-effectiveness, you can choose the appropriate billing mode based on the service characteristics.
| Billing Mode | Description |
|---|---|
| Yearly/Monthly | Yearly/Monthly is a prepaid mode in which you pay for a service before using it. Your bill is generated based on the required duration you specify in the order. The longer the subscription period, the greater the discount. Yearly/Monthly billing is a good option for long-term, stable services. |
| Pay-per-use | Pay-per-use is a postpaid billing mode. You pay as you go and just pay for what you use. The prices are calculated by the second but billed every hour. Pay-per-use billing allows you to flexibly adjust resource usage. You neither need to prepare resources in advance nor end up with excessive or insufficient preset resources. It is a good option for scenarios with sudden traffic bursts, such as e-commerce promotions. |
| Spot pricing | Spot pricing is a postpaid billing mode. The prices are adjusted gradually based on long-term trends in supply and demand for spot instance capacity. The resource usage is calculated by the second but billed every hour. You need to set a maximum price you are willing to pay for a spot instance. If inventory resources are insufficient or the market price rises above your maximum price, the spot instance will be reclaimed. NOTICE: Spot instances are ideal for stateless, cost-sensitive applications that can tolerate interruptions. A spot instance is not recommended for workloads that need to run for a long time or that require high stability. |
For details, see Billing Items.
Clearing Idle Resources in a Timely Manner
Identify and release idle cloud services and cluster resources promptly, such as unused ECSs, EVS disks, OBS buckets, ELB load balancers, and EIPs.
Optimizing Resource Configuration for a Workload
Setting resource requests and limits too high leads to resource wastage, while setting them too low affects workload stability. By properly configuring resource requests and limits, cluster resource utilization can be improved, resulting in reduced costs.
Configuring Proper Resource Requests and Limits
To ensure that your workloads have enough resources and to avoid wasting resources due to excessive requests, configure appropriate requests and limits.
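For example, a Deployment can declare requests and limits as follows. This is a minimal sketch; the names, image, and values are placeholders to adapt to your workload's observed usage:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:alpine    # placeholder image
        resources:
          requests:            # guaranteed amount, used by the scheduler for placement
            cpu: 250m
            memory: 256Mi
          limits:              # hard cap; the container is throttled (CPU) or OOM-killed (memory) beyond this
            cpu: 500m
            memory: 512Mi
```

Base the requests on monitored usage of the workload, and keep limits close enough to the requests that a single pod cannot monopolize node resources.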
Managing Quotas for a Namespace
Quota management sets limits on the total number of resources that teams and users can use when they share cluster resources. These resources include the number of objects of a specific type created in a namespace, as well as the total number of compute resources like CPUs and memory used by these objects.
This approach helps minimize unnecessary resource overhead.
For details, see Configuring Resource Quotas.
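For instance, a ResourceQuota object can cap both compute usage and object counts for a team's namespace. The namespace name and the values below are illustrative:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota        # placeholder name
  namespace: team-a         # placeholder namespace
spec:
  hard:
    requests.cpu: "8"       # total CPU the namespace's pods may request
    requests.memory: 16Gi
    limits.cpu: "16"        # total CPU limit across the namespace
    limits.memory: 32Gi
    pods: "50"              # cap on the number of pods in the namespace
```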
Using Cluster Auto Scaling
CCE provides auto scaling in seconds. It automatically adjusts compute resources based on preset policies and your service needs to ensure that the number of cloud servers or containers increases or decreases with service load. This ensures stable, healthy services, improves cluster resource utilization, and reduces costs.
For details, see Workload Scaling Rules.
Using Application Auto Scaling
CCE provides auto scaling for applications. This feature enables applications that experience traffic surges or periodic peak and off-peak hours to automatically adjust their compute resources.
Application scaling on demand (HPA)
Application auto scaling helps dynamically adjust compute resources based on service requirements and policies. It enables quick scale-outs during peak hours and scale-ins during off-peak hours, optimizing resource utilization and reducing costs.
The table below lists the auto scaling approaches supported by CCE.
| Policy | Description |
|---|---|
| HPA | Scales Deployments based on metrics like CPU and memory usage. For details, see Horizontal Pod Autoscaling. It adds cooldown time windows and scaling thresholds for applications based on the Kubernetes HPAs. HPA policies apply to scenarios where services experience fluctuating traffic, many services are deployed, and frequent scaling is required. |
| CronHPA | Scales Deployments periodically (daily, weekly, monthly, or yearly at a specific time). CronHPA policies apply to scenarios where the application resource usage changes periodically. |
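A standard Kubernetes HPA targeting CPU utilization can be defined as follows. This is a sketch using the upstream `autoscaling/v2` API; the target Deployment name and thresholds are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:              # the workload being scaled
    apiVersion: apps/v1
    kind: Deployment
    name: web-app              # placeholder Deployment name
  minReplicas: 2               # floor during off-peak hours
  maxReplicas: 10              # ceiling during traffic peaks
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU usage exceeds 70%
```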
Scaling a Node
Application auto scaling helps dynamically adjust the number of pods based on workload metrics. If there are not enough cluster resources and new pods cannot run properly, you can add more nodes to the cluster.
For details, see Node Scaling Rules.
The table below lists auto scaling policies that you can select.
| Policy Name | Description |
|---|---|
| Manual scaling | You can manually scale in or out nodes in a node pool. If the resources of the selected flavor are insufficient or the quota is insufficient, the scale-out will fail. |
| Auto scaling | CCE Cluster Autoscaler automatically scales in or out nodes in a cluster based on the pod scheduling status and resource usage. It supports multiple scaling modes, such as multi-AZ, multi-pod-specifications, metric triggering, and periodic triggering, to meet the requirements of different node scaling scenarios. |
Optimizing Application Scheduling
As you adopt cloud native technologies, it is important to strike a balance between performance and service quality. This can be achieved by carefully considering the service deployment solutions and architectures. Depending on your specific service scenarios, you can choose an appropriate scheduling solution to optimize resource utilization and manage costs efficiently.
Using Cloud Native Hybrid Deployment
Use cloud native hybrid deployment if your services are deployed in the scenarios below.
- Nodes are deployed in different clusters. They cannot share compute resources with each other, resulting in an increase in resource fragments.
- The node flavors are not ideal for applications that undergo frequent changes. At first, the node flavors match the application requirements, resulting in a high resource allocation rate. However, as the applications evolve, their resource demands change, causing a significant difference in the ratio of requested resources to node flavors. This leads to a decrease in the allocation rate of node resources and an increase in compute resource fragments.
- There are a large number of reserved resources. Online services experience daily peaks and troughs. To ensure service performance and stability, users apply for resources based on peak usage, which may result in many idle resources in the cluster during certain times.
- Online and offline services are deployed in separate Kubernetes clusters, and resources cannot be shared between them across different times. This means that during off-peak hours for online services, the resources cannot be used by offline services.
The table below lists cloud native hybrid deployment features that can help you improve resource utilization, reduce costs, and improve efficiency in the scenarios mentioned earlier.

| Feature | Description |
|---|---|
| Dynamic resource oversubscription | Based on the types of online and offline jobs, Volcano is used to optimize cluster resource utilization using the requested but unused resources (the difference between the requested and used resources) for resource oversubscription and hybrid deployment. For details, see Dynamic Resource Oversubscription. |
Enabling Resource Usage-based Scheduling
Volcano Scheduler can improve cluster resource utilization. It provides bin packing, descheduling, node pool affinity, and load-aware scheduling policies.
| Scheduling Policy | Description |
|---|---|
| Bin packing | Bin packing is an optimization algorithm that aims to reduce cluster resource fragments. After bin packing is enabled for cluster workloads, the scheduler preferentially schedules pods to nodes with high resource allocation. This reduces resource fragments on each node and improves cluster resource utilization. For details, see Bin Packing. |
| Descheduling | Volcano Scheduler can remove pods that do not meet the configured policies and reschedule them according to those policies. This helps balance the cluster loads and minimize resource fragmentation. For details, see Descheduling. |
| Node pool affinity | When it comes to scenarios like node pool replacement and rolling node upgrades, it becomes necessary to replace an old node pool with a new one. To prevent the node pool replacement from affecting services, it is advised to enable soft affinity, which allows for scheduling service pods to the new node pool. For details, see Node Pool Affinity. |
| Load-aware scheduling | Volcano Scheduler offers CPU and memory load-aware scheduling for pods and preferentially schedules pods to the node with the lightest load to balance node loads. This prevents an application or node failure due to heavy loads on a single node. For details, see Load-aware Scheduling. |
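As a sketch of how such policies are switched on, the Volcano scheduler reads a plugin-based configuration; enabling bin packing means adding the `binpack` plugin. The exact ConfigMap name and file location depend on your Volcano/CCE deployment, and the weight below is an illustrative value:

```yaml
# volcano-scheduler.conf (sketch; location and defaults vary by Volcano version)
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: binpack          # prefer nodes that are already highly allocated
    arguments:
      binpack.weight: 10   # weight of the binpack score relative to other plugins
```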
Enabling Priority-based Scheduling and Preemption
A pod priority indicates the importance of a pod relative to other pods. Volcano supports pod PriorityClasses in Kubernetes. After PriorityClasses are configured, the scheduler preferentially schedules high-priority pods. When cluster resources are insufficient, the scheduler will proactively evict low-priority pods to make it possible to schedule pending high-priority pods. For details, see Priority-based Scheduling and Preemption.
The table below lists the types of priority-based scheduling and preemption supported by CCE.
| Scheduling Type | Description |
|---|---|
| Priority-based scheduling | The scheduler preferentially guarantees the running of high-priority pods, but will not evict low-priority pods. Priority-based scheduling is enabled by default and cannot be disabled. |
| Priority-based preemption | When cluster resources are insufficient, the scheduler will proactively evict low-priority pods to make it possible to schedule pending high-priority pods. |
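Pod priorities are expressed through standard Kubernetes PriorityClass objects, which pods then reference by name. The class name, value, and pod below are placeholders:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority          # placeholder name
value: 1000000                 # larger value = higher priority
globalDefault: false
description: "For latency-critical services."
---
# A pod opts into the class via priorityClassName:
apiVersion: v1
kind: Pod
metadata:
  name: critical-pod           # placeholder name
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: nginx:alpine        # placeholder image
```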
Using Shared GPUs
GPU virtualization allows for the separation of compute and GPU memory, optimizing the utilization of GPUs. CCE GPU virtualization leverages the proprietary xGPU virtualization technology to dynamically separate GPU memory and compute. This virtualization solution offers greater flexibility compared to static allocation. While ensuring maximum service stability, you have the freedom to define the number of GPUs to be used, thereby enhancing GPU utilization.
For details, see GPU Virtualization.
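With GPU virtualization, a pod requests a slice of GPU memory and compute instead of a whole card. The resource names below are assumptions for illustration only; confirm the exact names for your cluster version in the GPU Virtualization documentation:

```yaml
# Sketch: requesting a fraction of one GPU. Resource names are illustrative
# and must be verified against the GPU Virtualization documentation.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference                      # placeholder name
spec:
  containers:
  - name: inference
    image: cuda-app:latest                 # placeholder image
    resources:
      limits:
        volcano.sh/gpu-mem.128Mi: 40       # 40 x 128 MiB = 5 GiB of GPU memory
        volcano.sh/gpu-core.percentage: 25 # 25% of one GPU's compute
```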
Enabling AI Performance-based Scheduling
In AI and big data collaborative scheduling scenarios, Volcano Dominant Resource Fairness (DRF) and Gang scheduling can be used to improve training performance and resource utilization.
DRF
DRF is a scheduling algorithm based on dominant-resource fairness. It is designed for scenarios involving large-scale submissions of AI training and big data jobs, helping improve overall cluster throughput, shorten job execution time, and enhance training performance. For details, see DRF.
In actual services, limited cluster resources are often allocated to multiple users. Each user has the same right to obtain resources, but the amount of resources they need may differ. It is crucial to allocate resources to each user fairly. A common scheduling algorithm is max-min fairness, which satisfies users' minimum requirements as far as possible and then fairly allocates the remaining resources. The rules are as follows:
- Resources are allocated in order of increasing demand.
- No user gets a resource share larger than their demands.
- Users with unsatisfied demands get an equal share of the resource.
Gang
Gang scheduling meets the scheduling requirements of "All or nothing" in the scheduling process and avoids the waste of cluster resources caused by arbitrary scheduling of pods. It is mainly used in scenarios that require multi-process collaboration, such as AI and big data scenarios. Gang scheduling effectively resolves pain points such as resource waiting or deadlocks in distributed training jobs, thereby significantly improving the utilization of cluster resources. For details, see Gang.
The Gang scheduler algorithm checks whether the number of scheduled pods in a job meets the minimum requirements for running the job. If yes, all pods in the job will be scheduled. If no, the pods will not be scheduled.
Gang scheduling is well suited to the following scenarios that require multi-process collaboration:
- AI scenarios typically involve complex workflows, such as data ingestion, data analysis, data splitting, training, serving, and logging. These workflows require multiple containers to run collaboratively and are well-suited for Gang scheduling.
- Multi-thread parallel computing communication scenarios under MPI computing framework are suitable for Gang scheduling because primary and secondary processes need to work together.
- Pods in a job are tightly coupled and may contend for resources. Scheduling them as a whole effectively prevents deadlocks.
When cluster resources are tight, Gang scheduling prevents partially scheduled jobs from occupying resources they cannot use, significantly improving cluster resource utilization.
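In a Volcano Job, the gang constraint is expressed through the `minAvailable` field: pods are scheduled only when at least that many can run together. The job name, image, and resource values below are placeholders:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distributed-training         # placeholder name
spec:
  schedulerName: volcano
  minAvailable: 4                    # gang constraint: schedule only if all 4 pods fit
  tasks:
  - name: worker
    replicas: 4
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: trainer
          image: training-image:latest   # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```

If fewer than `minAvailable` pods can be placed, none are started, so no resources sit idle waiting for the rest of the job.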
Enabling NUMA Affinity Scheduling
When working with high-performance computing, real-time applications, or memory-intensive workloads that require frequent communication between CPUs, accessing memory across non-uniform memory access (NUMA) nodes in a cloud native environment can degrade system performance due to increased latency and overhead. Volcano's NUMA affinity scheduling policy resolves this by scheduling each pod to the worker node that requires the fewest cross-NUMA accesses. This reduces data transmission overheads, optimizes resource utilization, and enhances overall system performance.
Volcano aims to resolve the NUMA topology-aware scheduling restrictions of the scheduler so that:
- Pods are not scheduled to nodes that do not match the NUMA topology.
- Pods are scheduled to the most suitable node for NUMA topology.
For details, see NUMA Affinity Scheduling.
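NUMA affinity is typically requested per pod through an annotation following the Volcano NUMA-aware scheduling convention; verify the key and values against your Volcano/CCE version. The pod name, image, and resource values are placeholders:

```yaml
# Sketch: pod-level NUMA topology policy via a Volcano annotation.
apiVersion: v1
kind: Pod
metadata:
  name: numa-sensitive-app       # placeholder name
  annotations:
    volcano.sh/numa-topology-policy: single-numa-node  # none | best-effort | restricted | single-numa-node
spec:
  schedulerName: volcano
  containers:
  - name: app
    image: hpc-app:latest        # placeholder image
    resources:
      requests:
        cpu: "4"                 # integer CPU requests pair with the kubelet CPU Manager static policy
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```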
Configuring Application Scaling Priority Policies
With application scaling priority policies, you have precise control over the scaling priorities of pods on different types of nodes, allowing for optimized resource management. The application scaling priority policies include:
- Scale-outs: Volcano schedules new pods in a cluster based on preset node priority for scale-outs.
- Scale-ins: When a workload is specified, Volcano scores the workload based on preset node priority to determine pod deletion sequence during scale-ins.
If the default scaling priority policy is applied, pods will be scheduled first to yearly/monthly nodes during a scale-out, followed by pay-per-use nodes and virtual-kubelet nodes (scaled to CCI). During a scale-in, pods are deleted sequentially from virtual-kubelet nodes (scaled to CCI), pay-per-use nodes, and yearly/monthly nodes. You can adjust the scaling priority policies based on your service scenarios. For details, see Application Scaling Priority Policies.