Scheduling Overview
CCE supports multiple resource and task scheduling policies to enhance application performance and overall cluster resource utilization. This section describes the main functions of CPU scheduling, GPU/NPU heterogeneous scheduling, and Volcano scheduling.
CPU Scheduling
CCE provides CPU management policies that enable the allocation of complete physical CPU cores to applications. This improves application performance and reduces scheduling latency.
Function | Description | Documentation |
---|---|---|
CPU policy | If a node runs a large number of CPU-intensive pods, workloads may be migrated between CPU cores. For CPU-sensitive applications, you can allocate dedicated physical cores to them using the CPU management policy provided by Kubernetes. This improves application performance and reduces scheduling latency. | |
Enhanced CPU policy | Based on the conventional CPU management policy, this policy supports intelligent scheduling for burstable pods, whose CPU request and limit values must be positive integers. These pods can use specific CPU cores preferentially, but they do not exclusively use these CPU cores. | |
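The difference between the two CPU policies can be sketched in a few lines of Python. This is a simplified illustration, not CCE's or the kubelet's implementation: the function names and the labels `exclusive`, `preferential`, and `shared` are hypothetical, only a single container is considered, and both requests and limits are assumed to be written out explicitly. It relies on the Kubernetes rule that the static CPU management policy grants dedicated cores only to Guaranteed pods with whole-number CPU requests, and on the description above that the enhanced policy targets burstable pods with whole-number request and limit values.

```python
from fractions import Fraction


def parse_cpu(quantity: str) -> Fraction:
    """Convert a Kubernetes CPU quantity such as "2", "1.5" or "500m" to cores."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def cpu_placement(resources: dict) -> str:
    """Classify one container's CPU treatment (hypothetical labels).

    "exclusive"    - Guaranteed QoS with whole cores: dedicated cores.
    "preferential" - burstable with whole-core request and limit: the case
                     the enhanced CPU policy above is described as covering.
    "shared"       - everything else runs in the shared CPU pool.
    """
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if "cpu" not in requests or "cpu" not in limits:
        return "shared"

    req = parse_cpu(requests["cpu"])
    lim = parse_cpu(limits["cpu"])
    whole_cores = req > 0 and lim > 0 and req.denominator == 1 and lim.denominator == 1
    if not whole_cores:
        return "shared"

    # Simplified Guaranteed check: CPU and memory requests equal their limits.
    guaranteed = (
        req == lim
        and "memory" in requests
        and requests["memory"] == limits.get("memory")
    )
    return "exclusive" if guaranteed else "preferential"


if __name__ == "__main__":
    pinned = {"requests": {"cpu": "2", "memory": "4Gi"},
              "limits": {"cpu": "2", "memory": "4Gi"}}
    print(cpu_placement(pinned))
```

A container that requests `500m` of CPU falls into the shared pool under both policies in this sketch, because neither policy applies to fractional cores.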
GPU Scheduling
CCE provides GPU scheduling for clusters, facilitating refined resource allocation and optimizing resource utilization. This accommodates the specific GPU compute needs of diverse workloads, thereby enhancing the overall scheduling efficiency and service performance of the cluster.
Function | Description | Documentation |
---|---|---|
Default GPU scheduling in Kubernetes | You can specify the number of GPUs that a pod requests. The value can be less than 1 so that multiple pods can share a single GPU. | |
GPU virtualization | GPU virtualization dynamically divides the GPU memory and computing power. A single GPU can be virtualized into a maximum of 20 virtual GPU devices. Virtualization is more flexible than static allocation: you can request only as much GPU capacity as a service needs to run stably, which improves GPU utilization. | |
GPU monitoring | GPU metrics include those provided by CCE and those provided by DCGM. Prometheus and Grafana can be used to monitor these metrics comprehensively. This helps optimize compute performance, quickly identify faults, and efficiently schedule resources, which improves GPU utilization and reduces O&M costs. | |
GPU auto scaling | CCE allows you to configure auto scaling policies for workloads and nodes based on GPU metrics to dynamically schedule and optimize resources. This improves computing efficiency, ensures stable service operation, and reduces O&M costs. | |
GPU fault handling | If a GPU becomes faulty, CCE promptly reports an event and isolates the faulty GPU based on the event information. This ensures that other functional GPUs can continue operating, minimizing the impact on services. | |
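As a hedged illustration of default GPU scheduling, the sketch below builds a pod manifest that asks for half a GPU. `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin; the helper name `gpu_pod_manifest`, the pod name, and the image are hypothetical. The fractional value follows this page's statement about CCE. Upstream Kubernetes accepts only whole numbers for extended resources such as GPUs.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: str) -> dict:
    """Build a pod manifest whose single container is limited to `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Only the limit is set; Kubernetes uses it as the request too.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # "0.5" assumes the GPU sharing behaviour described for CCE above.
    manifest = gpu_pod_manifest("gpu-demo", "example.com/app:latest", "0.5")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.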
NPU Scheduling
CCE provides NPU scheduling for clusters, facilitating efficient processing of inference and image recognition tasks.
Function | Description | Documentation |
---|---|---|
Complete NPU allocation | CCE allocates NPU resources to workload pods based on the requested count. | |
NPU topology-aware scheduling | In this scheduling mode, CCE adapts scheduling policies to the topology between Ascend AI processors and nodes, minimizing resource fragmentation and network congestion, and maximizing NPU utilization. | |
NPU virtualization | A physical NPU (Ascend AI product) is virtualized and partitioned into multiple vNPUs for allocation to multiple containers. This enables flexible partitioning and dynamic management of hardware resources. | |
NPU monitoring | Monitoring NPU metrics in a cluster helps identify performance bottlenecks, optimize resource utilization, and quickly locate exceptions, enhancing system stability and efficiency. In CCE standard and Turbo clusters, NPU-Exporter collects NPU metrics via DCMI or the hccn tool and reports them to a cloud-native monitoring system for real-time monitoring and alarm reporting. | |
Volcano Scheduling
Volcano is a Kubernetes-based batch processing platform that supports machine learning, deep learning, bioinformatics, genomics, and other big data applications. It provides general-purpose, high-performance computing capabilities, such as job scheduling, heterogeneous chip management, and job running management.
Function | Description | Documentation |
---|---|---|
Resource utilization-based scheduling | Scheduling policies are optimized for computing resources to effectively reduce resource fragments on each node and maximize computing resource utilization. | |
Priority-based scheduling | Scheduling policies are customized based on service importance and priorities to guarantee the resources of key services. | |
AI performance-based scheduling | Scheduling policies are configured based on the nature and resource usage of AI tasks to increase the throughput of cluster services and improve service performance. | |
Queue scheduling | Queue resource management dynamically allocates cluster resources, prioritizes high-priority tasks, and optimizes resource utilization and job throughput. | |
NUMA affinity scheduling | Volcano aims to make the scheduler aware of NUMA topology so that pod placement takes the NUMA topology of nodes into account. | |
Application scaling priority policies | With application scaling priority policies, you can customize the scaling order of pods across different node types to manage resources more efficiently. | |
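Queue scheduling can be pictured as dividing cluster capacity among queues. The sketch below assumes a simple weight-proportional split. Volcano queues do carry a `weight` field, but the function name, queue names, and numbers here are hypothetical, and the real scheduler also takes capacity caps, resource reclaim, and job priorities into account.

```python
from typing import Dict


def queue_shares(total_cpu: float, weights: Dict[str, int]) -> Dict[str, float]:
    """Split `total_cpu` cores among queues in proportion to their weights.

    Simplified illustration only: it ignores per-queue capacity limits and
    does not redistribute capacity that a queue leaves unused.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one queue must have a positive weight")
    return {name: total_cpu * weight / total_weight
            for name, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical queues: online services weighted three times as high as batch.
    print(queue_shares(100, {"online": 3, "batch": 1}))
```

With these hypothetical weights, the `online` queue would be entitled to three quarters of the cluster's CPU when both queues have pending jobs.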
Cloud Native Hybrid Deployment
The cloud native hybrid deployment solution focuses on the Volcano and Kubernetes ecosystems to help users improve resource utilization and efficiency and reduce costs.
Function | Description | Documentation |
---|---|---|
Dynamic resource oversubscription | Volcano distinguishes between online and offline jobs and reuses resources that have been requested but are not in use (the difference between requested and used resources) for oversubscription and hybrid deployment, improving cluster resource utilization. | |
Resource oversubscription based on pod profiling | This oversubscription algorithm continuously monitors pod CPU and memory usage, analyzes usage probabilities, and assesses node resource usage with a certain confidence level. It calculates a stable oversubscription amount by accounting for overall resource usage and fluctuations, thereby minimizing resource contention and preventing frequent pod evictions caused by service instability. Compared with algorithms based on nodes' real-time CPU and memory usage, it significantly reduces oversubscription fluctuations and covers usage bursts better, so service performance stays stable while resources are oversubscribed. | |
CPU Burst | CPU Burst is an elastic CPU throttling mechanism that allows a container to temporarily exceed its CPU limit. This reduces the long-tail response time of services and improves the quality of latency-sensitive services. | |
Egress network bandwidth guarantee | The egress network bandwidth used by online and offline services is balanced to ensure sufficient network bandwidth for online services. | |
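The two oversubscription approaches in the table above can be contrasted with a small sketch. It is one plausible reading of the descriptions, not the algorithm CCE or Volcano actually implements. The function names, the nearest-rank percentile used as the "confidence level", and the sample values are all assumptions made for illustration.

```python
from typing import Sequence


def realtime_oversubscription(requested: float, used: float) -> float:
    """Requested-but-unused resources at a single point in time."""
    return max(requested - used, 0)


def profiled_oversubscription(requested: float,
                              usage_samples: Sequence[float],
                              confidence_pct: int = 95) -> float:
    """Oversubscribable amount based on a usage history.

    Takes the usage level that `confidence_pct` percent of the samples stay
    at or below (nearest-rank percentile) and treats only the remainder of
    the request as reusable.
    """
    if not usage_samples:
        raise ValueError("usage_samples must not be empty")
    ordered = sorted(usage_samples)
    # Integer ceiling of confidence_pct * n / 100, clamped to a valid rank.
    rank = max((confidence_pct * len(ordered) + 99) // 100, 1)
    high_usage = ordered[min(rank, len(ordered)) - 1]
    return max(requested - high_usage, 0)


if __name__ == "__main__":
    requested_cores = 10
    history = [2, 3, 2, 8, 3, 2, 3, 2, 3, 2]  # hypothetical CPU usage samples

    # The real-time figure swings with every sample.
    print([realtime_oversubscription(requested_cores, u) for u in history])
    # The profiled figure is a single, steadier value.
    print(profiled_oversubscription(requested_cores, history, confidence_pct=90))
```

In this example the real-time figure jumps between 8 and 2 cores as usage spikes, while the profiled figure stays at one value derived from the whole history, which is the stability property the table describes.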