Updated on 2025-05-07 GMT+08:00

AI Workload Scheduling

This section describes the key functions of Volcano Scheduler in AI workload scheduling, including auto scheduling, task scheduling, heterogeneous resource scheduling, and queue scheduling. Volcano Scheduler offers general computing capabilities, including a high-performance task scheduling engine, efficient management of heterogeneous chips, and advanced task execution management. These features enhance the scheduling efficiency and execution performance of AI workloads.

Auto Scheduling

Volcano Scheduler supports priority-based scheduling specifically designed to optimize application scaling.

Table 1 Features for auto scheduling


Application scaling priority policies

With application scaling priority policies, you can manage resources more efficiently by customizing the order in which pods are scheduled to and removed from different types of nodes. Two policy types are available:

  • Scale-out policies: During a workload scale-out, Volcano Scheduler schedules new pods based on node priorities.
  • Scale-in policies: During a workload scale-in, Volcano Scheduler scores the workload pods based on the priorities of their nodes and then establishes the pod deletion order based on these scores.

By default, yearly/monthly nodes are prioritized over pay-per-use nodes. During a scale-out, Volcano Scheduler schedules pods to yearly/monthly nodes first. During a scale-in, it deletes pods from pay-per-use nodes before those on yearly/monthly nodes.

Task Scheduling

Volcano Scheduler provides Dominant Resource Fairness (DRF) and gang scheduling for batch computing tasks.

Table 2 Features for task scheduling


DRF

Volcano Scheduler supports fair scheduling based on Dominant Resource Fairness (DRF), which extends max-min fairness to multiple resource types. DRF computes each user's dominant share (the largest fraction of any single resource, such as CPU or memory, allocated to that user) and schedules tasks so that these dominant shares stay balanced across users.

By implementing DRF, service throughput of clusters can be maximized, overall execution time shortened, and training performance enhanced. This makes it an ideal scheduling approach for workloads like batch AI training and big data processing.
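As a rough sketch, DRF is enabled as a plugin in the scheduler's configuration file. The fragment below mirrors Volcano's commonly documented default layout, in which the drf plugin sits in the second tier; tier ordering and the exact plugin set may differ across releases, so verify against your installed version:

```yaml
# volcano-scheduler.conf (typically held in the volcano-scheduler ConfigMap)
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang          # all-or-nothing job admission
  - name: conformance
- plugins:
  - name: drf           # Dominant Resource Fairness across jobs
  - name: predicates
  - name: proportion
  - name: nodeorder
```

Plugins in a higher tier take precedence; drf influences job ordering only among jobs that pass the first tier's checks.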

Gang

Volcano Scheduler supports gang scheduling, an "all-or-nothing" approach that prevents resource wastage from arbitrary pod scheduling. It checks if the number of pods scheduled for a job meets the minimum required for execution. If the threshold is met, all pods are scheduled simultaneously; otherwise, none are scheduled.

Gang scheduling reduces resource busy-waiting and deadlocks in distributed training, thereby enhancing cluster resource utilization.
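Gang scheduling is typically driven by a PodGroup whose minMember field sets the "all-or-nothing" threshold. The sketch below uses illustrative names and images; the PodGroup API group and the group-name annotation follow Volcano's published API:

```yaml
# A PodGroup that requires at least 3 pods to be schedulable
# before any of them is bound to a node.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: training-group
spec:
  minMember: 3
---
# A worker pod that joins the group (name and image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  annotations:
    scheduling.k8s.io/group-name: training-group
spec:
  schedulerName: volcano        # hand the pod to Volcano Scheduler
  containers:
  - name: worker
    image: example/trainer:latest
```

Until three group members can all fit on the cluster at once, none of them is scheduled, which prevents a partially started training job from holding resources idle.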

Heterogeneous Resource Scheduling

Volcano Scheduler provides GPU sharing scheduling, NUMA-aware scheduling, and NPU topology scheduling for heterogeneous resources such as CPUs, GPUs, and NPUs.

Table 3 Features for heterogeneous resource scheduling


GPU virtualization scheduling

Volcano Scheduler facilitates GPU virtualization scheduling and isolation on GPU nodes, offering the following policies for managing GPU virtualization workloads:

  • Spread: When the GPU nodes share identical configurations, the node with the fewest active pods is prioritized to evenly distribute GPU virtualization workloads.
  • Binpack: GPU virtualization workloads are scheduled onto the same node whenever possible. This minimizes the number of nodes used, effectively reducing resource fragmentation.
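The binpack policy is configured through the binpack plugin's scoring arguments in the scheduler configuration. The fragment below is a sketch based on Volcano's binpack plugin documentation; the resource name nvidia.com/gpu is an assumption standing in for whatever extended GPU resource your nodes expose, and argument names should be checked against your release:

```yaml
- plugins:
  - name: binpack
    arguments:
      binpack.weight: 10                  # overall weight of the binpack score
      binpack.cpu: 1                      # per-resource weights used in scoring
      binpack.memory: 1
      binpack.resources: nvidia.com/gpu   # extended resource(s) to pack tightly (assumed name)
      binpack.resources.nvidia.com/gpu: 2
```

Raising binpack.weight pushes the scheduler toward filling nodes; lowering it (or removing the plugin) lets the default spreading behavior dominate.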

NUMA affinity scheduling

Volcano Scheduler includes NUMA affinity scheduling, a feature that assigns pods to worker nodes with minimal cross-NUMA node access. This approach reduces data transmission overhead, optimizes resource utilization, and enhances overall system performance.
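NUMA affinity is requested per pod through an annotation, following Volcano's NUMA-aware scheduling documentation; supported policy values there include none, best-effort, restricted, and single-numa-node. The pod name and image below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-pinned-pod
  annotations:
    # Ask the scheduler to place all of this pod's resources on one NUMA node.
    volcano.sh/numa-topology-policy: single-numa-node
spec:
  schedulerName: volcano
  containers:
  - name: app
    image: example/app:latest
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:          # equal requests and limits give the pod Guaranteed QoS,
        cpu: "4"       # which NUMA pinning generally requires
        memory: 8Gi
```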

NPU topology-aware affinity scheduling on a single node

Volcano Scheduler provides intra-node NPU topology-aware scheduling, leveraging the hardware topology of Ascend AI processors. This intelligent resource management technology optimizes resource allocation and network path selection, effectively reducing compute resource fragmentation and minimizing network congestion. As a result, it maximizes NPU compute utilization and significantly enhances the efficiency of AI training and inference tasks. This ensures the efficient scheduling and management of Ascend compute resources.

Topology-aware affinity scheduling on hypernodes

Volcano Scheduler supports hypernode topology affinity scheduling. A hypernode is composed of 48 nodes. The NPUs within the hypernode form a hyperplane network through a specialized network connection. This configuration enables significantly faster data transmission rates compared to traditional setups. Hypernode topology affinity scheduling assigns pods with a high interdependence to the same hypernode. By doing so, it minimizes cross-node communication, reduces network latency, and boosts data transmission speeds.

Queue Scheduling

Volcano Scheduler supports queue scheduling to effectively manage AI and batch computing tasks.

Table 4 Features for queue scheduling


Queue resource management (capacity plugin)

A queue is a core concept in Volcano, designed to support resource allocation and task scheduling in multi-tenant scenarios. With queues, you can implement multi-tenant resource allocation, task priority control, and resource preemption and reclamation, all of which significantly improve cluster resource utilization and task scheduling efficiency.
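A minimal queue definition, following Volcano's Queue API, might look as follows; the queue name and resource figures are illustrative:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: team-a
spec:
  weight: 2            # relative share when idle resources are divided proportionally
  reclaimable: true    # resources borrowed beyond the share can be reclaimed by other queues
  capability:          # hard upper bound on what this queue may consume
    cpu: "16"
    memory: 32Gi
```

Jobs are then submitted into the queue (for example, by setting the queue field of a Volcano Job), and the scheduler enforces the queue's share and capability when admitting and preempting tasks.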

Hierarchical queues

In real-world applications, different queues typically belong to different departments, which often have hierarchical relationships. This structure leads to more complex, refined requirements for resource allocation and preemption. Traditional peer queues, however, cannot meet these needs effectively. To address this, Volcano Scheduler introduces hierarchical queues, which enable resource allocation, sharing, and preemption across different levels. With hierarchical queues, you can manage resource quotas at a finer granularity and build a more efficient, unified scheduling platform.
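Under hierarchical queue support (introduced with the capacity plugin in recent Volcano releases; field names should be verified against your version), a child queue declares its parent so that quotas cascade down the hierarchy. A hedged sketch:

```yaml
# A department-level queue nested under the built-in root queue.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: dept-a
spec:
  parent: root         # attach this queue beneath the root of the hierarchy
  deserved:            # resources this queue is entitled to before preemption applies
    cpu: "8"
    memory: 16Gi
```

Sibling queues under the same parent can borrow each other's idle resources, and the scheduler reclaims them when the entitled queue needs its deserved share back.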