Queue Resource Management (capacity Plugin)

In a Kubernetes cluster, multiple teams or services (such as AI training, big data analysis, and online services) need to share compute resources, and jobs have different resource requirements and priorities. If all jobs compete for resources, there may be the following problems:

Resource contention: High-load jobs occupy the CPU or memory of other jobs, causing performance deterioration.
Unfair scheduling: Key services may be delayed due to insufficient resources, while low-priority jobs occupy too many resources.
Lack of isolation: Abnormal jobs (for example, jobs that cause memory leak) of a tenant may exhaust node resources. As a result, normal services of other tenants are unavailable.

To address these problems, Volcano Scheduler provides queues for resource management. Queue is a core concept in Volcano. It is designed to support resource allocation and job scheduling in multi-tenant scenarios. With queues, you can implement multi-tenant resource allocation, job priority control, and resource preemption and reclamation. All these significantly improve cluster resource utilization and job scheduling efficiency. Queues have the following main features.

**Table 1** Core features of queues
Feature	Description
Flexible resource configuration	Multi-dimensional control over resources (such as CPU, memory, GPU, and NPU) A three-level resource configuration mechanism to control resource allocation. When configuring the resources, you need to ensure that the value of guarantee is less than or equal to that of deserved and the value of deserved is less than or equal to that of capability. capability: specifies the maximum resources allowed for a queue. deserved: specifies the deserved resources for a queue. If no other queues submit jobs, the resources occupied by jobs in the queue can exceed the value of deserved. If other queues submit jobs and the cluster resources are insufficient, the resources that exceed the value of deserved can be reclaimed by other queues. Configuration recommendations: In peer queue scenarios, the sum of the values of deserved for all queues should be equal to the total cluster resources. In hierarchical queue scenarios, the sum of the values of deserved for child queues should be equal to the value of deserved for the parent queue, but cannot exceed the value of deserved for the parent queue. guarantee: specifies the reserved resources. The reserved resources can only be used by the queue. Dynamic resource quota adjustment
Hierarchical queue management	Support for a hierarchical queue structure Resource inheritance and isolation between parent and child queues Resource sharing and reclamation across hierarchical queues
Intelligent resource scheduling	Resource borrowing: One queue can use idle resources of the other queue. Resource reclamation: Excess resources (the resources exceeding the deserved value) are reclaimed preferentially when resources are insufficient. Resource preemption: The resource requirements of high-priority jobs are first met.
Multi-tenant isolation	Strict resource quota control Priority-based resource allocation No single tenant can occupy too many resources.

Prerequisites

A CCE standard or Turbo cluster of v1.19 or later is available. For details about how to create a cluster, see Buying a CCE Standard/Turbo Cluster.
The Volcano Scheduler add-on has been installed. For details, see Volcano Scheduler.

Notes and Constraints

This feature is in the OBT phase. You can experience it. However, the stability has not been fully verified, and the CCE SLA does not apply.

Enabling Queue Resource Management and Scheduling

You can enable the capacity plugin and reclaim action (Volcano Scheduler > Expert mode) to manage and schedule queue resources. The capacity plugin supports only peer queues. You can use hierarchical queues for fine-grained multi-tenant resource allocation.

Log in to the CCE console and click the cluster name to access the cluster console.
In the navigation pane, choose Settings. Then click the Scheduling tab.
In Volcano Scheduler configuration, queue resource management is disabled by default. You need to modify the configuration to enable this function.
1. In Default Cluster Scheduler > Expert mode, click Try Now.
  Figure 1 Expert mode > Try Now
2. Add the following parameters to the YAML file to enable the capacity plugin. You also need to enable the reclaim action for resource reclamation between queues. When queue resources are insufficient, resource reclamation is triggered. The system preferentially reclaims resources that exceed the deserved value of the queue and selects an appropriate reclamation object based on the queue/job priority.
  
  The capacity plugin and proportion plugin conflict with each other. Ensure that the proportion plugin configuration has been removed when using the capacity plugin.
```
...
default_scheduler_conf:
  actions: allocate, backfill, preempt, reclaim     # Enable the reclaim action.
  metrics:
    interval: 30s
    type: ''
  tiers:
    - plugins:
        - name: priority
        - enableJobStarving: false
          enablePreemptable: false
          name: gang
        - name: conformance
    - plugins:
        - enablePreemptable: false
          name: drf
        - name: predicates
        - name: capacity             # Enable the capacity plugin.
        - name: nodeorder
        - arguments:
            binpack.cpu: 1
            binpack.memory: 1
            binpack.resources: nvidia.com/gpu
            binpack.resources.nvidia.com/gpu: 2
            binpack.weight: 10
          name: binpack
```
3. Click Save in the lower right corner.
Click Confirm Settings in the lower right corner. In the displayed dialog box, confirm the modification and click Save.

Create a queue.

vim queue.yaml

The file content is as follows (the capacity plugin allows you to explicitly configure the deserved value to specify the queue resource quota):

apiVersion: scheduling.volcano.sh/v1beta1 
kind: Queue 
metadata: 
  name: capacity-queue 
spec: 
  capability: 
    cpu: "20" 
    memory: "40Gi"
  deserved: 
    cpu: "10" 
    memory: "20Gi"

For more parameter information, see Queue | Volcano.

**Table 2** Queue parameters
Parameter	Example Value	Description
capability	cpu: "20" memory: "40Gi"	(Optional) Specifies the upper limit of resources for the queue. If this parameter is not set, the capability value of the queue is automatically set to realCapability, which is the total resource value of the cluster minus the sum of the guarantee values of other queues.
deserved	cpu: "10" memory: "20Gi"	Specifies the resources that should be allocated to the queue. The value must be less than or equal to the capability value of the queue. When cluster resources are insufficient, the excess resources are reclaimed first. If the resources allocated to the queue exceed the deserved value, the queue cannot reclaim resources from other queues. NOTE: When cluster auto scaling components such as Cluster Autoscaler or Karpenter are used, the total cluster resources dynamically change. In this case, if the capacity plugin is used, you need to manually adjust the value of deserved for the queue to adapt to the resource changes.

Use Case

The following uses a typical queue resource management scenario to describe the queue resource reclamation mechanism.

Assume that there are four available CPU cores in the cluster. In the initial state, the cluster has a default queue. This queue can use all resources of the cluster, and all resources used by the queue can be reclaimed. Two jobs are created in the default queue to occupy all CPU cores of the cluster, and then a new queue is created and a new job is submitted. However, no resources are available for the job. Volcano reclaims the resources that exceed the deserved value for the default queue. The process is as follows:

Create two Volcano jobs (job1 and job2) in the default queue and apply for 1 CPU core and 3 CPU cores for the jobs, respectively.

Create two YAML files, one for job1 and the other for job2.

vim vcjob-a.yaml

The file content is as follows (1 CPU core for job1 and 3 CPU cores for job2):

# job1
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job1
spec:
  queue: default        #The default queue
  tasks:
    - replicas: 1
      template:
        spec:
          containers:
            - image: alpine
              name: alpine
              resources:
                requests:
                  cpu: "1"
---
# job2
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job2
spec:
  queue: default
  tasks:
    - replicas: 1
      template:
        spec:
          containers:
            - image: alpine
              name: alpine
              resources:
                requests:
                  cpu: "3"

Create two jobs in the default queue.

kubectl apply -f vcjob-a.yaml

Information similar to the following is displayed:

job.batch.volcano.sh/job1 created
job.batch.volcano.sh/job2 created

Check the pod statuses.

kubectl get pod

If the following information is displayed and the status of each pod is Running, the cluster CPU resources are used up.

NAME           READY        STATUS        RESTARTS       AGE
job1-test-0    1/1          Running       0              3h21m
job2-test-0    1/1          Running       0              3h31m

Create the test queue.

Create a YAML file for the test queue.

vim new_queue.yaml

The file content is as follows:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test
spec:
  reclaimable: true
  deserved:
    cpu: 3

Create the test queue.
```
kubectl apply -f new_queue.yaml
```
Information similar to the following is displayed:
```
queue.scheduling.volcano.sh/test created
```

Create a Volcano job (job3) in the test queue and apply for 3 CPU cores (not exceeding the deserved value) to trigger resource reclamation.
1. Create a YAML file for job3.
```
vim vcjob-b.yaml
```
  The file content is as follows (3 CPU cores for job3):
```
# job3
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job3
spec:
  queue: test
  tasks:
    - replicas: 1
      template:
        spec:
          containers:
            - image: alpine
              name: alpine
              resources:
                requests:
                  cpu: "3"
```
2. Create a job in the test queue.
```
kubectl apply -f vcjob-b.yaml
```
  Information similar to the following is displayed:
```
job.batch.volcano.sh/job3 created
```
  Resource reclamation is triggered because the cluster CPU resources are used up.
  - The system reclaims the resources that are not deserved by the default queue. job2 (3 CPU cores) is evicted because the resource of job1 (1 CPU core) is insufficient for job3 to run. job1 (1 CPU core) is retained.
  - job3 (3 CPU cores) starts to run. After job3 is complete, the evicted job is executed again.

Check the pod statuses.

kubectl get pod

If the following information is displayed, the system is reclaiming resources:

NAME           READY        STATUS            RESTARTS       AGE
job1-test-0    1/1          Running           0              3h25m
job2-test-0    1/1          Terminating       0              3h25m
job3-test-0    0/1          Pending           0              1m

Wait for several minutes and run the preceding command again to check the pod statuses. If the following information is displayed, the system has successfully reclaimed resources for job3. After job3 is complete and resources are released, the pod whose resources have been reclaimed will run again.

NAME           READY        STATUS            RESTARTS       AGE
job1-test-0    1/1          Running           0              3h27m
job2-test-0    0/1          Pending          0              3h27m
job3-test-0    1/1          Running           0              3m

Check the pod statuses again to verify whether job2 is running again.

kubectl get pod

If the following information is displayed, the pod whose resources have been reclaimed is running again.

NAME           READY        STATUS            RESTARTS       AGE
job1-test-0    0/1          Completed           0              3h37m
job2-test-0    1/1          Running             1              3h37m
job3-test-0    0/1          Completed           0              13m