Hierarchical Queues
In multi-tenant scenarios, queues are a core mechanism for fair scheduling, resource isolation, and job priority control. In real-world applications, different queues usually belong to different departments, and there are hierarchical relationships between departments, which lead to more refined requirements for resource allocation and preemption. However, traditional peer queues cannot meet such requirements. To address this issue, Volcano Scheduler introduces hierarchical queues to implement resource allocation, sharing, and preemption between queues at different levels. With hierarchical queues, you can manage resource quotas at a finer granularity and build a more efficient unified scheduling platform.
Introduction to Hierarchical Queues
Hierarchical queues are used to implement hierarchical resource allocation and isolation in a multi-tenant cluster environment. Queues are organized in a tree structure, with the following functions:
- The queue hierarchy can be configured. The parent attribute is added to QueueSpec of Volcano Scheduler. When creating a queue, you can use the parent attribute to specify the parent queue that a queue belongs to.
type QueueSpec struct {
    ...
    // Specify the parent queue that a queue belongs to.
    Parent string `json:"parent,omitempty" protobuf:"bytes,8,opt,name=parent"`
    ...
}
After Volcano Scheduler is started, a root queue is created by default. You can create a hierarchical queue tree based on the root queue.
- You can set capability (the maximum resources for a queue), deserved (if the allocated resources of a queue exceed the value of deserved, the excess resources may be reclaimed), and guarantee (resources reserved for a queue, which cannot be shared with other queues) for resources in each dimension. A sample queue manifest combining these settings is shown after this list.
- Resources can be shared and reclaimed across hierarchical queues. If the cluster resources are insufficient for pod deployment, pod resources of queues at other levels can be reclaimed. The rules for reclaiming resources across hierarchical queues are as follows:
- If the allocated resources of a sibling queue exceed the deserved value, pod resources of the sibling queue are reclaimed first.
- If the resources in the sibling queue are insufficient to meet the requirements of the pod, the hierarchical structure of the queues (for example, ancestor queues) will be traversed upward to find sufficient resources.
In Figure 1, Job A and Job C are submitted first, and both the allocated resources of the queues exceed the deserved value. If the cluster resources are insufficient for Job B, the system preferentially reclaims resources from Job A. If the resources are still insufficient after resources from Job A are reclaimed, the system reclaims resources from Job C.
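The following is a minimal sketch of a queue manifest that combines the three resource dimensions. The queue name demo-queue and the resource values are illustrative only, and the guarantee field follows the open-source Volcano Queue API, where reserved resources are set under spec.guarantee.resource. You can also run kubectl get queue root -o yaml to confirm that the default root queue exists before attaching child queues to it.
# Illustrative example only: a queue that sets capability, deserved, and guarantee.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: demo-queue            # Hypothetical queue name used for illustration.
spec:
  parent: root                # Attach the queue directly under the root queue.
  capability:                 # Upper limit of resources the queue can use.
    cpu: 4
    memory: 8Gi
  deserved:                   # Resources the queue should obtain; allocations above this value can be reclaimed.
    cpu: 2
    memory: 4Gi
  guarantee:                  # Resources reserved for this queue and not shared with other queues.
    resource:
      cpu: 1
      memory: 2Gi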
Prerequisites
- A CCE standard or Turbo cluster of v1.27 or later is available. For details about how to create a cluster, see Buying a CCE Standard/Turbo Cluster.
- The Volcano Scheduler add-on of v1.17.1 or later has been installed. For details, see Volcano Scheduler.
Notes and Constraints
This feature is in the open beta testing (OBT) phase and is available for trial use. However, its stability has not been fully verified, and the CCE SLA does not apply.
Configuring a Hierarchical Queue Policy
After configuring a hierarchical queue policy, you can specify the hierarchical relationships between queues for sharing and reclaiming resources across queues and managing resource quotas at a finer granularity.
- Log in to the CCE console and click the cluster name to access the cluster console.
- In the navigation pane, choose Settings. Then click the Scheduling tab.
- In Volcano Scheduler configuration, hierarchical queues are disabled by default. You need to modify the parameters to enable this feature.
- In Default Cluster Scheduler > Expert mode, click Try Now.
Figure 2 Expert mode > Try Now
- Enable the capacity plugin and set enableHierarchy to true. The hierarchical queue capability relies on the capacity plugin. You also need to enable the reclaim action for resource reclamation between queues. When queue resources are insufficient, resource reclamation is triggered. The system preferentially reclaims resources that exceed the deserved value of the queue and selects an appropriate reclamation object based on the queue/job priority.
The capacity plugin and proportion plugin conflict with each other. Ensure that the proportion plugin configuration has been removed when using the capacity plugin.
Add the following parameters to the YAML file:
...
default_scheduler_conf:
  actions: allocate, backfill, preempt, reclaim    # Enable the reclaim action.
  metrics:
    interval: 30s
    type: ''
  tiers:
    - plugins:
        - name: priority
        - enableJobStarving: false
          enablePreemptable: false
          name: gang
        - name: conformance
    - plugins:
        - enablePreemptable: false
          name: drf
        - name: predicates
        - name: capacity            # Enable the capacity plugin.
          enableHierarchy: true     # Enable hierarchical queues.
        - name: nodeorder
        - arguments:
            binpack.cpu: 1
            binpack.memory: 1
            binpack.resources: nvidia.com/gpu
            binpack.resources.nvidia.com/gpu: 2
            binpack.weight: 10
          name: binpack
- Click Save in the lower right corner.
- Click Confirm Settings in the lower right corner. In the displayed dialog box, confirm the modification and click Save.
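To double-check that the capacity plugin and the reclaim action are active, you can inspect the scheduler configuration that the add-on renders in the cluster. This is only a sketch: in open-source Volcano the configuration is typically stored in a ConfigMap named volcano-scheduler-configmap, but the exact ConfigMap name and namespace used by the CCE add-on may differ.
# List Volcano-related ConfigMaps and inspect the rendered scheduler configuration.
kubectl get configmap -A | grep volcano
kubectl get configmap volcano-scheduler-configmap -n kube-system -o yaml   # Name and namespace are assumptions.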
Use Case
Assume that there are 8 CPU cores and 16-GiB memory available for a cluster. First, create a hierarchical queue tree. Second, create two Volcano jobs (job-a and job-c) to exhaust cluster resources. Finally, create a Volcano job (job-b) and check the resource reclamation in hierarchical queues. Figure 3 shows the overall structure of this example.
- Create a YAML file for the hierarchical queue tree.
vim hierarchical_queue.yaml
The file content is as follows:
# The parent queue of child-queue-a is the root queue.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: child-queue-a
spec:
  reclaimable: true
  parent: root
  capability:
    cpu: 5
    memory: 10Gi
  deserved:
    cpu: 4
    memory: 8Gi
---
# The parent queue of child-queue-b is the root queue.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: child-queue-b
spec:
  reclaimable: true
  parent: root
  deserved:
    cpu: 4
    memory: 8Gi
---
# The parent queue of subchild-queue-a1 is child-queue-a.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: subchild-queue-a1
spec:
  reclaimable: true
  parent: child-queue-a
  # Set deserved as required. If the allocated resources of a queue exceed the value of deserved, resources used by the queue may be reclaimed.
  deserved:
    cpu: 2
    memory: 4Gi
---
# The parent queue of subchild-queue-a2 is child-queue-a.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: subchild-queue-a2
spec:
  reclaimable: true
  parent: child-queue-a
  # Set deserved as required. If the allocated resources of a queue exceed the value of deserved, resources used by the queue may be reclaimed.
  deserved:
    cpu: 2
    memory: 4Gi
The following uses the YAML file of child-queue-a as an example to describe the parameters of hierarchical queues. For more parameter information, see Queue | Volcano.
Table 1 Hierarchical queue parameters

Parameter: reclaimable
Example Value: true
Description: (Optional) Specifies whether to enable the resource reclamation policy.
- true (default): If the resource usage of a queue exceeds the value of deserved, other queues can reclaim the resources that are overused by the queue.
- false: Other queues cannot reclaim the resources that are overused by the queue.

Parameter: parent
Example Value: root
Description: (Optional) Specifies the parent queue. The queues are hierarchical, and the total resources of a child queue are limited by the parent queue. If parent is not specified, the parent queue is the root queue by default.

Parameter: capability
Example Value: cpu: 5, memory: 10Gi
Description: (Optional) Specifies the upper limit of resources for the queue. The value cannot exceed the capability value of the parent queue.
If the capability value of a resource is not set for a queue, the value is inherited from its parent queue. If neither the parent queue nor any of its ancestor queues sets the value, the setting of the root queue is inherited. By default, the capability value of the root queue is the total available resources in the cluster.

Parameter: deserved
Example Value: cpu: 4, memory: 8Gi
Description: Specifies the resources that should be obtained by a queue. The total deserved value of the child queues cannot exceed the deserved value configured for the parent queue, and the deserved value of a queue must be less than or equal to its capability value. The default deserved value of the root queue is the same as its capability value.
If the resources allocated to the queue exceed the deserved value, the queue cannot reclaim resources from other queues.
- Create a hierarchical queue tree.
kubectl apply -f hierarchical_queue.yaml
Information similar to the following is displayed:
queue.scheduling.volcano.sh/child-queue-a created
queue.scheduling.volcano.sh/child-queue-b created
queue.scheduling.volcano.sh/subchild-queue-a1 created
queue.scheduling.volcano.sh/subchild-queue-a2 created
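Optionally, you can verify that the queues were created with the expected parent/child relationships. A quick check (the parent queue is recorded in the spec.parent field of each queue):
# Confirm that the queues exist and check the parent of a child queue.
kubectl get queues
kubectl get queue subchild-queue-a1 -o jsonpath='{.spec.parent}{"\n"}'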
- Create a YAML file for the Volcano jobs job-a and job-c. job-a is submitted to subchild-queue-a1, and job-c to child-queue-b.
vim vcjob.yaml
The file content is as follows:
# Submit job-a to the leaf queue subchild-queue-a1.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-a
spec:
  queue: subchild-queue-a1
  schedulerName: volcano
  minAvailable: 1
  tasks:
    - replicas: 3
      name: test
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 1000"]
              imagePullPolicy: IfNotPresent
              name: alpine
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
---
# Submit job-c to the leaf queue child-queue-b.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-c
spec:
  queue: child-queue-b
  schedulerName: volcano
  minAvailable: 1
  tasks:
    - replicas: 5
      name: test
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 1000"]
              imagePullPolicy: IfNotPresent
              name: alpine
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
- Create job-a and job-c.
kubectl apply -f vcjob.yaml
Information similar to the following is displayed:
job.batch.volcano.sh/job-a created
job.batch.volcano.sh/job-c created
- Check the pod statuses.
kubectl get pod
If the following information is displayed and the status of each pod is Running, the cluster CPU and memory are used up.
NAME           READY   STATUS    RESTARTS   AGE
job-a-test-0   1/1     Running   0          3h21m
job-a-test-1   1/1     Running   0          3h31m
job-a-test-2   1/1     Running   0          3h31m
job-c-test-0   1/1     Running   0          24m
job-c-test-1   1/1     Running   0          24m
job-c-test-2   1/1     Running   0          24m
job-c-test-3   1/1     Running   0          24m
job-c-test-4   1/1     Running   0          24m
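Optionally, you can also check how much of each queue's quota is currently in use. This is a sketch; depending on the Volcano version, the allocated resources are typically reported in the queue status:
# Check the resources currently allocated to the leaf queues (field availability may vary by Volcano version).
kubectl get queue subchild-queue-a1 -o jsonpath='{.status.allocated}{"\n"}'
kubectl get queue child-queue-b -o jsonpath='{.status.allocated}{"\n"}'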
- Create a YAML file for job-b.
vim vcjob1.yaml
The file content is as follows:
# Submit job-b to the leaf queue subchild-queue-a2.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-b
spec:
  queue: subchild-queue-a2
  schedulerName: volcano
  minAvailable: 1
  tasks:
    - replicas: 2
      name: test
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 1000"]
              imagePullPolicy: IfNotPresent
              name: alpine
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
- Create job-b.
kubectl apply -f vcjob1.yaml
Information similar to the following is displayed:
job.batch.volcano.sh/job-b created
Resource reclamation is triggered because the cluster CPU and memory are used up.
- job-b first checks job-a in the sibling queue. The resources (3 CPU cores and 6-GiB memory) occupied by job-a exceed the deserved resources (2 CPU cores and 4-GiB memory) of subchild-queue-a1. The over-occupied resources (1 CPU core and 2-GiB memory) can be preferentially reclaimed, but the reclaimed resources still cannot meet the requirements of job-b.
- job-b then searches for resources in the upper-level queue and finally finds job-c in child-queue-b for resource reclamation.
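If you want to see the reclamation from the cluster side, you can inspect the events of the pods that are evicted in the next step (for example, job-a-test-2 and job-c-test-4). A sketch; the exact event reasons depend on the Volcano version:
# Inspect why the evicted pods were terminated during reclamation.
kubectl describe pod job-a-test-2 | tail -n 20
kubectl get events --field-selector involvedObject.name=job-c-test-4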
- Check the pod statuses and verify that resources have been reclaimed.
kubectl get pod
If the following information is displayed, the system is reclaiming resources:
NAME           READY   STATUS        RESTARTS   AGE
job-a-test-0   1/1     Running       0          3h33m
job-a-test-1   1/1     Running       0          3h33m
job-a-test-2   1/1     Terminating   0          3h33m
job-b-test-0   0/1     Pending       0          1m
job-b-test-1   0/1     Pending       0          1m
job-c-test-0   1/1     Running       0          26m
job-c-test-1   1/1     Running       0          26m
job-c-test-2   1/1     Running       0          26m
job-c-test-3   1/1     Running       0          26m
job-c-test-4   1/1     Terminating   0          26m
Wait for several minutes and run the preceding command again to check the pod statuses. If the following information is displayed, the pods of job-b have been scheduled and are running. After job-b is complete and its resources are released, the pods whose resources were reclaimed will run again.
NAME           READY   STATUS    RESTARTS   AGE
job-a-test-0   1/1     Running   0          3h35m
job-a-test-1   1/1     Running   0          3h35m
job-a-test-2   0/1     Pending   0          3h35m
job-b-test-0   1/1     Running   0          2m
job-b-test-1   1/1     Running   0          2m
job-c-test-0   1/1     Running   0          28m
job-c-test-1   1/1     Running   0          28m
job-c-test-2   1/1     Running   0          28m
job-c-test-3   1/1     Running   0          28m
job-c-test-4   0/1     Pending   0          28m
- Check the pod statuses again and check whether job-a-test-2 and job-c-test-4 are re-executed.
kubectl get pod
If the following information is displayed, the pod whose resources have been reclaimed is running again.
NAME           READY   STATUS      RESTARTS   AGE
job-a-test-0   1/1     Running     0          3h48m
job-a-test-1   1/1     Running     0          3h48m
job-a-test-2   1/1     Running     1          3h48m
job-b-test-0   0/1     Completed   0          15m
job-b-test-1   0/1     Completed   0          15m
job-c-test-0   1/1     Running     0          40m
job-c-test-1   1/1     Running     0          30m
job-c-test-2   1/1     Running     0          40m
job-c-test-3   1/1     Running     0          40m
job-c-test-4   1/1     Running     1          40m
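After verifying the behavior, you can clean up the example resources. The commands below assume the file names used in this example; depending on the Volcano version, a queue may need to be closed before it can be deleted.
# Delete the example jobs and the hierarchical queue tree.
kubectl delete -f vcjob.yaml -f vcjob1.yaml
kubectl delete -f hierarchical_queue.yaml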
Reference
- For more information about queues, see Queue Resource Management (capacity Plugin).
- For more information about Volcano scheduling, see Volcano Scheduling Overview.