Updated on 2025-08-28 GMT+08:00

Hierarchical Queues

In multi-tenant scenarios, queues are a core mechanism for fair scheduling, resource isolation, and job priority control. In real-world applications, different queues usually belong to different departments, and the hierarchical relationships between departments create more fine-grained requirements for resource allocation and preemption, which flat, peer-level queues cannot meet. To address this issue, Volcano Scheduler introduces hierarchical queues to implement resource allocation, sharing, and preemption between queues at different levels. With hierarchical queues, you can manage resource quotas at a finer granularity and build a more efficient unified scheduling platform.

Introduction to Hierarchical Queues

Hierarchical queues are used to implement hierarchical resource allocation and isolation in a multi-tenant cluster environment. Queues are organized in a tree structure, with the following functions:

  • The queue hierarchy can be configured. The parent attribute is added to QueueSpec of Volcano Scheduler. When creating a queue, you can use the parent attribute to specify the parent queue that a queue belongs to.
    type QueueSpec struct {
        ...
        // Parent specifies the parent queue that this queue belongs to.
        Parent string `json:"parent,omitempty" protobuf:"bytes,8,opt,name=parent"`
        ...
    }

    After Volcano Scheduler is started, a root queue is created by default. You can create a hierarchical queue tree based on the root queue.

  • You can set capability (the maximum resources available to a queue), deserved (if the allocated resources of a queue exceed this value, the excess may be reclaimed), and guarantee (resources reserved for a queue, which cannot be shared with other queues) for each resource dimension. A minimal example is provided after Figure 1 below.
  • Resources can be shared and reclaimed across hierarchical queues. If the cluster resources are insufficient to deploy a pod, resources occupied by pods in queues at other levels can be reclaimed. The rules for reclaiming resources across hierarchical queues are as follows:
    • If the allocated resources of a sibling queue exceed its deserved value, pod resources in that sibling queue are reclaimed first.
    • If the sibling queues cannot yield enough resources for the pod, the queue hierarchy is traversed upward through the ancestor queues to find sufficient resources.

    In Figure 1, Job A and Job C are submitted first, and the allocated resources of both queues exceed their deserved values. If the cluster resources are insufficient for Job B, the system preferentially reclaims resources from Job A. If the resources are still insufficient after reclaiming from Job A, the system then reclaims resources from Job C.

    Figure 1 Reclaiming resources of hierarchical queues
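
The following is a minimal sketch of a Queue manifest that sets all three quota fields together with parent, following the Volcano v1beta1 Queue API. The queue name and resource values are illustrative only:

    # Minimal, illustrative Queue manifest (name and values are examples).
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: demo-queue               # Hypothetical queue name.
    spec:
      parent: root                   # Attach the queue under the default root queue.
      reclaimable: true              # Allow other queues to reclaim overused resources.
      capability:                    # Hard upper limit for the queue.
        cpu: 4
        memory: 8Gi
      deserved:                      # Resources above this value may be reclaimed.
        cpu: 2
        memory: 4Gi
      guarantee:                     # Resources reserved exclusively for this queue.
        resource:
          cpu: 1
          memory: 2Gi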

Prerequisites

Notes and Constraints

This feature is in the open beta test (OBT) phase and is available for trial use. However, its stability has not been fully verified, and the CCE SLA does not apply to it.

Configuring a Hierarchical Queue Policy

After configuring a hierarchical queue policy, you can define hierarchical relationships between queues so that resources can be shared and reclaimed across queues and resource quotas can be managed at a finer granularity.

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. In the navigation pane, choose Settings. Then click the Scheduling tab.
  3. In the Volcano Scheduler configuration, hierarchical queues are disabled by default. Modify the following parameters to enable this feature.

    1. In Default Cluster Scheduler > Expert mode, click Try Now.
      Figure 2 Expert mode > Try Now

    2. Enable the capacity plugin and set enableHierarchy to true, because hierarchical queues rely on the capacity plugin. Also enable the reclaim action so that resources can be reclaimed between queues: when queue resources are insufficient, reclamation is triggered, the resources that exceed a queue's deserved value are reclaimed first, and the reclamation targets are selected based on queue and job priorities.

      The capacity plugin conflicts with the proportion plugin. Before using the capacity plugin, ensure that the proportion plugin configuration (the "- name: proportion" entry under tiers) has been removed.

      Add the following parameters to the YAML file:
      ...
      default_scheduler_conf:
        actions: allocate, backfill, preempt, reclaim     # Enable the reclaim action.
        metrics:
          interval: 30s
          type: ''
        tiers:
          - plugins:
              - name: priority
              - enableJobStarving: false
                enablePreemptable: false
                name: gang
              - name: conformance
          - plugins:
              - enablePreemptable: false
                name: drf
              - name: predicates
              - name: capacity             # Enable the capacity plugin.
                enableHierarchy: true     # Enable hierarchical queues.
              - name: nodeorder
              - arguments:
                  binpack.cpu: 1
                  binpack.memory: 1
                  binpack.resources: nvidia.com/gpu
                  binpack.resources.nvidia.com/gpu: 2
                  binpack.weight: 10
                name: binpack
    3. Click Save in the lower right corner.

  4. Click Confirm Settings in the lower right corner. In the displayed dialog box, confirm the modification and click Save.
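
After the settings take effect, you can confirm from the CLI that the default root queue exists. The following is a standard kubectl query; the output columns depend on your Volcano version.

    kubectl get queue root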

Use Case

Assume that a cluster has 8 CPU cores and 16 GiB of memory available. First, create a hierarchical queue tree. Then, create two Volcano jobs (job-a and job-c) that together exhaust the cluster resources: job-a requests 3 × (1 CPU core and 2 GiB of memory) and job-c requests 5 × (1 CPU core and 2 GiB of memory), which adds up to exactly 8 CPU cores and 16 GiB of memory. Finally, create another Volcano job (job-b) and observe how resources are reclaimed across the hierarchical queues. Figure 3 shows the overall structure of this example.

Figure 3 Hierarchical queues
  1. Create a YAML file for the hierarchical queue tree.

    vim hierarchical_queue.yaml

    The file content is as follows:

    # The parent queue of child-queue-a is the root queue.
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: child-queue-a
    spec:
      reclaimable: true
      parent: root 
      capability:
        cpu: 5
        memory: 10Gi
      deserved:
        cpu: 4
        memory: 8Gi
    ---
    # The parent queue of child-queue-b is the root queue.
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: child-queue-b
    spec:
      reclaimable: true
      parent: root 
      deserved:
        cpu: 4
        memory: 8Gi
    ---
    # The parent queue of subchild-queue-a1 is child-queue-a.
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: subchild-queue-a1
    spec:
      reclaimable: true
      parent: child-queue-a
      # Set deserved as required. If the allocated resources of a queue exceed the value of deserved, resources used by the queue may be reclaimed.
      deserved: 
        cpu: 2
        memory: 4Gi
    ---
    # The parent queue of subchild-queue-a2 is child-queue-a.
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: subchild-queue-a2
    spec:
      reclaimable: true
      parent: child-queue-a 
      # Set deserved as required. If the allocated resources of a queue exceed the value of deserved, resources used by the queue may be reclaimed.
      deserved: 
        cpu: 2
        memory: 4Gi

    The following uses the YAML file of child-queue-a as an example to describe the parameters of hierarchical queues. For more parameter information, see Queue | Volcano.

    Table 1 Hierarchical queue parameters

    • reclaimable (example value: true)

      (Optional) Specifies whether to enable the resource reclamation policy for the queue.

      • true (default): If the resource usage of the queue exceeds its deserved value, other queues can reclaim the resources that the queue has overused.
      • false: Other queues cannot reclaim the resources that the queue has overused.

    • parent (example value: root)

      (Optional) Specifies the parent queue. Queues are organized hierarchically, and the total resources of a child queue are limited by its parent queue. If parent is not specified, the parent queue defaults to the root queue.

    • capability (example value: cpu: 5, memory: 10Gi)

      (Optional) Specifies the upper resource limit of the queue. The value cannot exceed the capability value of the parent queue.

      If the capability value of a resource is not set for a queue, it is inherited from the parent queue. If neither the parent queue nor any of its ancestor queues sets the value, the setting of the root queue is inherited. By default, the capability value of the root queue is the total available resources in the cluster.

    • deserved (example value: cpu: 4, memory: 8Gi)

      Specifies the resources that the queue deserves to obtain. The total deserved values of the child queues cannot exceed the deserved value of their parent queue, and the deserved value of a queue must be less than or equal to its capability value. The default deserved value of the root queue equals its capability value.

      If the resources allocated to the queue already exceed its deserved value, the queue cannot reclaim resources from other queues.

  2. Create a hierarchical queue tree.

    kubectl apply -f hierarchical_queue.yaml

    Information similar to the following is displayed:

    queue.scheduling.volcano.sh/child-queue-a created
    queue.scheduling.volcano.sh/child-queue-b created
    queue.scheduling.volcano.sh/subchild-queue-a1 created
    queue.scheduling.volcano.sh/subchild-queue-a2 created
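
    (Optional) You can verify that the queue tree is in place. The following is a standard kubectl query; the output columns vary by Volcano version:

    kubectl get queues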

  3. Create a YAML file for the Volcano jobs job-a and job-c. job-a is submitted to subchild-queue-a1, and job-c is submitted to child-queue-b.

    vim vcjob.yaml

    The file content is as follows:

    # Submit job-a to the leaf queue subchild-queue-a1.
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: job-a
    spec:
      queue: subchild-queue-a1
      schedulerName: volcano
      minAvailable: 1
      tasks:
        - replicas: 3
          name: test
          template:
            spec:
              containers:
                - image: alpine
                  command: ["/bin/sh", "-c", "sleep 1000"]
                  imagePullPolicy: IfNotPresent
                  name: alpine
                  resources:
                    requests:
                      cpu: "1"
                      memory: 2Gi
    ---
    # Submit job-c to the leaf queue child-queue-b.
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: job-c
    spec:
      queue: child-queue-b
      schedulerName: volcano
      minAvailable: 1
      tasks:
        - replicas: 5
          name: test
          template:
            spec:
              containers:
                - image: alpine
                  command: ["/bin/sh", "-c", "sleep 1000"]
                  imagePullPolicy: IfNotPresent
                  name: alpine
                  resources:
                    requests:
                      cpu: "1"
                      memory: 2Gi

  4. Create job-a and job-c.

    kubectl apply -f vcjob.yaml

    Information similar to the following is displayed:

    job.batch.volcano.sh/job-a created
    job.batch.volcano.sh/job-c created

  5. Check the pod statuses.

    kubectl get pod

    If the following information is displayed and the status of each pod is Running, all of the cluster's CPU and memory have been used up.

    NAME           READY        STATUS        RESTARTS       AGE
    job-a-test-0   1/1          Running       0              3h21m
    job-a-test-1   1/1          Running       0              3h31m
    job-a-test-2   1/1          Running       0              3h31m
    job-c-test-0   1/1          Running       0              24m
    job-c-test-1   1/1          Running       0              24m
    job-c-test-2   1/1          Running       0              24m
    job-c-test-3   1/1          Running       0              24m
    job-c-test-4   1/1          Running       0              24m

  6. Create a YAML file for job-b.

    vim vcjob1.yaml

    The file content is as follows:

    # Submit job-b to the leaf queue subchild-queue-a2.
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: job-b
    spec:
      queue: subchild-queue-a2
      schedulerName: volcano
      minAvailable: 1
      tasks:
        - replicas: 2
          name: test
          template:
            spec:
              containers:
                - image: alpine
                  command: ["/bin/sh", "-c", "sleep 1000"]
                  imagePullPolicy: IfNotPresent
                  name: alpine
                  resources:
                    requests:
                      cpu: "1"
                      memory: 2Gi

  7. Create job-b.

    kubectl apply -f vcjob1.yaml

    Information similar to the following is displayed:

    job.batch.volcano.sh/job-b created

    Resource reclamation is triggered because the cluster CPU and memory have been used up.

    • job-b first checks job-a in the sibling queue subchild-queue-a1. The resources occupied by job-a (3 CPU cores and 6 GiB of memory) exceed the deserved resources of subchild-queue-a1 (2 CPU cores and 4 GiB of memory), so the overused resources (1 CPU core and 2 GiB of memory) are preferentially reclaimed. However, the reclaimed resources still cannot meet the requirements of job-b.
    • job-b then searches upward through the queue hierarchy and finally reclaims resources from job-c in child-queue-b, whose allocated resources (5 CPU cores and 10 GiB of memory) also exceed its deserved value (4 CPU cores and 8 GiB of memory).
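
    While reclamation is in progress, you can also inspect a queue object directly. The following is a standard kubectl query; the status fields it returns depend on your Volcano version:

    kubectl get queue subchild-queue-a1 -o yaml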

  8. Check the pod statuses and verify that resources have been reclaimed.

    kubectl get pod

    If the following information is displayed, the system is reclaiming resources:

    NAME           READY        STATUS            RESTARTS       AGE
    job-a-test-0   1/1          Running           0              3h33m
    job-a-test-1   1/1          Running           0              3h33m
    job-a-test-2   1/1          Terminating       0              3h33m
    job-b-test-0   0/1          Pending           0              1m
    job-b-test-1   0/1          Pending           0              1m
    job-c-test-0   1/1          Running           0              26m
    job-c-test-1   1/1          Running           0              26m
    job-c-test-2   1/1          Running           0              26m
    job-c-test-3   1/1          Running           0              26m
    job-c-test-4   1/1          Terminating       0              26m

    Wait several minutes and run the preceding command again to check the pod statuses. If the following information is displayed, job-b is running. After job-b completes and releases its resources, the pods whose resources were reclaimed will be scheduled and run again.

    NAME           READY        STATUS            RESTARTS       AGE
    job-a-test-0   1/1          Running           0              3h35m
    job-a-test-1   1/1          Running           0              3h35m
    job-a-test-2   0/1          Pending           0              3h35m
    job-b-test-0   1/1          Running           0              2m
    job-b-test-1   1/1          Running           0              2m
    job-c-test-0   1/1          Running           0              28m
    job-c-test-1   1/1          Running           0              28m
    job-c-test-2   1/1          Running           0              28m
    job-c-test-3   1/1          Running           0              28m
    job-c-test-4   0/1          Pending           0              28m

  9. Check the pod statuses again to confirm that job-a-test-2 and job-c-test-4 have been re-executed.

    kubectl get pod

    If the following information is displayed, the pods whose resources were reclaimed are running again:

    NAME           READY        STATUS            RESTARTS       AGE
    job-a-test-0   1/1          Running           0              3h48m
    job-a-test-1   1/1          Running           0              3h48m
    job-a-test-2   1/1          Running           1              3h48m
    job-b-test-0   0/1          Completed         0              15m
    job-b-test-1   0/1          Completed         0              15m
    job-c-test-0   1/1          Running           0              40m
    job-c-test-1   1/1          Running           0              30m
    job-c-test-2   1/1          Running           0              40m
    job-c-test-3   1/1          Running           0              40m
    job-c-test-4   1/1          Running           1              40m
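
    (Optional) After verifying the behavior, clean up the example resources by using the YAML files created in the preceding steps. Delete the jobs first, then the queues:

    kubectl delete -f vcjob.yaml -f vcjob1.yaml
    kubectl delete -f hierarchical_queue.yaml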
