Updated on 2026-04-03 GMT+08:00

Virtual GPU Burst Scheduling

Burst scheduling is an elastic GPU compute scheduling policy. It guarantees each pod its requested share of compute power while dynamically borrowing compute power left idle by other containers on the same GPU, maximizing overall utilization. Only compute scheduling changes: GPU memory keeps the existing quota-based scheduling and isolation architecture.

Figure 1 Scheduling policies

GPU virtualization divides each GPU's time into slices based on the number of containers requesting GPU resources. These time slices, labeled segment 1, segment 2, ..., segment N, distribute GPU compute power among the containers. As shown in Figure 1, the upper part shows the isolated scheduling policy for both compute and GPU memory resources, while the lower part shows the burst scheduling policy. Assume that containers 1, 2, and 3 request 5%, 5%, and 10% of the compute power, respectively, and that container 3 is not currently using GPU compute. The comparison focuses only on compute scheduling.

  • Isolated scheduling: CCE allocates the requested compute power to each container: 5% to container 1, 5% to container 2, and 10% to container 3.
  • Burst scheduling: When container 3 is idle, the unused compute power is dynamically reallocated to containers 1 and 2, resulting in 50% each for containers 1 and 2, and 0% for container 3. When container 3 uses GPU resources, CCE reallocates compute power based on the initial ratio, resulting in 25% for container 1, 25% for container 2, and 50% for container 3.
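
The reallocation above is a simple proportional-share calculation: the GPU's full compute power is split among the containers that are actively using it, in proportion to their requested percentages. A minimal sketch of this model (illustrative only; it is not CCE's actual scheduler code):

```python
def burst_shares(requests, active):
    """Split 100% of GPU compute power among the active containers,
    in proportion to their requested percentages.

    requests: dict mapping container name -> requested compute (%)
    active:   set of containers currently using the GPU
    """
    total = sum(requests[c] for c in active)
    return {c: (100 * requests[c] / total if c in active else 0)
            for c in requests}

requests = {"c1": 5, "c2": 5, "c3": 10}

# Container 3 idle: containers 1 and 2 split the GPU 50/50.
print(burst_shares(requests, {"c1", "c2"}))          # {'c1': 50.0, 'c2': 50.0, 'c3': 0}

# All three active: shares return to the 5:5:10 request ratio.
print(burst_shares(requests, {"c1", "c2", "c3"}))    # {'c1': 25.0, 'c2': 25.0, 'c3': 50.0}
```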

Prerequisites

  • A CCE standard or Turbo cluster of v1.23.8-r0, v1.25.3-r0, or later is available.
  • GPU nodes with cluster-wide virtualization enabled are available in the cluster. For details, see Preparing Virtual GPU Resources.
  • The CCE AI Suite (NVIDIA GPU) add-on of v2.1.50, v2.7.67, or later has been installed in the cluster. For details, see CCE AI Suite (NVIDIA GPU).
  • Volcano of v1.10.5 or later has been installed. For details, see Volcano Scheduler.

Notes and Constraints

  • Before burst scheduling is enabled, all GPU and GPU virtualization tasks in the cluster must be migrated or stopped to prevent service interruptions or scheduling failures.
  • After burst scheduling is enabled:
    • Only compute power supports burst scheduling. GPU memory allocation remains based on the configured quota.
    • Containers supporting isolated scheduling for both compute and GPU memory resources (policy=1) can no longer be created in the cluster.
    • Containers supporting isolated scheduling for GPU memory resources only (policy=0) can still be created, but they cannot be scheduled on the same GPU as burst containers.
    • Containers supporting GPU sharing can still be created, but they cannot be scheduled on the same GPU as burst containers.

Enabling Virtual GPU Burst Scheduling

  1. Log in to the CCE console and click the cluster name to access the cluster console. The Overview page is displayed.
  2. In the navigation pane, choose Add-ons. In the right pane, find the CCE AI Suite (NVIDIA GPU) add-on and click Edit.
  3. In the Install Add-on dialog box, click Edit YAML. Search for enabled_xgpu_burst in the YAML file and set it to true. The following is an example:

    ...
    custom:
          annotations: {}
          compatible_with_legacy_api: false
          component_schedulername: kube-scheduler
          disable_mount_path_v1: false
          disable_nvidia_gsp: true
          driver_mount_paths: bin,lib64
          enable_fault_isolation: true
          enable_health_monitoring: true
          enable_metrics_monitoring: true
          enable_simple_lib64_mount: true
          enable_xgpu: false
          enabled_xgpu_burst: true
          gpu_driver_config: {}
          health_check_xids_v2: 74,79
          inject_ld_Library_path: ''
          install_nvidia_peermem: false
          is_driver_from_nvidia: true
    ...

  4. After completing the setting, click OK in the lower right corner of the page. CCE AI Suite (NVIDIA GPU) is then automatically upgraded. After the add-on status changes to Running, burst scheduling takes effect.

Using Burst Scheduling

  1. Install kubectl on an existing ECS and access a cluster using kubectl. For details, see Accessing a Cluster Using kubectl.
  2. Run the following command to create a YAML file for a workload that uses virtual GPU burst scheduling:

    vim xgpu-burst.yaml
    Example file content:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xgpu-burst
      labels:
        app: xgpu-burst
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: xgpu-burst
          xgpu.burst/enabled: "true"   # Enable the burst function. Label values must be quoted strings.
      template:
        metadata:
          labels:
            app: xgpu-burst
            xgpu.burst/enabled: "true"   # The pod template labels must include every selector label.
        spec:
          containers:
          - name: container-1
            image: <your_image_address>     # Replace it with your image address.
            resources:
              limits:
                volcano.sh/gpu-mem.128Mi: 40  # The GPU memory allocated to the pod. This value represents 5120 MiB (40 x 128 MiB).
                volcano.sh/gpu-core.percentage: 25    # The compute power allocated to the pod, in percentage
          imagePullSecrets:
            - name: default-secret
          schedulerName: volcano
    • After the burst function is enabled, containers supporting isolated scheduling for both compute and GPU memory resources cannot be created. If both the volcano.sh/gpu-mem.128Mi and volcano.sh/gpu-core.percentage parameters are specified in resources.requests and resources.limits, the xgpu.burst/enabled: "true" label must be set. Otherwise, the workload cannot be scheduled.
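
The volcano.sh/gpu-mem.128Mi value in the example above is a count of 128 MiB blocks, not a size in MiB. A small helper to convert a desired GPU memory size into this unit (an illustrative sketch; the resource name comes from the example above):

```python
def gpu_mem_blocks(mib, block_mib=128):
    """Convert a GPU memory request in MiB into the block count used by
    the volcano.sh/gpu-mem.128Mi resource. Rounds up so the request is
    never under-provisioned."""
    return -(-mib // block_mib)  # ceiling division

print(gpu_mem_blocks(5120))  # 40 -> volcano.sh/gpu-mem.128Mi: 40 (5120 MiB)
print(gpu_mem_blocks(5000))  # 40 (rounded up to the next 128 MiB block)
```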

  3. Run the following command to create the workload:

    kubectl apply -f xgpu-burst.yaml

    If information similar to the following is displayed, the workload has been created:

    deployment.apps/xgpu-burst created

  4. Run the following command to view the created pod:

    kubectl get pod -n default

    Information similar to the following is displayed:

    NAME                         READY   STATUS    RESTARTS   AGE
    xgpu-burst-6bdb4d7cb-pmtc2   1/1     Running   0          21s

  5. Log in to the pod and check the scheduling policy it uses.

    kubectl exec -it xgpu-burst-6bdb4d7cb-pmtc2 -- cat /proc/xgpu/0/policy

    If 6 is displayed in the command output, the burst scheduling policy is used.
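
The policy codes mentioned in this document can be summarized as a small lookup table (only the values named in this document are listed; other codes may exist):

```python
# Scheduling policy codes read from /proc/xgpu/<gpu_index>/policy,
# as described in this document (the list is not exhaustive).
XGPU_POLICIES = {
    0: "Isolated scheduling for GPU memory resources only",
    1: "Isolated scheduling for both compute and GPU memory resources",
    6: "Burst scheduling",
}

print(XGPU_POLICIES[6])  # Burst scheduling
```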

Disabling Virtual GPU Burst Scheduling

Before disabling virtual GPU burst scheduling, migrate or stop all containers in the cluster that use this capability. Otherwise, new virtual GPU containers cannot be scheduled to GPU nodes that are still running burst containers, which blocks task scheduling and wastes resources.

  1. Log in to the CCE console and click the cluster name to access the cluster console. The Overview page is displayed.
  2. In the navigation pane, choose Add-ons. In the right pane, find the CCE AI Suite (NVIDIA GPU) add-on and click Edit.
  3. In the Install Add-on dialog box, click Edit YAML. Search for enabled_xgpu_burst in the YAML file and set it to false. The following is an example:

    ...
    custom:
          annotations: {}
          compatible_with_legacy_api: false
          component_schedulername: kube-scheduler
          disable_mount_path_v1: false
          disable_nvidia_gsp: true
          driver_mount_paths: bin,lib64
          enable_fault_isolation: true
          enable_health_monitoring: true
          enable_metrics_monitoring: true
          enable_simple_lib64_mount: true
          enable_xgpu: false
          enabled_xgpu_burst: false
          gpu_driver_config: {}
          health_check_xids_v2: 74,79
          inject_ld_Library_path: ''
          install_nvidia_peermem: false
          is_driver_from_nvidia: true
    ...

  4. After completing the setting, click OK in the lower right corner of the page. CCE AI Suite (NVIDIA GPU) is then automatically upgraded. After the add-on status changes to Running, burst scheduling is disabled.

Use Cases for Virtual GPU Burst Scheduling

Assume that there is one GPU node in your cluster. You create a workload with the burst scheduling policy: a single pod whose one container requests 20% of the GPU compute power. Initially, the entire GPU's compute power is allocated to this container. You then create a second workload with the same policy: a single pod whose one container requests 5%. The GPU node dynamically reallocates compute power in proportion to the two containers' requests (20:5), giving 80% to the first container and 20% to the second.

  1. Install kubectl on an existing ECS and access a cluster using kubectl. For details, see Accessing a Cluster Using kubectl.
  2. Run the following command to create a YAML file for the first burst scheduling workload:

    vim xgpu-burst1.yaml
    Example file content:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xgpu-burst1
      labels:
        app: xgpu-burst1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: xgpu-burst1
          xgpu.burst/enabled: "true"   # Enable the burst function. Label values must be quoted strings.
      template:
        metadata:
          labels:
            app: xgpu-burst1
            xgpu.burst/enabled: "true"   # The pod template labels must include every selector label.
        spec:
          containers:
          - name: container-1     
            image: nginx:latest     # Replace it with your image address.
            command: ["<dosomething>"]    # Replace it with the actual GPU service command.
            resources:
              limits:
                volcano.sh/gpu-mem.128Mi: 40  # The GPU memory allocated to the pod. This value represents 5120 MiB (40 x 128 MiB).
                volcano.sh/gpu-core.percentage: 20    # The compute power allocated to the pod, in percentage
          imagePullSecrets:
            - name: default-secret
          schedulerName: volcano

  3. Run the following command to create the workload:

    kubectl apply -f xgpu-burst1.yaml

    If information similar to the following is displayed, the workload has been created:

    deployment.apps/xgpu-burst1 created

  4. Run the following command to view the created pod:

    kubectl get pod -n default

    If information similar to the following is displayed, the pod is running:

    NAME                          READY   STATUS    RESTARTS   AGE
    xgpu-burst1-6bdb4d7cb-pmtc2   1/1     Running   0          21s

  5. Log in to the GPU node and run the following command to check the compute power allocation for the workload container:

    xgpu-smi
    The entire GPU power is allocated to the single container.
    Fri Mar  7 03:36:03 2025
    +---------------------------------------------------------------------------------------+
    | HUAWEI CLOUD XGPU-SMI                                           XGPU Version: 1.0     |
    |=========================================+======================+======================|
    |    Container-Id    |    GPU    |    GPU-Util/Limit    |     GPU-Memory-Usage/Limit    |
    +-----------------------------------------+----------------------+----------------------+
    |       5eff70afff85 |         0 |          100% / 20%  |             1028Mi / 5120Mi   |
    |=========================================+======================+======================|
    ...

  6. Run the following command to create a YAML file for the second burst scheduling workload:

    vim xgpu-burst2.yaml
    Example file content:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xgpu-burst2
      labels:
        app: xgpu-burst2
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: xgpu-burst2
          xgpu.burst/enabled: "true"   # Enable the burst function. Label values must be quoted strings.
      template:
        metadata:
          labels:
            app: xgpu-burst2
            xgpu.burst/enabled: "true"   # The pod template labels must include every selector label.
        spec:
          containers:
          - name: container-1
            image: nginx:latest     # Replace it with your image address.
            command: ["<dosomething>"]    # Replace it with the actual GPU service command.
            resources:
              limits:
                volcano.sh/gpu-mem.128Mi: 40  # The GPU memory allocated to the pod. This value represents 5120 MiB (40 x 128 MiB).
                volcano.sh/gpu-core.percentage: 5    # The compute power allocated to the pod, in percentage
          imagePullSecrets:
            - name: default-secret
          schedulerName: volcano

  7. Run the following command to create the workload:

    kubectl apply -f xgpu-burst2.yaml

    If information similar to the following is displayed, the workload has been created:

    deployment.apps/xgpu-burst2 created

  8. Run the following command to view the created pod:

    kubectl get pod -n default

    If information similar to the following is displayed, the pods are running:

    NAME                          READY   STATUS    RESTARTS   AGE
    xgpu-burst1-6bdb4d7cb-pmtc2   1/1     Running   0          21s
    xgpu-burst2-5xdb4d7cb-qmld3   1/1     Running   0          21s

  9. Log in to the GPU node and run the following command to check the compute power allocation for the workload containers. Reallocation may take some time to complete; wait before checking.

    xgpu-smi
    The GPU node dynamically reallocates compute power based on the request ratios of the two containers, 80% to the first container and 20% to the second.
    Fri Mar  7 03:36:03 2025
    +---------------------------------------------------------------------------------------+
    | HUAWEI CLOUD XGPU-SMI                                           XGPU Version: 1.0     |
    |=========================================+======================+======================|
    |    Container-Id    |    GPU    |    GPU-Util/Limit    |     GPU-Memory-Usage/Limit    |
    +-----------------------------------------+----------------------+----------------------+
    |       5eff70afff85 |         0 |          80% / 20%   |             1024Mi / 5120Mi   |
    |       98d5201b7ea3 |         0 |          20% / 5%    |             2041Mi / 5120Mi   |
    |=========================================+======================+======================|
    ...