Updated on 2026-04-03 GMT+08:00

Virtual GPU Burst Scheduling

Burst scheduling is an elastic GPU compute scheduling policy. It guarantees each pod its requested share of compute power while dynamically borrowing compute power left idle by other containers on the same GPU, maximizing overall utilization. Only compute scheduling changes: GPU memory keeps the existing quota-based scheduling and isolation architecture.

Figure 1 Scheduling policies

GPU virtualization divides each GPU's time into slices based on the number of containers requesting GPU resources. These time slices, labeled segment 1, segment 2, ..., segment N, distribute GPU compute power among the containers. As shown in Figure 1, the upper part shows the isolated scheduling policy for both compute and GPU memory resources, while the lower part shows the burst scheduling policy. Assume that containers 1, 2, and 3 request 5%, 5%, and 10% of the compute power, respectively, and that container 3 is not currently using GPU compute. The comparison focuses only on compute scheduling.

  • Isolated scheduling: CCE allocates the requested compute power to each container: 5% to container 1, 5% to container 2, and 10% to container 3.
  • Burst scheduling: When container 3 is idle, the unused compute power is dynamically reallocated to containers 1 and 2, resulting in 50% each for containers 1 and 2, and 0% for container 3. When container 3 uses GPU resources, CCE reallocates compute power based on the initial ratio, resulting in 25% for container 1, 25% for container 2, and 50% for container 3.
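
The reallocation above is a simple proportional-share calculation: the GPU's full compute power is split among the containers that are actively using it, in proportion to their requested percentages. A minimal sketch of this model (illustrative only; it is not CCE's actual scheduler code):

```python
def burst_shares(requests, active):
    """Split 100% of GPU compute power among the active containers,
    in proportion to their requested percentages.

    requests: dict mapping container name -> requested compute (%)
    active:   set of containers currently using the GPU
    """
    total = sum(requests[c] for c in active)
    return {c: (100 * requests[c] / total if c in active else 0)
            for c in requests}

requests = {"c1": 5, "c2": 5, "c3": 10}

# Container 3 idle: containers 1 and 2 split the GPU 50/50.
print(burst_shares(requests, {"c1", "c2"}))          # {'c1': 50.0, 'c2': 50.0, 'c3': 0}

# All three active: shares return to the 5:5:10 request ratio.
print(burst_shares(requests, {"c1", "c2", "c3"}))    # {'c1': 25.0, 'c2': 25.0, 'c3': 50.0}
```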

Prerequisites

  • A CCE standard or Turbo cluster of v1.23.8-r0, v1.25.3-r0, or later is available.
  • GPU nodes with cluster-wide virtualization enabled are available in the cluster. For details, see Preparing Virtual GPU Resources.
  • The CCE AI Suite (NVIDIA GPU) add-on of v2.1.50, v2.7.67, or later has been installed in the cluster. For details, see CCE AI Suite (NVIDIA GPU).
  • Volcano of v1.10.5 or later has been installed. For details, see Volcano Scheduler.

Notes and Constraints

  • Before burst scheduling is enabled, all GPU and GPU virtualization tasks in the cluster must be migrated or stopped to prevent service interruptions or scheduling failures.
  • After burst scheduling is enabled:
    • Only compute power supports burst scheduling. GPU memory allocation remains based on the configured quota.
    • Containers supporting isolated scheduling for both compute and GPU memory resources (policy=1) can no longer be created in the cluster.
    • Containers supporting isolated scheduling for GPU memory resources only (policy=0) can still be created, but they cannot be scheduled on the same GPU as burst containers.
    • Containers supporting GPU sharing can still be created, but they cannot be scheduled on the same GPU as burst containers.

Enabling Virtual GPU Burst Scheduling

  1. Log in to the CCE console and click the cluster name to access the cluster console. The Overview page is displayed.
  2. In the navigation pane, choose Add-ons. In the right pane, find the CCE AI Suite (NVIDIA GPU) add-on and click Edit.
  3. In the Install Add-on dialog box, click Edit YAML. Search for enabled_xgpu_burst in the YAML file and set it to true. The following is an example:

    ...
    custom:
          annotations: {}
          compatible_with_legacy_api: false
          component_schedulername: kube-scheduler
          disable_mount_path_v1: false
          disable_nvidia_gsp: true
          driver_mount_paths: bin,lib64
          enable_fault_isolation: true
          enable_health_monitoring: true
          enable_metrics_monitoring: true
          enable_simple_lib64_mount: true
          enable_xgpu: false
          enabled_xgpu_burst: true
          gpu_driver_config: {}
          health_check_xids_v2: 74,79
          inject_ld_Library_path: ''
          install_nvidia_peermem: false
          is_driver_from_nvidia: true
    ...

  4. After completing the setting, click OK in the lower right corner of the page. CCE AI Suite (NVIDIA GPU) is then automatically upgraded. After the add-on status changes to Running, burst scheduling takes effect.

Using Burst Scheduling

  1. Install kubectl on an existing ECS and access a cluster using kubectl. For details, see Accessing a Cluster Using kubectl.
  2. Run the following command to create a YAML file for a workload that uses virtual GPU burst scheduling:

    vim xgpu-burst.yaml
    Example file content:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xgpu-burst
      labels:
        app: xgpu-burst
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: xgpu-burst
          xgpu.burst/enabled: "true"   # Enable the burst function. Label values must be quoted strings.
      template:
        metadata:
          labels:
            app: xgpu-burst
            xgpu.burst/enabled: "true"   # The pod template labels must include every selector label.
        spec:
          containers:
          - name: container-1
            image: <your_image_address>     # Replace it with your image address.
            resources:
              limits:
                volcano.sh/gpu-mem.128Mi: 40  # The GPU memory allocated to the pod. This value represents 5120 MiB (40 x 128 MiB).
                volcano.sh/gpu-core.percentage: 25    # The compute power allocated to the pod, in percentage
          imagePullSecrets:
            - name: default-secret
          schedulerName: volcano
    • After the burst function is enabled, containers supporting isolated scheduling for both compute and GPU memory resources cannot be created. If both the volcano.sh/gpu-mem.128Mi and volcano.sh/gpu-core.percentage parameters are specified in resources.requests and resources.limits, the xgpu.burst/enabled: "true" label must be set. Otherwise, the workload cannot be scheduled.
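
The volcano.sh/gpu-mem.128Mi value in the example above is a count of 128 MiB blocks, not a size in MiB. A small helper to convert a desired GPU memory size into this unit (an illustrative sketch; the resource name comes from the example above):

```python
def gpu_mem_blocks(mib, block_mib=128):
    """Convert a GPU memory request in MiB into the block count used by
    the volcano.sh/gpu-mem.128Mi resource. Rounds up so the request is
    never under-provisioned."""
    return -(-mib // block_mib)  # ceiling division

print(gpu_mem_blocks(5120))  # 40 -> volcano.sh/gpu-mem.128Mi: 40 (5120 MiB)
print(gpu_mem_blocks(5000))  # 40 (rounded up to the next 128 MiB block)
```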

  3. Run the following command to create the workload:

    kubectl apply -f xgpu-burst.yaml

    If information similar to the following is displayed, the workload has been created:

    deployment.apps/xgpu-burst created

  4. Run the following command to view the created pod:

    kubectl get pod -n default

    Information similar to the following is displayed:

    NAME                         READY   STATUS    RESTARTS   AGE
    xgpu-burst-6bdb4d7cb-pmtc2   1/1     Running   0          21s

  5. Log in to the pod and check the scheduling policy it uses.

    kubectl exec -it xgpu-burst-6bdb4d7cb-pmtc2 -- cat /proc/xgpu/0/policy

    If 6 is displayed in the command output, the burst scheduling policy is used.
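
The policy codes mentioned in this document can be summarized as a small lookup table (only the values named in this document are listed; other codes may exist):

```python
# Scheduling policy codes read from /proc/xgpu/<gpu_index>/policy,
# as described in this document (the list is not exhaustive).
XGPU_POLICIES = {
    0: "Isolated scheduling for GPU memory resources only",
    1: "Isolated scheduling for both compute and GPU memory resources",
    6: "Burst scheduling",
}

print(XGPU_POLICIES[6])  # Burst scheduling
```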

Disabling Virtual GPU Burst Scheduling

Before disabling virtual GPU burst scheduling, migrate or stop all containers in the cluster that use this capability. Otherwise, new virtual GPU containers cannot be scheduled to GPU nodes that are still running burst containers, which blocks task scheduling and wastes resources.

  1. Log in to the CCE console and click the cluster name to access the cluster console. The Overview page is displayed.
  2. In the navigation pane, choose Add-ons. In the right pane, find the CCE AI Suite (NVIDIA GPU) add-on and click Edit.
  3. In the Install Add-on dialog box, click Edit YAML. Search for enabled_xgpu_burst in the YAML file and set it to false. The following is an example:

    ...
    custom:
          annotations: {}
          compatible_with_legacy_api: false
          component_schedulername: kube-scheduler
          disable_mount_path_v1: false
          disable_nvidia_gsp: true
          driver_mount_paths: bin,lib64
          enable_fault_isolation: true
          enable_health_monitoring: true
          enable_metrics_monitoring: true
          enable_simple_lib64_mount: true
          enable_xgpu: false
          enabled_xgpu_burst: false
          gpu_driver_config: {}
          health_check_xids_v2: 74,79
          inject_ld_Library_path: ''
          install_nvidia_peermem: false
          is_driver_from_nvidia: true
    ...

  4. After completing the setting, click OK in the lower right corner of the page. CCE AI Suite (NVIDIA GPU) is then automatically upgraded. After the add-on status changes to Running, burst scheduling is disabled.

Use Cases for Virtual GPU Burst Scheduling

Assume that there is one GPU node in your cluster. You create a workload with the burst scheduling policy: a single pod whose one container requests 20% of the GPU compute power. Initially, the entire GPU's compute power is allocated to this container. You then create a second workload with the same policy: a single pod whose one container requests 5%. The GPU node dynamically reallocates compute power in proportion to the two containers' requests (20:5), giving 80% to the first container and 20% to the second.

  1. Install kubectl on an existing ECS and access a cluster using kubectl. For details, see Accessing a Cluster Using kubectl.
  2. Run the following command to create a YAML file for the first burst scheduling workload:

    vim xgpu-burst1.yaml
    Example file content:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xgpu-burst1
      labels:
        app: xgpu-burst1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: xgpu-burst1
          xgpu.burst/enabled: "true"   # Enable the burst function. Label values must be quoted strings.
      template:
        metadata:
          labels:
            app: xgpu-burst1
            xgpu.burst/enabled: "true"   # The pod template labels must include every selector label.
        spec:
          containers:
          - name: container-1     
            image: nginx:latest     # Replace it with your image address.
            command: ["<dosomething>"]    # Replace it with the actual GPU service command.
            resources:
              limits:
                volcano.sh/gpu-mem.128Mi: 40  # The GPU memory allocated to the pod. This value represents 5120 MiB (40 x 128 MiB).
                volcano.sh/gpu-core.percentage: 20    # The compute power allocated to the pod, in percentage
          imagePullSecrets:
            - name: default-secret
          schedulerName: volcano

  3. Run the following command to create the workload:

    kubectl apply -f xgpu-burst1.yaml

    If information similar to the following is displayed, the workload has been created:

    deployment.apps/xgpu-burst1 created

  4. Run the following command to view the created pod:

    kubectl get pod -n default

    If information similar to the following is displayed, the pod is running:

    NAME                          READY   STATUS    RESTARTS   AGE
    xgpu-burst1-6bdb4d7cb-pmtc2   1/1     Running   0          21s

  5. Log in to the GPU node and run the following command to check the compute power allocation for the workload container:

    xgpu-smi
    The entire GPU power is allocated to the single container.
    Fri Mar  7 03:36:03 2025
    +---------------------------------------------------------------------------------------+
    | HUAWEI CLOUD XGPU-SMI                                           XGPU Version: 1.0     |
    |=========================================+======================+======================|
    |    Container-Id    |    GPU    |    GPU-Util/Limit    |     GPU-Memory-Usage/Limit    |
    +-----------------------------------------+----------------------+----------------------+
    |       5eff70afff85 |         0 |          100% / 20%  |             1028Mi / 5120Mi   |
    |=========================================+======================+======================|
    ...

  6. Run the following command to create a YAML file for the second burst scheduling workload:

    vim xgpu-burst2.yaml
    Example file content:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: xgpu-burst2
      labels:
        app: xgpu-burst2
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: xgpu-burst2
          xgpu.burst/enabled: "true"   # Enable the burst function. Label values must be quoted strings.
      template:
        metadata:
          labels:
            app: xgpu-burst2
            xgpu.burst/enabled: "true"   # The pod template labels must include every selector label.
        spec:
          containers:
          - name: container-1
            image: nginx:latest     # Replace it with your image address.
            command: ["<dosomething>"]    # Replace it with the actual GPU service command.
            resources:
              limits:
                volcano.sh/gpu-mem.128Mi: 40  # The GPU memory allocated to the pod. This value represents 5120 MiB (40 x 128 MiB).
                volcano.sh/gpu-core.percentage: 5    # The compute power allocated to the pod, in percentage
          imagePullSecrets:
            - name: default-secret
          schedulerName: volcano

  7. Run the following command to create the workload:

    kubectl apply -f xgpu-burst2.yaml

    If information similar to the following is displayed, the workload has been created:

    deployment.apps/xgpu-burst2 created

  8. Run the following command to view the created pod:

    kubectl get pod -n default

    If information similar to the following is displayed, the pods are running:

    NAME                          READY   STATUS    RESTARTS   AGE
    xgpu-burst1-6bdb4d7cb-pmtc2   1/1     Running   0          21s
    xgpu-burst2-5xdb4d7cb-qmld3   1/1     Running   0          21s

  9. Log in to the GPU node and run the following command to check the compute power allocation for the workload containers. Reallocation may take some time to complete; wait before checking.

    xgpu-smi
    The GPU node dynamically reallocates compute power based on the request ratios of the two containers, 80% to the first container and 20% to the second.
    Fri Mar  7 03:36:03 2025
    +---------------------------------------------------------------------------------------+
    | HUAWEI CLOUD XGPU-SMI                                           XGPU Version: 1.0     |
    |=========================================+======================+======================|
    |    Container-Id    |    GPU    |    GPU-Util/Limit    |     GPU-Memory-Usage/Limit    |
    +-----------------------------------------+----------------------+----------------------+
    |       5eff70afff85 |         0 |          80% / 20%   |             1024Mi / 5120Mi   |
    |       98d5201b7ea3 |         0 |          20% / 5%    |             2041Mi / 5120Mi   |
    |=========================================+======================+======================|
    ...