
Updated on 2026-06-16 GMT+08:00

Default GPU Scheduling in Kubernetes

CCE standard and Turbo clusters support Kubernetes' default GPU scheduling mode. This mode uses a device plugin to manage GPUs as a standard resource type. After the CCE AI Suite (NVIDIA GPU) add-on is installed on a node, CCE automatically detects the number of GPUs on the node and allocates them to pods based on the resources.limits specified during scheduling. nvidia.com/gpu can be set to an integer or a decimal.

If nvidia.com/gpu is set to a positive integer, for example, 1, an entire physical GPU will be exclusively allocated to the target pod. This is ideal for scenarios requiring high performance and strict isolation.
If nvidia.com/gpu is set to a decimal, for example, 0.2, multiple pods can share a physical GPU, including its compute and memory resources. This is suitable for scenarios with lower compute demands, such as lightweight inference tasks.
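The two cases above can be told apart mechanically. The following Python sketch classifies a nvidia.com/gpu value. The function name classify_gpu_quota is hypothetical, and the rule that a decimal must lie strictly between 0 and 1 is an assumption that this section does not state.

```python
from decimal import Decimal, InvalidOperation


def classify_gpu_quota(value):
    """Return "exclusive" or "shared" for a nvidia.com/gpu value."""
    try:
        quota = Decimal(str(value))
    except InvalidOperation:
        raise ValueError(f"not a number: {value!r}")
    # Check finiteness first so that NaN is never used in a comparison.
    if not quota.is_finite() or quota <= 0:
        raise ValueError(f"quota must be positive: {value!r}")
    if quota == quota.to_integral_value():
        return "exclusive"  # whole GPUs, for example 1
    if quota < 1:
        return "shared"  # a fraction of one GPU, for example 0.2
    # Assumption: values such as 1.5 are outside what this section describes.
    raise ValueError(f"unsupported quota: {value!r}")


print(classify_gpu_quota("1"))    # exclusive
print(classify_gpu_quota("0.2"))  # shared
```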

This section describes how to use Kubernetes' default GPU scheduling. For more information, see Schedule GPUs.

Precautions

When using Kubernetes' default GPU scheduling, you must use the standard Kubernetes extended resource request method. Adhere to the following precautions:

Do not deploy GPU-consuming applications directly on nodes.
Do not use standalone container tools (such as Docker, Podman, and nerdctl) to launch GPU containers on GPU nodes. For example, do not run docker run --gpus all or docker run -e NVIDIA_VISIBLE_DEVICES=all and then start GPU programs in the resulting container.
Do not hardcode NVIDIA_VISIBLE_DEVICES or related GPU environment variables in pod YAML env fields. Similarly, do not set NVIDIA_VISIBLE_DEVICES to all or specific device values by default during image building.
Do not set privileged: true in pods' securityContext because this setting allows containers to access all GPUs on the node, potentially starving other workloads on the same node.

Non-standard GPU operations introduce the following risks:

The scheduler cannot accurately track node GPU utilization. Tasks may be scheduled to nodes with exhausted GPU resources, causing resource contention or launch failures.
Bypassing Kubernetes' GPU management may trigger NVIDIA driver conflicts or known community issues that destabilize workloads, for example, the error Failed to initialize NVML: Unknown Error.
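
Some of the precautions above can be checked before a workload is submitted. The following Python sketch scans a pod spec that has already been loaded into a dictionary. The function name find_precaution_violations is hypothetical, and the check is deliberately narrow: it looks only at the containers list, only at the NVIDIA_VISIBLE_DEVICES variable (not at related variables), and only at the container-level privileged flag.

```python
def find_precaution_violations(pod_spec):
    """List settings in a pod spec dict that conflict with the precautions."""
    problems = []
    for container in pod_spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        for env in container.get("env") or []:
            if env.get("name") == "NVIDIA_VISIBLE_DEVICES":
                problems.append(f"{name}: NVIDIA_VISIBLE_DEVICES is hardcoded in env")
        security = container.get("securityContext") or {}
        if security.get("privileged") is True:
            problems.append(f"{name}: securityContext.privileged is true")
    return problems


# Illustrative spec that breaks two of the precautions.
spec = {
    "containers": [
        {
            "name": "container-0",
            "env": [{"name": "NVIDIA_VISIBLE_DEVICES", "value": "all"}],
            "securityContext": {"privileged": True},
        }
    ]
}
for problem in find_precaution_violations(spec):
    print(problem)
```

An empty result does not prove that a workload is compliant. For example, a value baked into the image at build time is not visible in the pod spec.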

Prerequisites

A GPU node has been created. For details, see Creating a Node.
The CCE AI Suite (NVIDIA GPU) add-on has been installed, with the selected driver matching the GPU model on the node. For details, see CCE AI Suite (NVIDIA GPU).
When default GPU scheduling is used in clusters earlier than v1.28, the CCE AI Suite (NVIDIA GPU) add-on mounts the driver directory to /usr/local/nvidia/lib64. To use GPU resources in containers, you need to append /usr/local/nvidia/lib64 to the LD_LIBRARY_PATH environment variable. You can skip this step for clusters of v1.28 or later.
You can add environment variables in any of the following ways:
- (Recommended) Configure the LD_LIBRARY_PATH environment variable in the Dockerfile used for creating an image.
```
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
```
- Configure the LD_LIBRARY_PATH environment variable in the image startup command.
```
/bin/bash -c "export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH && ..."
```
- Define the LD_LIBRARY_PATH environment variable when creating a workload. (Ensure that this variable is not configured in the container. Otherwise, it will be overwritten.)
```
...
          env:
            - name: LD_LIBRARY_PATH
              value: /usr/local/nvidia/lib64
...
```
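
The version rule and the path change can be sketched in Python as follows. Both function names are hypothetical. The sketch places the driver directory first, as the Dockerfile example does, and skips the change if the directory is already present.

```python
import re

NVIDIA_LIB_DIR = "/usr/local/nvidia/lib64"


def needs_ld_library_path(cluster_version):
    """Return True for clusters earlier than v1.28."""
    match = re.match(r"v?(\d+)\.(\d+)", cluster_version)
    if match is None:
        raise ValueError(f"unrecognized version: {cluster_version!r}")
    return (int(match.group(1)), int(match.group(2))) < (1, 28)


def add_nvidia_lib_dir(current):
    """Return an LD_LIBRARY_PATH value that contains the driver directory once."""
    parts = [part for part in current.split(":") if part]
    if NVIDIA_LIB_DIR in parts:
        return ":".join(parts)
    return ":".join([NVIDIA_LIB_DIR] + parts)


print(needs_ld_library_path("v1.27"))  # True
print(needs_ld_library_path("v1.28"))  # False
print(add_nvidia_lib_dir("/usr/lib"))  # /usr/local/nvidia/lib64:/usr/lib
```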

Creating a Workload with Default GPU Scheduling Enabled

You can create a workload with default GPU scheduling enabled using either the console or kubectl.

Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Workloads. In the upper right corner of the displayed page, click Create Workload.
In the Container Settings area, choose Basic Info, locate GPU Quota, select a scheduling mode, and specify the required resources. Default GPU scheduling offers two modes: GPU card and Shared.
- GPU card: An entire physical GPU will be exclusively allocated to the target pod.
- Shared: Multiple pods can share a physical GPU, including its compute and memory resources.
Figure 1 GPU card
(Optional) Specify GPU memory for the workload. Once this is set, CCE enables affinity between pods and nodes based on the resource type so that the pods are scheduled to appropriate nodes.
Configure other parameters by referring to Creating a Workload. After completing the settings, click Create Workload in the lower right corner. The workload has been created when its status changes to Running.

Use kubectl to access the cluster.

Write a YAML file for creating a workload with Kubernetes' default GPU scheduling enabled:

```
vim gpu-app.yaml
```

Example file content:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      nodeSelector:
        accelerator: nvidia-t4
      containers:
      - image: nginx:perl
        name: container-0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # (Optional) The value must be the same as that of limits.nvidia.com/gpu.
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Specified number of GPUs
      imagePullSecrets:
      - name: default-secret
```

nodeSelector: (Optional) specifies a node selector. After a GPU node is created, CCE adds a label to it. When using GPUs, you can enable affinity between pods and nodes based on labels so that the pods can be scheduled to appropriate nodes.
Obtain nodes with a specified label:
```
kubectl get node -L accelerator
```
Information similar to the following is displayed, where the ACCELERATOR column shows the label value:
```
NAME           STATUS   ROLES    AGE     VERSION                                    ACCELERATOR
10.100.2.179   Ready    <none>   8m43s   v1.19.10-r0-CCE21.11.1.B006-21.11.1.B006   nvidia-t4
```
resources.limits.nvidia.com/gpu: specifies the number of GPUs.
- If nvidia.com/gpu is set to a positive integer, for example, 1, an entire physical GPU will be exclusively allocated to the target pod.
- If nvidia.com/gpu is set to a decimal, for example, 0.2, multiple pods can share a physical GPU, including its compute and memory resources.
requests.nvidia.com/gpu is optional. If it is specified, ensure its value is the same as that of limits.nvidia.com/gpu.
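
The rule for requests and limits can be checked before the file is applied. The following Python sketch works on the resources mapping of one container after the YAML has been loaded into a dictionary. The function name check_gpu_resources is hypothetical. The case where requests is set without limits reflects upstream Kubernetes behavior for extended resources rather than anything stated in this section.

```python
from decimal import Decimal

GPU = "nvidia.com/gpu"


def check_gpu_resources(resources):
    """Return an error message, or None if the GPU fields are consistent."""
    limits = resources.get("limits") or {}
    requests = resources.get("requests") or {}
    if GPU not in requests:
        return None  # requests.nvidia.com/gpu is optional
    if GPU not in limits:
        return "requests.nvidia.com/gpu is set but limits.nvidia.com/gpu is missing"
    # Compare numerically so that 1 and "1" are treated as equal.
    if Decimal(str(requests[GPU])) != Decimal(str(limits[GPU])):
        return "requests.nvidia.com/gpu must equal limits.nvidia.com/gpu"
    return None


print(check_gpu_resources({"requests": {GPU: 1}, "limits": {GPU: 1}}))  # None
print(check_gpu_resources({"requests": {GPU: 1}, "limits": {GPU: 2}}))
```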

Create the workload.
```
kubectl apply -f gpu-app.yaml
```
If information similar to the following is displayed, the workload has been created:
```
deployment.apps/gpu-test created
```

View the created pod.

```
kubectl get pod -n default
```

Information similar to the following is displayed:

```
NAME                       READY   STATUS    RESTARTS   AGE
gpu-test-6bdb4d7cb-pmtc2   1/1     Running   0          21s
```
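
If this check is scripted, the table can be reduced to a mapping of pod name to status. The following Python sketch assumes the default column layout shown above. The function name pod_statuses is hypothetical. For anything beyond a quick check, the JSON output of kubectl get pod -o json is more robust than splitting on whitespace.

```python
def pod_statuses(table):
    """Map pod name to STATUS from the text output of kubectl get pod."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    statuses = {}
    for row in lines[1:]:
        fields = row.split()
        statuses[fields[name_col]] = fields[status_col]
    return statuses


sample = """\
NAME                       READY   STATUS    RESTARTS   AGE
gpu-test-6bdb4d7cb-pmtc2   1/1     Running   0          21s
"""
print(pod_statuses(sample))  # {'gpu-test-6bdb4d7cb-pmtc2': 'Running'}
```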

Access the container.

```
kubectl -n default exec -it gpu-test-6bdb4d7cb-pmtc2 -c container-0 -- /bin/bash
```

Check whether the GPU has been allocated to the container.
```
nvidia-smi
```
If the command output lists a GPU, the GPU has been allocated to the container.

Verifying GPU Isolation

You can perform the tests below to ensure that the access to the accelerator in each container is correctly isolated and controlled by the Kubernetes resource management framework (device plugin or DRA) and container runtime, preventing unauthorized access or mutual interference between workloads.

Test 1: Ensure that the access to the accelerator in a container is managed by the Kubernetes resource management framework (device plugin or DRA) and container runtime.

```
kubectl get pods -n kube-system
```

Expected output:

```
NAME                                  READY   STATUS    RESTARTS   AGE
nvidia-gpu-device-plugin-6zf4b        1/1     Running   0          32m
```

Check whether the device plugin reports the correct number of GPUs to kubelet.

Run the nvidia-smi command to check the number of GPUs.

```
Tue Dec 23 10:11:04 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L2                      Off |   00000000:00:0D.0 Off |                    0 |
| N/A   49C    P8             16W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L2                      Off |   00000000:00:0E.0 Off |                    0 |
| N/A   50C    P8             16W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L2                      Off |   00000000:00:0F.0 Off |                    0 |
| N/A   42C    P8             12W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L2                      Off |   00000000:00:10.0 Off |                    0 |
| N/A   41C    P8             12W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
```

Check the node information to confirm the number of available GPUs registered by the device plugin.

```
kubectl describe node
```

Command output:

```
Capacity:
  cpu:                96
  ephemeral-storage:  1055758772Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  localssd:           0
  localvolume:        0
  memory:             792305148Ki
  nvidia.com/gpu:     4
  pods:               110
Allocatable:
  cpu:                95690m
  ephemeral-storage:  972987282665
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  localssd:           0
  localvolume:        0
  memory:             771057148Ki
  nvidia.com/gpu:     4
  pods:               110
```
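
The comparison between the GPU count on the node and the count registered by the device plugin can be automated. The following Python sketch extracts nvidia.com/gpu from the Capacity and Allocatable sections of the kubectl describe node text output. The function name gpu_counts is hypothetical, and the parser assumes the indented key: value layout shown above.

```python
def gpu_counts(describe_output):
    """Return the nvidia.com/gpu values found under Capacity and Allocatable."""
    counts = {}
    section = None
    for raw_line in describe_output.splitlines():
        line = raw_line.rstrip()
        if line in ("Capacity:", "Allocatable:"):
            section = line[:-1]
        elif not line.startswith(" "):
            section = None  # any other unindented line ends the section
        elif section is not None:
            key, _, value = line.strip().partition(":")
            if key == "nvidia.com/gpu":
                counts[section] = int(value.strip())
    return counts


sample = """\
Capacity:
  cpu:                96
  nvidia.com/gpu:     4
  pods:               110
Allocatable:
  cpu:                95690m
  nvidia.com/gpu:     4
  pods:               110
"""
print(gpu_counts(sample))  # {'Capacity': 4, 'Allocatable': 4}
```

Both values should equal the number of GPUs that nvidia-smi reports on the node, which is 4 in this example.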

Create a workload that requests GPUs.

Example YAML:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-cuda-check
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-cuda-check
  template:
    metadata:
      labels:
        app: pytorch-cuda-check
    spec:
      containers:
        - name: pytorch-cuda-check
          image: nvcr.io/nvidia/pytorch:25.09-py3
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                python3 -c "import torch; print(torch.cuda.device_count())"
                sleep 30
              done
          resources:
            limits:
              nvidia.com/gpu: 1
```

Check the status of the created workload.

```
kubectl get pods -w
```

Command output:

```
NAME                                  READY   STATUS    RESTARTS   AGE
pytorch-cuda-check-68bc4bf767-pdgw7   1/1     Running   0          15m
```

Update the Deployment and remove the resource request from the pod specification. The command in the container should then report that no GPU is available.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-cuda-check
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-cuda-check
  template:
    metadata:
      labels:
        app: pytorch-cuda-check
    spec:
      containers:
        - name: pytorch-cuda-check
          image: nvcr.io/nvidia/pytorch:25.09-py3
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                python3 -c "import torch; print(torch.cuda.device_count())"
                sleep 30
              done
          # resources:
          #   limits:
          #     nvidia.com/gpu: 1
```

View the log. It should show a device count of 0, which means that no GPU is visible to the container.

```
kubectl logs pytorch-cuda-check-68bc4bf767-2zm7s
```

Command output:

```
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
0
```
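
Because the log mixes warnings with the printed device count, a scripted version of Test 1 needs to pick out the count. The following Python sketch returns the most recent log line that consists only of digits. The function name device_count_from_log is hypothetical.

```python
def device_count_from_log(log_text):
    """Return the last bare integer line in the log, or None if there is none."""
    for line in reversed(log_text.splitlines()):
        text = line.strip()
        if text.isdecimal():
            return int(text)
    return None


log_without_gpu = """\
FutureWarning: The pynvml package is deprecated.
  import pynvml  # type: ignore[import]
0
"""
print(device_count_from_log(log_without_gpu))  # 0
```

A result of 1 is expected while the Deployment requests one GPU, and 0 after the request is removed.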

Test 2: Ensure that the access to the accelerator in each container is correctly isolated.

Create two pods and allocate an accelerator to each pod. Run the nvidia-smi command to ensure that each pod can access only the accelerator allocated to it.

Example YAML:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-cuda-check-2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pytorch-cuda-check
  template:
    metadata:
      labels:
        app: pytorch-cuda-check
    spec:
      containers:
        - name: pytorch-cuda-check
          image: nvcr.io/nvidia/pytorch:25.09-py3
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                python3 -c "import torch; print(torch.cuda.device_count())"
                sleep 30
              done
          resources:
            limits:
              nvidia.com/gpu: 1
```

Check the pod status.

```
kubectl get pods -w
```

Command output:

```
NAME                                    READY   STATUS    RESTARTS   AGE
pytorch-cuda-check-2-68bc4bf767-h8b7q   1/1     Running   0          3m55s
pytorch-cuda-check-2-68bc4bf767-jb2kq   1/1     Running   0          3m55s
```

Verify that each pod has been allocated a different GPU.

```
kubectl exec -it pytorch-cuda-check-2-68bc4bf767-h8b7q -- nvidia-smi -L
kubectl exec -it pytorch-cuda-check-2-68bc4bf767-jb2kq -- nvidia-smi -L
```

Command output:

```
GPU 0: NVIDIA L2 (UUID: GPU-1dda2a1d-a678-15e2-f2cc-9b0622d3d523)
GPU 0: NVIDIA L2 (UUID: GPU-f71e5af2-ca6e-ccc7-e612-c9d23092c9b4)
```
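
Both pods report GPU 0, so the index alone does not show isolation. The UUIDs are what must differ. The following Python sketch compares the UUIDs from the nvidia-smi -L output of each pod. The function names gpu_uuids and are_isolated are hypothetical.

```python
import re

UUID_PATTERN = re.compile(r"\(UUID: (GPU-[0-9a-fA-F-]+)\)")


def gpu_uuids(nvidia_smi_list_output):
    """Return the set of GPU UUIDs in one nvidia-smi -L output."""
    return set(UUID_PATTERN.findall(nvidia_smi_list_output))


def are_isolated(outputs_by_pod):
    """Return True if no GPU UUID appears in more than one pod."""
    seen = set()
    for output in outputs_by_pod.values():
        uuids = gpu_uuids(output)
        if seen & uuids:
            return False
        seen |= uuids
    return True


outputs = {
    "pytorch-cuda-check-2-68bc4bf767-h8b7q": "GPU 0: NVIDIA L2 (UUID: GPU-1dda2a1d-a678-15e2-f2cc-9b0622d3d523)",
    "pytorch-cuda-check-2-68bc4bf767-jb2kq": "GPU 0: NVIDIA L2 (UUID: GPU-f71e5af2-ca6e-ccc7-e612-c9d23092c9b4)",
}
print(are_isolated(outputs))  # True
```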

Common Issues

Symptom: Workload creation failed with the following error information:

```
0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu.
```

Cause: The GPU resources are insufficient. When nvidia.com/gpu is specified, CCE only schedules workload pods to nodes with GPUs. If there are not enough GPU resources available, the error is reported.

Solution: Purchase GPU nodes to ensure that there are sufficient GPU resources in your cluster.
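
When such events are collected by a script, the message can be broken down into a node count per reason. The following Python sketch handles only the message form shown above. The function name parse_scheduling_failure is hypothetical, and real scheduler messages may carry additional text that this sketch does not handle.

```python
import re


def parse_scheduling_failure(message):
    """Return (total_nodes, {reason: node_count}) for a scheduling failure message."""
    text = message.strip().rstrip(".")
    match = re.fullmatch(r"(\d+)/(\d+) nodes are available: (.+)", text)
    if match is None:
        raise ValueError(f"unrecognized message: {message!r}")
    reasons = {}
    for part in match.group(3).split(", "):
        count, _, reason = part.partition(" ")
        reasons[reason] = int(count)
    return int(match.group(2)), reasons


message = "0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu."
total, reasons = parse_scheduling_failure(message)
print(total)    # 4
print(reasons)  # {'InsufficientResourceOnSingleGPU': 1, 'Insufficient nvidia.com/gpu': 3}
```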
