Enabling Kubernetes' Default GPU Scheduling in GPU Virtualization

With GPU virtualization enabled, the cluster can still schedule workloads that request nvidia.com/gpu (Kubernetes' default GPU scheduling), alongside workloads that request virtualized GPU resources: volcano.sh/gpu-mem.128Mi alone for memory isolation only, or volcano.sh/gpu-mem.128Mi together with volcano.sh/gpu-core.percentage for both compute and memory isolation.

When a workload uses Kubernetes' default GPU scheduling in a cluster with GPU virtualization enabled, nvidia.com/gpu can be set to either a fraction or an integer.

  • When nvidia.com/gpu is set to a fraction:
    • If workload pods are scheduled to a node that does not support GPU virtualization (for details, see Prerequisites), all pods that request fractional GPUs share the entire physical GPU, including its compute and memory, without isolation. This is suitable for scenarios with low compute demands, such as lightweight inference tasks.
    • If workload pods are scheduled to a node that supports GPU virtualization (for details, see Prerequisites), pods run in memory-isolation mode and can coexist only with other memory-isolated pods. They cannot share the GPU with pods that also isolate compute. GPU memory is allocated in 128-MiB blocks, rounded down to the nearest multiple of 128 MiB based on the requested nvidia.com/gpu ratio. For example, on a 16-GiB GPU, if nvidia.com/gpu is set to 0.5, the containers will be allocated 8 GiB of GPU memory (0.5 x 16 GiB), or 8192 MiB (64 x 128 MiB).
  • If nvidia.com/gpu is set to an integer, an entire physical GPU will be exclusively allocated to the target pod. This is ideal for scenarios requiring high performance and strict isolation.

Pods that request GPUs under different resource names cannot share a node. For example, a pod that requests nvidia.com/gpu resources and a pod that requests volcano.sh/gpu-mem.128Mi resources cannot be co-located on the same node, because the scheduler treats the two resource names as distinct resource types and never mixes them on one GPU node.
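
To check which GPU resource names a node currently advertises, and therefore which of these pods can be scheduled onto it, you can inspect the node's allocatable resources with a standard kubectl command. Replace <node_name> with the actual node name:

    kubectl describe node <node_name>

The Allocatable section of the output lists the extended resources the node exposes, for example nvidia.com/gpu or volcano.sh/gpu-mem.128Mi, depending on how GPU virtualization is configured on that node.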

Notes and Constraints

To support Kubernetes' default GPU scheduling on GPU nodes, the CCE AI Suite (NVIDIA GPU) add-on must be v2.0.10 or later, and the Volcano Scheduler add-on must be v1.10.5 or later.
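
If you prefer checking versions from the CLI, one rough way is to list the add-on workloads in the kube-system namespace and look at their image tags. The filter below is only an assumption about the component names, which vary by add-on release, so treat this as a convenience check:

    kubectl get deployments,daemonsets -n kube-system -o wide | grep -Ei 'volcano|nvidia'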

Example of Shared GPU Scheduling

  1. Use kubectl to access the cluster.
  2. Create a workload that uses nvidia.com/gpu resources.

    Create a gpu-app.yaml file. The following is an example:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: gpu-app
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: gpu-app
      template:
        metadata:
          labels:
            app: gpu-app
        spec:
          schedulerName: volcano
          containers:
          - image: <your_image_address>     # Replace it with your image address.
            name: container-0
            resources:
              requests:
                cpu: 250m
                memory: 512Mi
                nvidia.com/gpu: 0.1   # The value must be the same as that of limits.nvidia.com/gpu.
              limits:
                cpu: 250m
                memory: 512Mi
                nvidia.com/gpu: 0.1   # The requested GPUs
          imagePullSecrets:
          - name: default-secret
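
    If you want the API server to validate the manifest before anything is created, you can optionally run a server-side dry run first. This is a generic kubectl check, not a required step:

    kubectl apply --dry-run=server -f gpu-app.yaml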

  3. Run the following command to create the workload:

    kubectl apply -f gpu-app.yaml
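
    Check that the workload's pod is running and note its name (the Deployment appends a random suffix to gpu-app). The label selector below matches the labels in the example manifest:

    kubectl get pod -l app=gpu-app -n default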

  4. Log in to the pod and check the total GPU memory allocated to it. Replace <pod_name> in the following command with the actual name of the gpu-app pod.

    kubectl exec -it <pod_name> -- nvidia-smi

    Expected output:

    Thu Jul 27 07:53:49 2023       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A30          Off  | 00000000:00:0D.0 Off |                    0 |
    | N/A   47C    P0    34W / 165W |      0MiB /  2304MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

    The output shows that the total GPU memory that can be used by the pod is 2304 MiB.

    In this example, the node offers 24258 MiB of GPU memory in total. A 0.1-GPU request (2425.8 MiB) is rounded down to the nearest 128 MiB block, yielding 2304 MiB (18 x 128 MiB).
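
    The same rounding rule can be reproduced with a quick calculation. The following one-liner only illustrates the formula described above (floor(total memory x requested fraction / 128) x 128); the 24258 MiB total is taken from this example, so substitute the memory size of your own GPU:

    awk 'BEGIN { total=24258; req=0.1; print int(total*req/128)*128 " MiB" }'    # Prints 2304 MiB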