Supporting Kubernetes' Default GPU Scheduling

After GPU virtualization is enabled, the target GPU node by default does not support workloads that use Kubernetes' default GPU scheduling, that is, workloads that request nvidia.com/gpu resources. If your cluster has such workloads, you can enable support for GPU sharing on the GPU node in the gpu-device-plugin configuration so that the node also supports Kubernetes' default GPU scheduling.

If you enable compatibility, the nvidia.com/gpu quota specified in a workload (set to a decimal fraction, for example, 0.5) is provided by GPU virtualization with GPU memory isolation. GPU memory is allocated to containers based on the specified quota. For example, a quota of 0.5 on a GPU with 16 GiB of memory allocates 8 GiB (0.5 x 16 GiB). The allocated GPU memory must be an integer multiple of 128 MiB. Otherwise, the value is automatically rounded down to the nearest multiple of 128 MiB. If a workload was already using nvidia.com/gpu resources before compatibility was enabled, its resources are provided by the entire GPU rather than by GPU virtualization.
After compatibility is enabled, using the nvidia.com/gpu quota is equivalent to enabling GPU memory isolation. Workloads that use the nvidia.com/gpu quota can share a GPU with workloads in GPU memory isolation mode, but not with workloads in compute and GPU memory isolation mode.
If compatibility is disabled, the nvidia.com/gpu quota specified in a workload affects only the scheduling result, and GPU memory isolation is not enforced. That is, even if the nvidia.com/gpu quota is set to 0.5, the container can still see the complete GPU memory. In addition, workloads using nvidia.com/gpu resources and workloads using virtualized GPU memory cannot be scheduled to the same node.
If you deselect Virtualization nodes are compatible with GPU sharing mode, running workloads are not affected, but new workloads may fail to be scheduled. For example, after compatibility is disabled, workloads that were already using nvidia.com/gpu resources remain in GPU memory isolation mode. As a result, workloads in compute and GPU memory isolation mode cannot be scheduled to that GPU. Delete the workloads that use nvidia.com/gpu resources before rescheduling.

Constraints

To support Kubernetes' default GPU scheduling on GPU nodes, the CCE AI Suite (NVIDIA GPU) add-on must be v2.0.10 or later, and the Volcano Scheduler add-on must be v1.10.5 or later.

Procedure

Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons.
Locate CCE AI Suite (NVIDIA GPU) on the right and click Install.

If the add-on has been installed, click Edit.
Configure the add-on. For details, see Installing the add-on.

After GPU virtualization is enabled, you can configure the nvidia.com/gpu field to enable or disable support for Kubernetes' default GPU scheduling.
Click Install.

Configuration Example

Use kubectl to access the cluster.

Create a workload that uses nvidia.com/gpu resources.

Create a gpu-app.yaml file. The following shows an example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - image: <your_image_address>     # Replace it with your image address.
        name: container-0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 0.1   # Number of requested GPUs
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 0.1   # Maximum number of GPUs that can be used
      imagePullSecrets:
      - name: default-secret

Run the following command to create an application:
```
kubectl apply -f gpu-app.yaml
```

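Check the total GPU memory that is visible in the container. Because gpu-app is a Deployment, the command targets it as deploy/gpu-app so that kubectl selects one of its pods:

kubectl exec -it deploy/gpu-app -- nvidia-smi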

Expected output:

Thu Jul 27 07:53:49 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          Off  | 00000000:00:0D.0 Off |                    0 |
| N/A   47C    P0    34W / 165W |      0MiB /  2304MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The output shows that the total GPU memory that can be used by the pod is 2304 MiB.

In this example, the total GPU memory on the GPU node is 24258 MiB. The requested share, 2425.8 MiB (24258 MiB x 0.1), is not an integer multiple of 128 MiB, so it is rounded down to 18 x 128 MiB = 2304 MiB.
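
The rounding rule can be reproduced with a few lines of shell. The following is a minimal sketch, not part of the add-on: the function name gpu_mem_mib is hypothetical, and the quota is passed in thousandths (100 stands for nvidia.com/gpu: 0.1) only because POSIX shell arithmetic works on integers.

```shell
#!/bin/sh
# Hypothetical helper: estimate the GPU memory (in MiB) granted for a
# fractional nvidia.com/gpu quota, following the 128 MiB rounding rule.
#   $1 = total GPU memory of the card, in MiB
#   $2 = quota in thousandths (assumption: 100 means nvidia.com/gpu: 0.1)
gpu_mem_mib() {
  # Integer division truncates the product, then truncates again to a
  # whole number of 128 MiB steps before scaling back up.
  echo $(( $1 * $2 / 1000 / 128 * 128 ))
}

gpu_mem_mib 24258 100   # quota 0.1 on the 24258 MiB node: prints 2304
gpu_mem_mib 16384 500   # quota 0.5 on a 16 GiB GPU: prints 8192
```

The first call matches the 2304 MiB reported by nvidia-smi above. The second matches the 8 GiB (0.5 x 16 GiB) example, which is already a multiple of 128 MiB and therefore is not rounded.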
