
Creating a GPU-accelerated Application

This section describes how to use GPU virtualization to isolate the compute and GPU memory and efficiently use GPU resources.

Prerequisites

Constraints

  • Init containers do not support GPU virtualization.
  • For a single GPU:
    • Up to 20 virtual GPUs can be created.
    • Up to 20 pods that use the isolation capability can be scheduled.
    • Only workloads that use the same isolation mode can be scheduled onto it. (GPU virtualization supports two isolation modes: GPU memory isolation, and isolation of both GPU memory and compute.)
  • For different containers of the same workload:
    • Only one GPU model can be configured. Two or more GPU models cannot be configured at the same time.
    • All containers must use the same GPU usage mode. Virtualization and non-virtualization modes cannot be mixed. (See the sketch after this list.)
  • After a GPU is virtualized, the GPU cannot be used by workloads that use shared GPU resources.
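
  For example, the following sketch (hypothetical container names, placeholder values) shows two containers of one workload that both use the virtualization mode; the volcano.sh/gpu-mem.128Mi resource is described in the kubectl section below. Configuring one container with nvidia.com/gpu and another with volcano.sh/gpu-mem.128Mi in the same workload would violate the constraints above.

    containers:
    - name: container-1                   # Hypothetical container name.
      image: <your_image_address>         # Replace it with your image address.
      resources:
        limits:
          volcano.sh/gpu-mem.128Mi: 8     # 8 x 128 MiB = 1,024 MiB of GPU memory
    - name: container-2                   # Hypothetical container name.
      image: <your_image_address>         # Replace it with your image address.
      resources:
        limits:
          volcano.sh/gpu-mem.128Mi: 8     # Same GPU usage mode (virtualization) as container-1.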

Creating a GPU-accelerated Application on the Console

  1. Log in to the UCS console.
  2. Click the on-premises cluster name to access the cluster console. In the navigation pane, choose Workloads. In the upper right corner, click Create from Image.
  3. Configure the workload parameters. In Basic Info under Container Settings, select GPU for Heterogeneous Resource and select a resource use method.

    • Whole GPU: The default Kubernetes scheduling mode schedules the pods to nodes that meet GPU resource requirements.
    • Sharing mode: Multiple pods preempt the same GPU. This improves the utilization of idle GPU resources when the workload resource usage fluctuates sharply.
    • Virtual GPU: In-house GPU virtualization technology dynamically allocates the GPU memory and compute to improve GPU utilization.

    Resource Use Method

    • Whole GPU: A GPU is dedicated to one pod. The value ranges from 1 to 10, depending on the number of GPUs on the node.
    • Sharing mode: A GPU is shared by multiple pods. Set the percentage of a single GPU that each pod uses. Resources cannot be allocated across multiple GPUs. For example, a value of 50% indicates that the pod uses 50% of one GPU and that all of its requested GPU resources come from that same GPU.

    Virtual GPU

    • GPU memory: GPU virtualization configuration. The value must be an integer multiple of 128 MiB. The minimum value allowed is 128 MiB. If the total GPU memory configured exceeds that of a single GPU, GPU scheduling will not be performed.
    • GPU compute (%): GPU virtualization configuration. The value must be a multiple of 5 and cannot exceed 100. This parameter is optional. If it is left blank, only the GPU memory is isolated and the compute is shared.

  4. Configure other parameters and click Create.

Creating a GPU-accelerated Application Using kubectl

  1. Use kubectl to access a cluster.
  2. Create a GPU-accelerated application.

    Create a gpu-app.yaml file.

    • Static GPU allocation
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: gpu-app
        namespace: default
        labels:
          app: gpu-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: gpu-app
        template: 
          metadata:
            labels:
              app: gpu-app
          spec:
            containers:
            - name: container-1
              image: <your_image_address>     # Replace it with your image address.
              resources:
                limits:
                  nvidia.com/gpu: 200m        # Request for 0.2 GPUs. Value 1 indicates that the GPU resources will be dedicated, and a value less than 1 indicates that the GPU resources will be shared.
            schedulerName: volcano            # To use GPU virtualization, you must use the Volcano scheduler.
            imagePullSecrets:
              - name: default-secret

      There are two isolation modes: GPU memory isolation, and isolation of both GPU memory and compute. Compute-only isolation is not supported, so volcano.sh/gpu-core.percentage cannot be set on its own.

    • Isolate the GPU memory only:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: gpu-app
        namespace: default
        labels:
          app: gpu-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: gpu-app
        template: 
          metadata:
            labels:
              app: gpu-app
          spec:
            containers:
            - name: container-1
              image: <your_image_address>      # Replace it with your image address.
              resources:
                limits:
                  volcano.sh/gpu-mem.128Mi: 5  # GPU memory allocated to the pod, in the unit of 128 MiB
            schedulerName: volcano         # To use GPU virtualization, you must use the Volcano scheduler.
            imagePullSecrets:
              - name: default-secret
    • Isolate both the GPU memory and compute:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: gpu-app
        namespace: default
        labels:
          app: gpu-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: gpu-app
        template: 
          metadata:
            labels:
              app: gpu-app
          spec:
            containers:
            - name: container-1
              image: <your_image_address>         # Replace it with your image address.
              resources:
                limits:
                  volcano.sh/gpu-mem.128Mi: 5     # GPU memory allocated to the pod, in the unit of 128 MiB
                  volcano.sh/gpu-core.percentage: 25    # Compute allocated to the pod
            schedulerName: volcano                 # To use GPU virtualization, you must use the Volcano scheduler.
            imagePullSecrets:
              - name: default-secret
    Table 1 Key parameters

    • nvidia.com/gpu (optional): The number of GPUs to be requested. The value can be smaller than 1. For example, nvidia.com/gpu: 0.5 indicates that multiple pods share a GPU; in this case, all the requested GPU resources come from the same GPU.
      After nvidia.com/gpu is specified, workloads will not be scheduled to nodes without GPUs. If the node is GPU-starved, Kubernetes events similar to the following are reported:
      • 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
      • 0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu.
    • volcano.sh/gpu-mem.128Mi (optional): The GPU memory to allocate, in units of 128 MiB. The value must be a positive integer. For example, a value of 5 allocates 640 MiB (128 MiB × 5) of GPU memory. If the total GPU memory configured exceeds that of a single GPU, GPU scheduling will not be performed.
    • volcano.sh/gpu-core.percentage (optional): The GPU compute to allocate, as a percentage. The value must be a multiple of 5 and cannot exceed 100. Compute-only isolation is not supported, so volcano.sh/gpu-core.percentage cannot be configured on its own.
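
    For example, a pod that needs roughly 2 GiB of GPU memory and a quarter of a GPU's compute could set its limits as follows (a sketch based on the units described above; the values are placeholders):

      resources:
        limits:
          volcano.sh/gpu-mem.128Mi: 16          # 16 x 128 MiB = 2,048 MiB of GPU memory
          volcano.sh/gpu-core.percentage: 25    # 25% of the GPU compute (a multiple of 5)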

  3. Run the following command to create an application:

    kubectl apply -f gpu-app.yaml
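
    Optionally, confirm that the pod has been created and scheduled before verifying isolation. A minimal check, using the app: gpu-app label from the example manifests:

    kubectl get pod -l app=gpu-app -o wide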

Verifying GPU Virtualization Isolation

After an application is created, you can verify its GPU virtualization isolation.
  • Log in to the target container and check its GPU memory.
    kubectl exec -it <pod_name> -- nvidia-smi     # Replace <pod_name> with the name of a pod created by the gpu-app workload.
    Expected output:
    Wed Apr 12 07:54:59 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
    | N/A   27C    P0    37W / 300W |   4792MiB /  5000MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    5,000 MiB of GPU memory is allocated to the container, and 4,792 MiB is used.

  • Run the following command on the node to check the isolation of the GPU memory:
    export PATH=$PATH:/usr/local/nvidia/bin;nvidia-smi

    Expected output:

    Wed Apr 12 09:31:10 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
    | N/A   27C    P0    37W / 300W |   4837MiB / 16160MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A    760445      C   python                           4835MiB |
    +-----------------------------------------------------------------------------+

    The expected output indicates that the total GPU memory on the node is 16,160 MiB, and 4,837 MiB is used by the example pod.
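
  • Optionally, list the extended GPU resources advertised by the node. This is a sketch: <gpu_node_name> is a placeholder, and the exact resource names that appear depend on the GPU device plugin installed on the node.
    kubectl describe node <gpu_node_name> | grep -E "volcano.sh/gpu|nvidia.com/gpu"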