Creating a GPU-accelerated Application
This section describes how to use GPU virtualization to isolate GPU memory and compute so that GPU resources are used efficiently.
Prerequisites
- You have prepared GPU virtualization resources.
- If you want to create the application by running commands, use kubectl to connect to the cluster. For details, see Connecting to a Cluster Using kubectl.
Constraints
- Init containers do not support GPU virtualization.
- For a single GPU:
- Up to 20 virtual GPUs can be created.
- Up to 20 pods that use the isolation capability can be scheduled.
- Only workloads in the same isolation mode can be scheduled. (GPU virtualization supports two isolation modes: GPU memory isolation and isolation of GPU memory and compute.)
- For different containers of the same workload:
- Only one GPU model can be configured. Two or more GPU models cannot be configured at the same time.
- Only one GPU usage mode can be configured. The virtualization and non-virtualization modes cannot be used at the same time.
- After a GPU is virtualized, the GPU cannot be used by workloads that use shared GPU resources.
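To illustrate the same-mode constraint, the following is a minimal sketch of a workload whose two containers both use the GPU virtualization mode (the workload name, image placeholders, and memory values are illustrative). Requesting nvidia.com/gpu in one container and volcano.sh/gpu-mem.128Mi in another container of the same workload would violate the constraint.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-container-gpu-app            # Illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-container-gpu-app
  template:
    metadata:
      labels:
        app: multi-container-gpu-app
    spec:
      containers:
        - name: container-1
          image: <your_image_address>          # Replace it with your image address.
          resources:
            limits:
              volcano.sh/gpu-mem.128Mi: 8      # Virtualization mode: 8 x 128 MiB = 1,024 MiB
        - name: container-2
          image: <your_image_address>          # Replace it with your image address.
          resources:
            limits:
              volcano.sh/gpu-mem.128Mi: 4      # Same usage mode as container-1, as required
      schedulerName: volcano                   # GPU virtualization requires the Volcano scheduler.
      imagePullSecrets:
        - name: default-secret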
Creating a GPU-accelerated Application on the Console
- Log in to the UCS console.
- Click the on-premises cluster name to access the cluster console. In the navigation pane, choose Workloads. In the upper right corner, click Create from Image.
- Configure the workload parameters. In Basic Info under Container Settings, select GPU for Heterogeneous Resource and select a resource use method.
- Whole GPU: The default Kubernetes scheduling mode schedules the pods to nodes that meet GPU resource requirements.
- Sharing mode: Multiple pods preempt the same GPU. This improves the utilization of idle GPU resources when the workload resource usage fluctuates sharply.
- Virtual GPU: In-house GPU virtualization technology dynamically allocates the GPU memory and compute to improve GPU utilization.
Resource Use Method
- Whole GPU: A GPU is dedicated to one pod. The value ranges from 1 to 10, depending on the number of GPUs on the node.
- Sharing mode: A GPU is shared by multiple pods. Configure the percentage of a single GPU that each pod uses. Resources cannot be allocated across multiple GPUs. For example, a value of 50% means the pod uses half of one GPU, and all the requested GPU resources come from that same GPU.
Virtual GPU
- GPU memory: GPU virtualization configuration. The value must be an integer multiple of 128 MiB. The minimum value allowed is 128 MiB. If the total GPU memory configured exceeds that of a single GPU, GPU scheduling will not be performed.
- GPU compute (%): GPU virtualization configuration. The value must be a multiple of 5 and cannot exceed 100. This parameter is optional. If it is left blank, the GPU memory is isolated and the compute is shared.
- Configure other parameters and click Create.
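For reference, the Virtual GPU settings above map to extended resource requests on the container when the workload is created with kubectl (see the next section). A minimal sketch, assuming 640 MiB of GPU memory and 25% of the compute is configured:

resources:
  limits:
    volcano.sh/gpu-mem.128Mi: 5          # 5 x 128 MiB = 640 MiB of GPU memory
    volcano.sh/gpu-core.percentage: 25   # 25% of the GPU compute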
Creating a GPU-accelerated Application Using kubectl
- Use kubectl to access a cluster.
- Create a GPU-accelerated application.
Create a gpu-app.yaml file.
- Static GPU allocation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  namespace: default
  labels:
    app: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
        - name: container-1
          image: <your_image_address>    # Replace it with your image address.
          resources:
            limits:
              nvidia.com/gpu: 200m       # Request for 0.2 GPUs. Value 1 indicates that the GPU resources will be dedicated, and a value less than 1 indicates that the GPU resources will be shared.
      schedulerName: volcano             # To use GPU virtualization, you must use the Volcano scheduler.
      imagePullSecrets:
        - name: default-secret
- GPU virtualization
There are two isolation modes: GPU memory isolation, and isolation of both GPU memory and compute. GPU compute cannot be isolated on its own, so volcano.sh/gpu-core.percentage cannot be set separately.
- Isolate the GPU memory only:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  namespace: default
  labels:
    app: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
        - name: container-1
          image: <your_image_address>        # Replace it with your image address.
          resources:
            limits:
              volcano.sh/gpu-mem.128Mi: 5    # GPU memory allocated to the pod, in the unit of 128 MiB
      schedulerName: volcano                 # To use GPU virtualization, you must use the Volcano scheduler.
      imagePullSecrets:
        - name: default-secret
- Isolate both the GPU memory and compute:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  namespace: default
  labels:
    app: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
        - name: container-1
          image: <your_image_address>              # Replace it with your image address.
          resources:
            limits:
              volcano.sh/gpu-mem.128Mi: 5          # GPU memory allocated to the pod, in the unit of 128 MiB
              volcano.sh/gpu-core.percentage: 25   # Compute allocated to the pod
      schedulerName: volcano                       # To use GPU virtualization, you must use the Volcano scheduler.
      imagePullSecrets:
        - name: default-secret
Table 1 Key parameters
- nvidia.com/gpu (optional): The number of GPUs to be requested. The value can be smaller than 1. For example, nvidia.com/gpu: 0.5 indicates that multiple pods share one GPU; all the GPU resources requested by a pod come from the same GPU. After nvidia.com/gpu is specified, workloads will not be scheduled to nodes without GPUs. If a node does not have enough GPU resources, Kubernetes events similar to the following are reported:
  - 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
  - 0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu.
- volcano.sh/gpu-mem.128Mi (optional): The GPU memory, which must be a positive integer multiple of 128 MiB. For example, if the value is set to 5, 640 MiB (128 MiB × 5) of GPU memory is allocated. If the requested GPU memory exceeds that of a single GPU, the pod will not be scheduled.
- volcano.sh/gpu-core.percentage (optional): The GPU compute, which must be a multiple of 5 and cannot exceed 100. Compute-only isolation is not supported, so volcano.sh/gpu-core.percentage cannot be configured without volcano.sh/gpu-mem.128Mi.
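As a quick sizing example (the values are illustrative), to give a pod roughly 5 GiB of GPU memory and half of a GPU's compute, the limits could be set as follows:

resources:
  limits:
    volcano.sh/gpu-mem.128Mi: 40         # 40 x 128 MiB = 5,120 MiB (5 GiB)
    volcano.sh/gpu-core.percentage: 50   # Half of the GPU compute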
- Run the following command to create an application:
kubectl apply -f gpu-app.yaml
Verifying GPU Virtualization Isolation
- Log in to the target container and check its GPU memory. (Replace gpu-app with the name of the pod created by the Deployment.)
kubectl exec -it gpu-app -- nvidia-smi
Expected output:

Wed Apr 12 07:54:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4792MiB /  5000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
5,000 MiB of GPU memory is allocated to the container, and 4,792 MiB is used.
- Run the following command on the node to check the isolation of the GPU memory:
export PATH=$PATH:/usr/local/nvidia/bin;nvidia-smi
Expected output:
Wed Apr 12 09:31:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4837MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    760445      C   python                           4835MiB |
+-----------------------------------------------------------------------------+
The expected output indicates that the total GPU memory on the node is 16,160 MiB, and 4,837 MiB is used by the example pod.