Creating a Workload That Will Receive vGPU Support
This section describes how to use GPU virtualization to isolate GPU memory and compute so that GPU resources are used efficiently.
Prerequisites
- You have prepared GPU virtualization resources.
- If you want to create a cluster by running commands, use kubectl to connect to the cluster. For details, see Connecting to a Cluster Using kubectl.
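As an optional sanity check before you continue, you can confirm that kubectl is connected to the intended cluster and that the GPU node you prepared is ready. These are standard kubectl commands and only a suggestion.
# Confirm that kubectl can reach the cluster.
kubectl cluster-info
# Confirm that the GPU node prepared for virtualization is in the Ready state.
kubectl get nodes -o wide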
Constraints
- Init containers do not support GPU virtualization.
- For a single GPU:
- A maximum of 20 vGPUs can be created.
- A maximum of 20 pods that use the isolation capability can be scheduled.
- Only workloads in the same isolation mode can be scheduled on a single GPU. (GPU virtualization supports two isolation modes: GPU memory isolation, and isolation of both GPU memory and compute.)
- For different containers of the same workload (see the sketch after this list):
- Only one GPU model can be configured. Two or more GPU models cannot be configured at the same time.
- The same GPU usage mode must be configured. Virtualization and non-virtualization modes cannot be used at the same time.
- After a GPU is virtualized, the GPU cannot be used by workloads that use shared GPU resources.
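As an illustration of the usage-mode constraint above, the fragment below sketches an invalid container list: one container requests vGPU memory (volcano.sh/gpu-mem) while the other requests a whole GPU. The nvidia.com/gpu resource name follows the common NVIDIA device plugin convention and is used here only as an assumed example of the non-virtualization mode.
# Invalid sketch: containers of the same workload mixing virtualization and non-virtualization modes.
containers:
- name: container-1
  resources:
    limits:
      volcano.sh/gpu-mem: 5000      # Virtualization mode: vGPU memory, in MiB.
- name: container-2
  resources:
    limits:
      nvidia.com/gpu: 1             # Non-virtualization mode: whole GPU (assumed resource name).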
Creating a Workload That Will Receive vGPU Support on the Console
- Log in to the UCS console.
- Click the on-premises cluster name to access its details page, choose Workloads in the navigation pane, and click Create Workload in the upper right corner.
- Configure workload parameters. In Container Settings, choose Basic Info and set the GPU quota.
Video memory: The value must be a positive integer, in MiB. If the configured GPU memory exceeds that of a single GPU, GPU scheduling cannot be performed.
Computing power: The value must be a multiple of 5, in %, and cannot exceed 100.
Figure 1 Configuring workload information
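For reference, the video memory and computing power quotas set on the console presumably map to the same container resource limits used in the kubectl procedure later in this section; a 5,000 MiB / 25% quota would then appear in the container spec roughly as follows.
resources:
  limits:
    volcano.sh/gpu-mem: 5000               # Video memory, in MiB.
    volcano.sh/gpu-core.percentage: 25     # Computing power, in %.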
- Configure other parameters and click Create Workload.
- Verify the isolation capability of GPU virtualization.
- Log in to the target container and check its GPU memory.
kubectl exec -it gpu-app -- nvidia-smi
Expected output:
Wed Apr 12 07:54:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4792MiB /  5000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
5,000 MiB of GPU memory is allocated to the container, and 4,792 MiB is used.
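If you prefer a compact, machine-readable check, nvidia-smi can print only the memory figures. The query flags below are standard nvidia-smi options; the pod name gpu-app matches the example above, and inside the container the totals should reflect the vGPU quota (about 5,000 MiB in this example).
kubectl exec -it gpu-app -- nvidia-smi --query-gpu=memory.total,memory.used --format=csv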
- Run the following command on the node to check the isolation of the GPU memory:
export PATH=$PATH:/usr/local/nvidia/bin;nvidia-smi
Expected output:
Wed Apr 12 09:31:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4837MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    760445      C   python                           4835MiB |
+-----------------------------------------------------------------------------+
16,160 MiB of GPU memory is allocated to the GPU node, and 4,837 MiB is used by the pod.
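You can also check from the cluster side which vGPU resources the node advertises. Assuming the GPU virtualization device plugin registers them as extended node resources, entries such as volcano.sh/gpu-mem and volcano.sh/gpu-core.percentage appear in the node's capacity and allocatable lists. Replace <gpu-node-name> with the name of your GPU node.
kubectl describe node <gpu-node-name> | grep volcano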
Creating a Workload That Will Receive vGPU Support Using kubectl
- Log in to the master node and use kubectl to connect to the cluster.
- Create a workload that will receive vGPU support. To do so, create a gpu-app.yaml file.
There are two isolation modes: GPU memory isolation, and isolation of both GPU memory and compute. volcano.sh/gpu-core.percentage cannot be set on its own; GPU compute cannot be isolated without also isolating GPU memory.
- Isolate the GPU memory only:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  labels:
    app: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: container-1
        image: <your_image_address>        # Replace it with your image address.
        resources:
          limits:
            volcano.sh/gpu-mem: 5000       # GPU memory allocated to the pod
      imagePullSecrets:
      - name: default-secret
- Isolate both the GPU memory and compute:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  labels:
    app: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: container-1
        image: <your_image_address>                 # Replace it with your image address.
        resources:
          limits:
            volcano.sh/gpu-mem: 5000                # GPU memory allocated to the pod
            volcano.sh/gpu-core.percentage: 25      # Compute allocated to the pod
      imagePullSecrets:
      - name: default-secret
Table 1 Key parameters
- volcano.sh/gpu-mem (optional): The value must be a positive integer, in MiB. If the configured GPU memory exceeds that of a single GPU, GPU scheduling cannot be performed.
- volcano.sh/gpu-core.percentage (optional): The value must be a multiple of 5, in %, and cannot exceed 100.
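If you only want to validate the two quotas quickly without creating a Deployment, a single pod carrying the same limits is enough. This is a minimal sketch that reuses the image placeholder and pull secret from the examples above; the pod name gpu-app-test is arbitrary.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-app-test
spec:
  containers:
  - name: container-1
    image: <your_image_address>                # Replace it with your image address.
    resources:
      limits:
        volcano.sh/gpu-mem: 5000               # GPU memory allocated to the pod, in MiB.
        volcano.sh/gpu-core.percentage: 25     # Compute allocated to the pod, in %.
  imagePullSecrets:
  - name: default-secret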
- Run the following command to create a workload:
kubectl apply -f gpu-app.yaml
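Before checking GPU isolation, you may want to confirm that the pod has been scheduled and is running, and that the vGPU limits were applied. The label selector below matches the app: gpu-app label defined in the manifest.
# Check that the pod is scheduled and running.
kubectl get pod -l app=gpu-app -o wide
# Confirm the vGPU limits applied to the container.
kubectl describe pod -l app=gpu-app | grep -A 3 Limits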
- Verify the isolation.
- Log in to a container and check its GPU memory.
kubectl exec -it gpu-app -- nvidia-smi
Expected output:
Wed Apr 12 07:54:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4792MiB /  5000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
5,000 MiB of GPU memory is allocated to the container, and 4,792 MiB is used.
- Run the following command on the node to check GPU memory isolation:
/usr/local/nvidia/bin/nvidia-smi
Expected output:
Wed Apr 12 09:31:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   27C    P0    37W / 300W |   4837MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    760445      C   python                           4835MiB |
+-----------------------------------------------------------------------------+
16,160 MiB of GPU memory is allocated to the node, and 4,837 MiB is used by the pod in this example.