Preparing GPU Resources
This section describes how you can plan and prepare basic software and hardware before using GPU capabilities.
Basic Planning
Resource |
Version |
---|---|
Cluster |
v1.25.15-r7 or later |
OS |
Huawei Cloud EulerOS 2.0 |
System architecture |
x86 |
GPU |
T4 and V100 |
Driver version |
Only GPU driver 470.57.02, 510.47.03, or 535.54.03 for GPU virtualization |
Container runtime |
containerd |
Add-ons |
The following add-ons must be installed in a cluster:
|
Step 1: Add GPU Nodes to a Cluster and Label the Nodes
If there are GPU nodes that comply with the basic planning in your cluster, skip this procedure.
- Add GPU nodes to your cluster. For details, see Adding Nodes to On-Premises Clusters.
- Label the nodes with accelerator: nvidia-{GPU model}. For details, see Adding Labels/Taints to Nodes.
Figure 1 Labeling nodes that support GPU virtualization
Step 2: Install the Add-ons
If the add-ons that comply with the basic planning have been installed in your cluster, you can skip this procedure.
Before restarting a node, evict all pods on that node. Make sure to reserve GPU resources to avoid pod scheduling failures during node drainage. Insufficient resources can affect services.
- Log in to the UCS console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. In the Add-ons Installed area, check whether the Volcano and gpu-device-plugin add-ons have been installed.
- If the gpu-device-plugin add-on is not installed, install it by referring to gpu-device-plugin.
To enable GPU virtualization, install the Volcano add-on. For details, see Volcano.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot