AI Suite (NV GPU)
Description
The AI Suite (NV GPU) add-on is a device management plug-in that exposes NVIDIA GPUs to containers. It must be installed before GPU nodes in a cluster can be used.
Constraints
- When you create a dedicated resource pool, this plug-in is installed automatically only if the instance specification type is GPU.
- Do not upgrade the GPU driver until the plug-in upgrade is complete. Otherwise, the driver upgrade may stall or fail.
Verifying the Add-on
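After the add-on is installed, run nvidia-smi on the GPU node and in a container that has been allocated GPU resources to verify that the GPU device and driver are available. The location of nvidia-smi depends on the add-on version (on the node) and the cluster version (in the container):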
- GPU node:
  - If the add-on version is earlier than 2.0.0, run the following command:
    cd /opt/cloud/cce/nvidia/bin && ./nvidia-smi
  - If the add-on version is 2.0.0 or later, run the following command:
    cd /usr/local/nvidia/bin && ./nvidia-smi
- Container:
  - If the cluster version is v1.27 or earlier, run the following command:
    cd /usr/local/nvidia/bin && ./nvidia-smi
  - If the cluster version is v1.28 or later, run the following command:
    cd /usr/bin && ./nvidia-smi
If GPU information is returned, the device is available and the add-on has been installed.
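The version-dependent paths above can be captured in a small Bash helper, for example in a node bootstrap or troubleshooting script. This is a minimal sketch, not part of the add-on: the function names version_ge, node_nvsmi_dir, and container_nvsmi_dir are hypothetical, and the version comparison assumes GNU coreutils sort with the -V (version sort) option. The directories and version thresholds are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the add-on) that map versions to the
# nvidia-smi directories listed in this section.

# version_ge A B: succeeds if version A >= version B.
# Assumes GNU coreutils `sort -V` for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Directory of nvidia-smi on a GPU node, selected by add-on version.
node_nvsmi_dir() {
  if version_ge "$1" 2.0.0; then
    echo /usr/local/nvidia/bin
  else
    echo /opt/cloud/cce/nvidia/bin
  fi
}

# Directory of nvidia-smi inside a GPU container, selected by cluster version.
# A leading "v" (as in v1.28) is stripped before comparing.
container_nvsmi_dir() {
  if version_ge "${1#v}" 1.28; then
    echo /usr/bin
  else
    echo /usr/local/nvidia/bin
  fi
}
```

For example, on a node running add-on 2.7.63, "$(node_nvsmi_dir 2.7.63)/nvidia-smi" resolves to /usr/local/nvidia/bin/nvidia-smi, and in a container on a v1.27 cluster, container_nvsmi_dir v1.27 prints /usr/local/nvidia/bin.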
Components
Component | Description | Resource Type |
---|---|---|
nvidia-driver-installer | Installs the NV GPU driver on a node. It consumes resources only during installation; once installation is complete, it uses no resources. | DaemonSet |
hce20-nvidia-driver-installer | Installs the NV GPU driver on nodes running HCE 2.0. It consumes resources only during installation; once installation is complete, it uses no resources. | DaemonSet |
ubuntu22-nvidia-driver-installer | Installs the NV GPU driver on nodes running Ubuntu 22. It consumes resources only during installation; once installation is complete, it uses no resources. | DaemonSet |
nvidia-gpu-device-plugin | A Kubernetes device plugin that provides NV GPU heterogeneous compute to containers. | DaemonSet |
nvidia-operator | Provides NV GPU node management capabilities for clusters. | Deployment |
dcgm-exporter | Collects GPU metrics for DCGM-based observability. Installed only when DCGM-Exporter is enabled. | DaemonSet |
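To check that the core workloads from the table are present, one option is to list DaemonSets and Deployments with kubectl get daemonsets,deployments --all-namespaces -o name and compare the output against the table. The sketch below performs that comparison on standard input. It assumes the Kubernetes object names are identical to the component names in the table, and it checks only nvidia-gpu-device-plugin and nvidia-operator, because the driver installers vary with the node OS and dcgm-exporter exists only when DCGM-Exporter is enabled. The function name check_components is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get ... -o name` output from stdin
# (lines such as "daemonset.apps/nvidia-operator") and reports whether each
# always-present component from the Components table was found.
# Assumption: Kubernetes object names equal the component names in the table.
check_components() {
  local listing entry kind name
  listing="$(cat)"
  for entry in daemonset/nvidia-gpu-device-plugin deployment/nvidia-operator; do
    kind="${entry%%/*}"   # e.g. "daemonset"
    name="${entry#*/}"    # e.g. "nvidia-gpu-device-plugin"
    # -x: match the whole line; -F: treat the pattern as a fixed string.
    if grep -qxF "${kind}.apps/${name}" <<<"$listing"; then
      echo "${entry}: found"
    else
      echo "${entry}: missing"
    fi
  done
}
```

Usage: kubectl get daemonsets,deployments --all-namespaces -o name | check_components. Reading from stdin keeps the comparison separate from cluster access, so the same function works on saved kubectl output.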
Change History
Plug-in Version | Change |
---|---|
2.7.63 | Fixed security vulnerabilities. |
2.7.42 | Added the NV 535.216.03 driver to support xGPUs. |
2.6.4 | Updated the GPU isolation logic. |
2.0.72 | Updated the GPU isolation logic. |
2.0.48 | Fixed an issue that occurred during driver installation. |
2.0.44 | |
2.0.14 | |
1.2.29 | |
1.2.24 | |
1.2.20 | Set the plug-in alias to gpu. |
1.2.15 | Adapted to CCE v1.23 clusters. |