Environment Constraints for GPU Monitoring
- Only Linux OSs are supported, and only some Linux public image versions support GPU monitoring. For details, see What OSs Does the Agent Support?
- Supported flavors: ECSs of the G6v, G6, P2s, P2v, P2vs, G5, Pi2, Pi1, and P1 series, and BMSs of the P, Pi, G, and KP series.
- The lspci tool must be installed on the ECS. If it is not installed, GPU metric data cannot be collected and events cannot be reported.
To install the lspci tool, perform the following steps:
- Log in to the ECS.
- Update the image source to obtain the installation dependencies.
wget http://mirrors.myhuaweicloud.com/repo/mirrors_source.sh && bash mirrors_source.sh
For more information, see How Can I Use an Automated Tool to Configure a HUAWEI CLOUD Image Source (x86_64 and Arm)?
- Run the following command to install the lspci tool:
- Run the following command to view the installation result:
Figure 1 Example installation result
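The install and verification steps above can be sketched as a small shell script. This is a minimal sketch under assumptions, not the documented procedure: it assumes lspci is provided by the pciutils package, that the image uses yum or apt-get, and that the script runs as root. The function names detect_pkg_manager, lspci_install_cmd, and install_lspci are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: assumes lspci ships in a package named "pciutils" and that
# the image uses yum or apt-get. Run install_lspci as root.

# Print the name of the first supported package manager found on PATH.
detect_pkg_manager() {
  for pm in yum apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Print the install command line for a given package manager.
lspci_install_cmd() {
  case "$1" in
    yum)     echo "yum install -y pciutils" ;;
    apt-get) echo "apt-get install -y pciutils" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Install lspci if it is missing (hypothetical helper; not part of the Agent).
install_lspci() {
  command -v lspci >/dev/null 2>&1 && return 0
  pm=$(detect_pkg_manager) || return 1
  cmd=$(lspci_install_cmd "$pm") || return 1
  $cmd   # intentionally unquoted so the command line is split into words
}

# Usage:
#   install_lspci && lspci -d 10de:   # 10de is NVIDIA's PCI vendor ID
```

If lspci is installed and an NVIDIA GPU is attached, the lspci -d 10de: call in the usage comment should list at least one device. Empty output suggests that no NVIDIA device is visible to the ECS.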
- GPU metric collection depends on the following driver files. Check that the driver file for your environment exists at one of the listed paths.
- Linux driver file
nvmlUbuntuNvidiaLibraryPath = "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1"
nvmlCentosNvidiaLibraryPath = "/usr/lib64/libnvidia-ml.so.1"
nvmlCceNvidiaLibraryPath = "/opt/cloud/cce/nvidia/lib64/libnvidia-ml.so.1"
- Windows driver file
DefaultNvmlDLLPath = "C:\\Program Files\\NVIDIA Corporation\\NVSMI\\nvml.dll"
WHQLNvmlDLLPath = "C:\\Windows\\System32\\nvml.dll"
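On a Linux ECS, the driver file check can be scripted. The following is a minimal sketch that tests the three Linux paths listed above in order and prints the first one that exists. The variable NVML_CANDIDATES and the function first_existing are hypothetical names used for illustration. Whether the Agent probes the paths in this order is an assumption.

```shell
# Candidate NVML library locations, copied from the Linux driver file list.
NVML_CANDIDATES="/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
/usr/lib64/libnvidia-ml.so.1
/opt/cloud/cce/nvidia/lib64/libnvidia-ml.so.1"

# Print the first path among the arguments that exists; return 1 if none do.
first_existing() {
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "$p"
      return 0
    fi
  done
  return 1
}

# Usage (unquoted on purpose so each path becomes one argument):
#   first_existing $NVML_CANDIDATES || echo "no NVML library found" >&2
```

A non-zero exit status means none of the listed files is present, which usually indicates that the NVIDIA driver is not installed or is installed in a non-standard location.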