Why Is an Error Reported When a GPU-Related Operation Is Performed in a Container Entered Using exec?
Symptom
After entering a container using exec and performing a GPU-related operation (such as running nvidia-smi or a TensorFlow GPU training task), the error message "cannot open shared object file: No such file or directory" is displayed.
Possible Cause
The CUDA library in the container is located in /usr/local/nvidia/lib64, but this directory is not included in LD_LIBRARY_PATH by default, so the dynamic linker cannot find the library. The directory must be added to LD_LIBRARY_PATH before GPU-related operations are performed.
Solution
Log in to the GPU-accelerated container using kubectl exec or the console, run the export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64 command, and then retry the GPU-related operation.
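The steps above can be sketched as follows (the library path /usr/local/nvidia/lib64 is the one described in this page; the container and pod names are placeholders):

```shell
# Enter the GPU-accelerated container (replace <pod-name> with your pod).
# kubectl exec -it <pod-name> -- /bin/bash

# Append the CUDA library directory to the dynamic-linker search path.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64

# Confirm the variable now contains the directory before retrying
# nvidia-smi or the training task.
echo "$LD_LIBRARY_PATH"
```

Because a variable exported in an exec session is lost when the session ends, the setting can instead be made persistent in the image, for example with a Dockerfile instruction such as `ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64`.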