Why Is an Error Reported When a GPU-Related Operation Is Performed in a Container Entered by Using exec?
Symptom
After I enter a container by using exec and perform a GPU-related operation (such as running nvidia-smi or a TensorFlow GPU training task), the error message "cannot open shared object file: No such file or directory" is displayed.

Possible Cause
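The CUDA libraries in the container are located in /usr/local/nvidia/lib64. This directory must be included in LD_LIBRARY_PATH so that the dynamic linker can find them. If the shell started by exec does not have this directory in LD_LIBRARY_PATH, the libraries cannot be loaded and the error is reported.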
Solution
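Log in to the GPU-accelerated container by using kubectl exec or the console, run the export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64 command, and then perform the GPU-related operations. The export applies only to the current shell session, so run the command again each time you enter the container.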
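The following is a minimal sketch of a slightly more defensive form of the same command, to be run inside the container. The variable name NVIDIA_LIB_DIR is a hypothetical helper introduced here for illustration. The sketch assumes a POSIX shell and the library path given in this article. It appends the directory only if it is not already listed, and avoids the leading colon that the plain command produces when LD_LIBRARY_PATH is initially empty (the dynamic linker treats an empty entry as the current directory).

```shell
# Directory named in this article; adjust it if your image places the libraries elsewhere.
NVIDIA_LIB_DIR=/usr/local/nvidia/lib64   # hypothetical helper variable

# Append the directory only if it is not already in LD_LIBRARY_PATH,
# so that running this more than once does not add duplicate entries.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${NVIDIA_LIB_DIR}:"*)
    ;;  # already present, nothing to do
  *)
    # ${VAR:+...} adds the ":" separator only when the variable is non-empty.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${NVIDIA_LIB_DIR}"
    ;;
esac

# Show the resulting value for verification.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

After running it, retry the GPU-related operation (for example, nvidia-smi) in the same shell session.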