Updated on 2024-11-05 GMT+08:00

Why Is an Error Reported When a GPU-Related Operation Is Performed in a Container Entered by Using exec?

Symptom

After I enter a container by using exec and perform a GPU-related operation (such as running nvidia-smi or running a GPU training task with TensorFlow), the error message "cannot open shared object file: No such file or directory" is displayed.

Possible Cause

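The CUDA libraries in the container are located in /usr/local/nvidia/lib64. This directory must be included in LD_LIBRARY_PATH so that the dynamic linker can find the CUDA libraries. If the directory is missing from LD_LIBRARY_PATH in the shell started by exec, GPU-related programs cannot load the libraries and report the error above.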

Solution

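Log in to the GPU-accelerated container by using kubectl exec or the console, run the export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64 command, and then perform the GPU-related operations. The export command takes effect only in the current shell session, so run it again each time you enter the container.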
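
The step above can be sketched as a small shell snippet to run inside the container. This is a minimal sketch, not an official procedure: the variable name NVIDIA_LIB_DIR is introduced here only for illustration, and the "already present" check is an optional addition that is not part of the original command. It also avoids a leading colon when LD_LIBRARY_PATH starts out empty, because an empty entry makes the dynamic linker search the current directory.

```shell
# Run inside the container, for example after entering it with:
#   kubectl exec -it <pod-name> -- /bin/sh      (<pod-name> is a placeholder)

# Directory that holds the CUDA libraries in the container (from this FAQ).
# NVIDIA_LIB_DIR is a hypothetical helper variable used only in this sketch.
NVIDIA_LIB_DIR=/usr/local/nvidia/lib64

# Append the directory only if it is not already listed, so that repeated
# runs in the same session do not keep growing the variable.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${NVIDIA_LIB_DIR}:"*)
    ;;  # already present, nothing to do
  *)
    # ${VAR:+...} adds the ":" separator only when the variable is non-empty.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${NVIDIA_LIB_DIR}"
    ;;
esac

# Show the resulting value, then retry the GPU-related operation (such as nvidia-smi).
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

After the snippet runs, retry the GPU-related operation in the same shell session.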