Why Is an Error Reported When a GPU-Related Operation Is Performed in a Container Entered by Using exec?

Symptom

After I enter a container by using exec and perform a GPU-related operation (such as running nvidia-smi or running a GPU training task with TensorFlow), the operation fails with the error message "cannot open shared object file: No such file or directory".

Possible Cause

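The CUDA libraries in the container are located in /usr/local/nvidia/lib64. This directory must be added to LD_LIBRARY_PATH so that the CUDA libraries can be found. If the shell session started by exec does not list this directory in LD_LIBRARY_PATH, the dynamic linker cannot locate the libraries and reports the error above.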
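To confirm this cause before changing anything, you can check whether the directory is already listed in LD_LIBRARY_PATH inside the exec session. The following is a minimal sketch, not an official diagnostic: check_path_entry is a hypothetical helper name introduced here for illustration, and the sketch assumes a POSIX-compatible shell is available in the container.

```shell
# check_path_entry is a hypothetical helper (not part of CCI or CUDA).
# It prints "present" if directory $1 is an entry of the colon-separated
# list $2, and "missing" otherwise.
check_path_entry() {
  dir=$1
  list=$2
  # Wrap both sides in colons so that only whole entries match.
  case ":${list}:" in
    *":${dir}:"*) echo "present" ;;
    *)            echo "missing" ;;
  esac
}

# Check the directory named in this FAQ against the current session.
check_path_entry /usr/local/nvidia/lib64 "${LD_LIBRARY_PATH:-}"
```

If the output is "missing", the cause described above likely applies to your session.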

Solution

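Log in to the GPU-accelerated container by using kubectl exec or the console, run the export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64 command, and then perform the GPU-related operation again. The export applies only to the current shell session, so run the command again each time you enter the container by using exec.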
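The command in the solution can also be written so that it is safe to run more than once in the same session. The following is a sketch under stated assumptions rather than a required procedure: NVIDIA_LIB_DIR is a variable name introduced here for illustration, and a POSIX-compatible shell is assumed. It adds the directory only if it is not already listed, and it avoids a leading colon when LD_LIBRARY_PATH is initially empty (an empty entry is generally treated as the current directory by the dynamic linker).

```shell
# Directory that this FAQ gives as the location of the CUDA libraries.
# NVIDIA_LIB_DIR is an illustrative variable name, not a CCI setting.
NVIDIA_LIB_DIR=/usr/local/nvidia/lib64

# Append the directory only if it is not already an entry, so repeated
# runs in the same session do not create duplicates.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${NVIDIA_LIB_DIR}:"*)
    ;;  # already listed, nothing to do
  *)
    # ${VAR:+...} expands to "existing value plus colon" only when
    # LD_LIBRARY_PATH is non-empty, which avoids a leading colon.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${NVIDIA_LIB_DIR}"
    ;;
esac

# Show the resulting value for verification.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

After running it, retry the GPU-related operation (for example, nvidia-smi) in the same shell session.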