Updated on 2024-12-30 GMT+08:00

Error Message "Unexpected error from cudaGetDeviceCount" Is Displayed When Torch Is Used

Symptom

When a script that uses the GPU is executed on a notebook instance, Torch reports an unsupported display driver and CUDA driver combination, even though the nvcc --version command output shows a CUDA version that appears to be compatible. The test script is as follows:

import torch
import sys
print('A', sys.version)
print('B', torch.__version__)
print('C', torch.cuda.is_available())
print('D', torch.backends.cudnn.enabled)
device = torch.device('cuda')
print('E', torch.cuda.get_device_properties(device))
print('F', torch.tensor([1.0, 2.0]).cuda())

The error information is as follows:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print('E', torch.cuda.get_device_properties(device))
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 356, in get_device_properties
    _lazy_init() # will define _get_device_properties
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination

Solution

  1. Check whether the CUDA version is compatible with the Torch version.
    # CUDA toolkit version
    nvcc --version
    # GPU driver version and the highest CUDA version that the driver supports
    nvidia-smi

    # Torch version (run this with the Python interpreter of the Conda environment in use)
    python -c "import torch;print(torch.__version__)"

    You can look up compatible versions on the official PyTorch website: https://pytorch.org/get-started/previous-versions/.

  2. If multiple CUDA versions are installed in the environment, check the CUDA priority in LD_LIBRARY_PATH and manually adjust the priority.

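    For example, suppose Torch is compatible only with CUDA 9.1, but querying the variable returns LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:/usr/local/cuda-9.1/lib64. Directories are searched from left to right, so the CUDA 11.8 libraries take priority over the CUDA 9.1 libraries.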

    Run the export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:$LD_LIBRARY_PATH command to manually adjust the priority.
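
The two checks above can be sketched as a small Python script. This is only an illustrative sketch, not part of the official procedure: the helper names parse_nvcc_release and prioritize_cuda_lib are hypothetical, and the path /usr/local/cuda-9.1/lib64 is taken from the example in step 2 and should be replaced with the directory that matches your Torch build. The script assumes Python 3.7 or later.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (for example '11.8') found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def prioritize_cuda_lib(ld_library_path, preferred_dir):
    """Return a colon-separated search path with preferred_dir moved to the front."""
    # Drop empty entries and any existing copy of preferred_dir to avoid duplicates.
    entries = [e for e in ld_library_path.split(":") if e and e != preferred_dir]
    return ":".join([preferred_dir] + entries)


def main():
    # Step 1: report the toolkit CUDA release and the CUDA version Torch was built for.
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        print("nvcc CUDA release:", parse_nvcc_release(out))
    else:
        print("nvcc was not found on PATH")
    try:
        import torch
        print("Torch version:", torch.__version__)
        print("Torch built for CUDA:", torch.version.cuda)
    except (ImportError, OSError) as exc:
        print("Torch could not be imported:", exc)

    # Step 2: suggest a reordered LD_LIBRARY_PATH (assumed preferred directory).
    preferred = "/usr/local/cuda-9.1/lib64"
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("Current LD_LIBRARY_PATH:", current)
    print("Suggested: export LD_LIBRARY_PATH=" + prioritize_cuda_lib(current, preferred))


if __name__ == "__main__":
    main()
```

The script only prints a suggested export command rather than modifying os.environ, because changing LD_LIBRARY_PATH inside a running Python process does not affect libraries that the process has already loaded. Unlike the plain export command in step 2, the helper also removes the old copy of the preferred directory so that it does not appear twice.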