Updated: 2025-11-18 GMT+08:00
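Torch Reports "Unexpected error from cudaGetDeviceCount"
Symptom
When a GPU-enabled script is run in a Notebook, Torch reports an incompatibility error, even though nvcc --version indicates that the CUDA version is compatible.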
import torch
import sys
print('A', sys.version)
print('B', torch.__version__)
print('C', torch.cuda.is_available())
print('D', torch.backends.cudnn.enabled)
device = torch.device('cuda')
print('E', torch.cuda.get_device_properties(device))
print('F', torch.tensor([1.0, 2.0]).cuda())
The following error is reported:
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print('E', torch.cuda.get_device_properties(device))
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 356, in get_device_properties
    _lazy_init() # will define _get_device_properties
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination
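The numeric code at the end of the message is the most useful part for diagnosis. In the CUDA runtime API, 803 corresponds to cudaErrorSystemDriverMismatch, which points at a driver and library mismatch and not at the script itself. The following is a minimal sketch for pulling that code out of a captured error message; the helper name extract_cuda_error_code is hypothetical and not part of Torch.

```python
import re
from typing import Optional

# Matches the "Error <number>:" fragment seen in the RuntimeError text.
_CUDA_ERROR_RE = re.compile(r"\bError (\d+):")


def extract_cuda_error_code(message: str) -> Optional[int]:
    """Return the numeric CUDA error code in the message, or None if absent.

    Hypothetical helper for illustration; it only parses text.
    """
    match = _CUDA_ERROR_RE.search(message)
    return int(match.group(1)) if match else None


# Message text copied from the traceback in this article.
sample = (
    "Unexpected error from cudaGetDeviceCount(). Did you run some cuda "
    "functions before calling NumCudaDevices() that might have already set "
    "an error? Error 803: system has unsupported display driver / cuda "
    "driver combination"
)
print(extract_cuda_error_code(sample))
```

In a real script, the message would typically come from str(exc) inside an except RuntimeError block around the first CUDA call.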
Solution
- First, check whether the CUDA and Torch versions are compatible.
# CUDA version
nvcc --version
# Version reported by nvidia-smi
nvidia-smi
# Torch version (confirm which conda environment's python the user is actually running)
python -c "import torch;print(torch.__version__)"
Compatible version combinations are listed on the official PyTorch website.
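The three checks above can also be gathered in one place. The following is a sketch only, assuming nvcc and nvidia-smi are on PATH when present. The helper names run_tool, torch_versions and collect_report are hypothetical. torch.version.cuda is the CUDA version the installed Torch build was compiled against (None for CPU-only builds), which is the value to compare with the table on the PyTorch website.

```python
import importlib.util
import shutil
import subprocess
from typing import List, Optional


def run_tool(cmd: List[str]) -> Optional[str]:
    """Run a command and return its output, or None if it cannot be run."""
    if shutil.which(cmd[0]) is None:
        return None  # tool is not on PATH
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip()


def torch_versions() -> Optional[dict]:
    """Return Torch's version and the CUDA version it was built with."""
    if importlib.util.find_spec("torch") is None:
        return None  # Torch is not installed for this interpreter
    try:
        import torch
    except ImportError:
        return None
    return {"torch": torch.__version__, "built_with_cuda": torch.version.cuda}


def collect_report() -> dict:
    """Collect the same information as the three manual commands."""
    return {
        "nvcc": run_tool(["nvcc", "--version"]),
        "nvidia-smi": run_tool(["nvidia-smi"]),
        "torch": torch_versions(),
    }


if __name__ == "__main__":
    for name, value in collect_report().items():
        print(f"== {name} ==")
        print(value if value is not None else "not available")
```

Run the script with the same python that runs the failing workload, so the Torch version reported belongs to the right conda environment.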
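- If multiple CUDA versions are installed in the environment, check the order of the CUDA directories in LD_LIBRARY_PATH and adjust it manually if needed.
For example, suppose the installed Torch is compatible only with CUDA 9.1, and the query returns LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:/usr/local/cuda-9.1/lib64
In this case the CUDA 11.8 libraries are found first, so adjust the priority manually by running export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:$LD_LIBRARY_PATH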
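The reordering can be sketched as a small pure function. This is an illustration under the assumption of a Linux-style, colon-separated path; the helper name prioritize_dir is hypothetical. Unlike the plain export above, it also drops the duplicate entry left further back in the list.

```python
def prioritize_dir(path_value: str, preferred: str, sep: str = ":") -> str:
    """Return path_value with `preferred` first and duplicate entries removed."""
    entries = [entry for entry in path_value.split(sep) if entry]
    ordered = [preferred]
    for entry in entries:
        if entry not in ordered:
            ordered.append(entry)
    return sep.join(ordered)


# Values taken from the example in this article.
current = "/usr/local/cuda-11.8/lib64:/usr/local/cuda-9.1/lib64"
print("export LD_LIBRARY_PATH=" + prioritize_dir(current, "/usr/local/cuda-9.1/lib64"))
```

The printed export line needs to be run in the shell before Python starts, because the dynamic loader reads LD_LIBRARY_PATH when the process is launched. Changing it from inside an already running Python process does not affect how that process resolves libraries.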