Updated on 2022-12-08 GMT+08:00

Error Message "No CUDA-capable device is detected" Displayed in Logs

Symptom

An error similar to the following occurs while the program is running:
1. 'failed call to cuInit: CUDA_ERROR_NO_DEVICE:  no CUDA-capable device is detected'
2. 'No CUDA-capable device is detected although requirements are installed'

Possible Causes

The possible causes are as follows:

  • CUDA_VISIBLE_DEVICES has been incorrectly set (see the sketch after this list).
  • CUDA operations are performed on GPUs whose IDs are not included in CUDA_VISIBLE_DEVICES.
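For example, the symptom can be reproduced when CUDA_VISIBLE_DEVICES points only to a GPU that does not exist on the node. The following minimal sketch assumes a node whose only GPU has ID 0, so the ID 7 is deliberately invalid:

    import os

    # CUDA_VISIBLE_DEVICES must be set before CUDA is initialized, so set it
    # before importing torch. The ID 7 is assumed not to exist on this node.
    os.environ["CUDA_VISIBLE_DEVICES"] = "7"

    import torch

    print(torch.cuda.is_available())   # False: no CUDA-capable device is detected
    print(torch.cuda.device_count())   # 0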

Solution

  1. Do not change the CUDA_VISIBLE_DEVICES value in the training code. Use the default value.
  2. Ensure that the specified GPU IDs are within the range of available GPU IDs.
  3. If the error persists, print the value of CUDA_VISIBLE_DEVICES and debug it in a notebook, or run the following code to check whether the output is True:
    import torch

    # True indicates that a CUDA-capable GPU is visible to the process.
    print(torch.cuda.is_available())
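
A minimal diagnostic sketch that combines both checks (only standard Python and PyTorch calls are used; no ModelArts-specific API is assumed):

    import os

    import torch

    # Show which GPU IDs the process is allowed to see.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

    # Both values should be consistent with the GPUs requested for the job.
    print("cuda available:", torch.cuda.is_available())
    print("visible device count:", torch.cuda.device_count())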

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible in advance.