Error Message "cuda runtime error (10) : invalid device ordinal at xxx" Is Displayed in Logs
Updated on 2025-08-22 GMT+08:00
Symptom
A training job fails, and the following error is printed in logs:
RuntimeError: cuda runtime error (10) : invalid device ordinal at xxx
Figure 1 Error log

Possible Causes
The issue may arise due to the following reasons:
- The device ID used in the code does not match the job specifications or the CUDA_VISIBLE_DEVICES setting. For instance, if the job uses a four-GPU flavor (IDs 0, 1, 2, and 3) but the code calls tensor.to(device="cuda:7"), the operation targets GPU 7, which is outside the range of available GPU IDs.
- A GPU on the resource node is damaged, so fewer GPUs are detected than the selected specifications provide.
Solution
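- Perform CUDA operations only on GPUs made visible by CUDA_VISIBLE_DEVICES. Inside the process, visible GPUs are indexed from 0, so with N visible GPUs the valid device IDs are 0 to N-1.
- If a GPU on a resource node is damaged, contact technical support.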
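The first check can be sketched in Python as follows. This is a minimal illustration, not part of ModelArts: the helper names visible_device_count and check_ordinal are hypothetical, and the count derived from CUDA_VISIBLE_DEVICES is only an estimate, because the CUDA runtime may expose fewer devices than the variable lists (for example, when a GPU is damaged).

```python
import os


def visible_device_count(env=None):
    """Estimate the GPU count from CUDA_VISIBLE_DEVICES (hypothetical helper).

    Returns None when the variable is unset, which means the runtime
    decides which GPUs are visible.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Count the non-empty, comma-separated entries.
    return len([item for item in value.split(",") if item.strip()])


def check_ordinal(ordinal, device_count):
    """Return a device string such as "cuda:3", or raise if out of range."""
    if not 0 <= ordinal < device_count:
        raise ValueError(
            f"device ordinal {ordinal} is invalid: "
            f"only {device_count} GPU(s) are visible (valid IDs: 0 to {device_count - 1})"
        )
    return f"cuda:{ordinal}"


if __name__ == "__main__":
    count = visible_device_count({"CUDA_VISIBLE_DEVICES": "0,1,2,3"})
    print(check_ordinal(3, count))  # a four-GPU job accepts ordinal 3
    try:
        check_ordinal(7, count)     # ordinal 7 is rejected before any CUDA call
    except ValueError as err:
        print(err)
```

In a PyTorch training script, the number of GPUs actually detected is returned by torch.cuda.device_count(). Passing that value as device_count covers both causes, since a damaged GPU also lowers the detected count.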
Summary and Suggestions
Before creating a training job, use the ModelArts development environment to debug your training code and minimize migration errors.
- Use the notebook environment for online debugging. For details, see Using JupyterLab to Develop Models.
- Use a local IDE (PyCharm or VS Code) to access the cloud environment for debugging. For details, see Using a Local IDE to Develop Models.