Error Message "RuntimeError: Cannot re-initialize CUDA in forked subprocess" Is Displayed in Logs
Updated on 2025-08-22 GMT+08:00
Symptom
When PyTorch is used to start multiple processes, the following error message is displayed:
RuntimeError: Cannot re-initialize CUDA in forked subprocess
Possible Causes
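The possible cause is as follows:
The start method used for multiprocessing is incorrect. The subprocesses are created with the fork start method (the default on Linux before Python 3.14), and the CUDA runtime does not support fork: once the parent process has initialized CUDA, a forked subprocess cannot initialize it again. The spawn or forkserver start method is required to use CUDA in subprocesses.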
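To confirm the cause, check which start method the training script is using. The following is a minimal sketch that uses only the standard multiprocessing module, which torch.multiprocessing wraps. The names CUDA_SAFE_METHODS, is_cuda_safe, and describe_start_method are hypothetical helpers introduced for illustration and are not part of PyTorch or ModelArts.

```python
import multiprocessing as mp

# Start methods under which a subprocess can initialize CUDA on its own.
# (Hypothetical constant for this sketch.)
CUDA_SAFE_METHODS = {"spawn", "forkserver"}


def is_cuda_safe(method):
    """Return True if subprocesses started with `method` can use CUDA."""
    return method in CUDA_SAFE_METHODS


def describe_start_method():
    """Report the configured start method without fixing the global default."""
    # allow_none=True returns None when no method has been set yet,
    # instead of silently locking in the platform default.
    method = mp.get_start_method(allow_none=True)
    if method is None:
        return "start method not set; the platform default will be used"
    if is_cuda_safe(method):
        return f"start method is {method}; CUDA can be used in subprocesses"
    return f"start method is {method}; CUDA cannot be re-initialized in subprocesses"


if __name__ == "__main__":
    print("available start methods:", mp.get_all_start_methods())
    print(describe_start_method())
```

If the report shows fork, or shows that no method is set on a platform whose default is fork, switch to spawn as described in the solution.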
Solution
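Set the multiprocessing start method to spawn before creating any subprocess, as in the following example. For details, see Writing Distributed Applications with PyTorch.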
"""run.py:"""
#!/usr/bin/env python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, size):
    """ Distributed function to be implemented later. """
    pass

def init_process(rank, size, fn, backend='gloo'):
    """ Initialize the distributed environment. """
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

if __name__ == "__main__":
    size = 2
    processes = []
    mp.set_start_method("spawn")
    for rank in range(size):
        p = mp.Process(target=init_process, args=(rank, size, run))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()
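set_start_method can be called only once per program. A second call raises a RuntimeError unless force=True is passed. If other code in the job may already have set the start method, one alternative is to request a spawn context with get_context and create processes from it, which leaves the global setting unchanged. The following is a sketch under that assumption, written against the standard multiprocessing module; torch.multiprocessing can be imported in its place. The functions spawn_context, worker, and launch are hypothetical names, and worker stands in for init_process from run.py.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same interface


def spawn_context():
    """Return a spawn context without changing the global start method."""
    return mp.get_context("spawn")


def worker(rank, size):
    """Hypothetical placeholder for per-process work, such as init_process in run.py."""
    print(f"worker {rank} of {size} started")


def launch(size):
    """Start `size` spawned workers, wait for them, and return their exit codes."""
    ctx = spawn_context()
    processes = [ctx.Process(target=worker, args=(rank, size)) for rank in range(size)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    # The guard is required with spawn: each subprocess re-imports this module,
    # and unguarded process creation would run again in every child.
    print(launch(2))
```

Because each subprocess starts from a fresh interpreter, any CUDA initialization happens independently in the child instead of being inherited from the parent.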
Summary and Suggestions
Before creating a training job, use the ModelArts development environment to debug your training code and minimize migration errors.
- Use the notebook environment for online debugging. For details, see Using JupyterLab to Develop Models.
- Use a local IDE (PyCharm or VS Code) to access the cloud environment for debugging. For details, see Using a Local IDE to Develop Models.