Training Job Failed with Error Code 139
Updated on 2024-04-30 GMT+08:00
Symptom
The training job failed, and error code 139 is returned.
Possible Causes
The possible causes are as follows:
- Certain packages in the pip source have been updated, introducing incompatibilities. For example, importing the transformers package fails after the package is updated.
- The user code has a bug that overwrites memory or accesses memory it is not permitted to access.
- An unknown system error occurs. In this case, create the training job again. If the fault persists, submit a service ticket.
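To relate the error code to the memory-access cause listed above, the following minimal sketch decodes an exit code. It assumes that the error code follows the common shell convention in which a value above 128 means the process was terminated by signal (code - 128); under that assumption, 139 corresponds to signal 11, the segmentation fault signal. The function name describe_exit_code is hypothetical and is not part of ModelArts.

```python
import signal


def describe_exit_code(code):
    """Interpret an exit code using the common shell convention.

    Assumption: codes above 128 mean "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = "signal {}".format(signum)
        return "terminated by {}".format(name)
    return "exited with status {}".format(code)


print(describe_exit_code(139))
```

A segmentation fault usually originates in native code, such as a compiled extension of a pip package, rather than in pure Python code, which is consistent with the first two causes listed above.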
Solution
- If the training job succeeded previously and nothing has been modified since, compare the logs of the successful run and the failed run to check whether any dependency package in the pip source has been updated.
Figure 1 Log comparison
- Use PyCharm on your local PC to remotely access a notebook instance and debug the code.
- If the fault persists, contact technical support engineers.
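The dependency comparison in the first step can be partly automated. The sketch below assumes that the package list of each run is available as text in the `name==version` format produced by `pip freeze` (for example, copied from the two logs). The helper names parse_freeze and diff_versions are hypothetical, and the package versions in the sample data are made up for illustration.

```python
def parse_freeze(text):
    """Parse `pip freeze`-style text into a {package: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and entries that are not pinned with "==".
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def diff_versions(old_text, new_text):
    """Return {package: (old_version, new_version)} for every package that differs."""
    before = parse_freeze(old_text)
    after = parse_freeze(new_text)
    changed = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            # None marks a package that is missing from one of the two runs.
            changed[name] = (before.get(name), after.get(name))
    return changed


# Illustrative data only; these version numbers are not taken from a real log.
successful_run = "numpy==1.0.0\ntransformers==4.0.0\n"
failed_run = "numpy==1.0.0\ntransformers==4.1.0\n"

for package, (old, new) in diff_versions(successful_run, failed_run).items():
    print("{}: {} -> {}".format(package, old, new))
```

If a package turns out to have changed, pinning it to the version from the successful run is one way to check whether the update caused the failure.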
Summary and Suggestions
Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.
- Use the online notebook environment for debugging. For details, see JupyterLab Overview and Common Operations.
- Use a local IDE (PyCharm or VS Code) to access the cloud environment for debugging. For details, see Operation Process in a Local IDE.
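When debugging in either environment, Python's standard faulthandler module can help locate an invalid memory access by writing the Python traceback of each thread when the process receives a fatal signal such as a segmentation fault. The following is a minimal sketch, not an official ModelArts procedure; the helper name enable_crash_tracebacks and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Write Python tracebacks to log_path if the process crashes with a fatal signal.

    The returned file object must stay open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps the traceback of every thread, not only the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

A possible usage is to call `enable_crash_tracebacks("crash_traceback.log")` at the very start of the training script, before importing large native libraries, so that the last Python frames executed before the crash are recorded in the file.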