Error Message "write line error" Displayed in Logs
Symptom
During program running, a large number of error messages "write line error" are generated. This issue recurs each time the program runs at a specific progress.
Possible Causes
The possible causes are as follows:
- Core files are generated during the program running and exhaust the storage space in the / root directory.
- The 3.5 TB of storage space in the /cache directory is used up by the local data and files stored in it.
The disk space for in-cloud training consists of the space from the following directories:
- The / root directory, which is specified by base size in Docker. The default value is 10 GB. On the cloud, the value has been changed to 50 GB.
- The /cache directory, which is 3.5 TB typically.
Solution
- If the issue is caused by core files, add the following code at the very beginning of the boot script to disable the generation of the core files:
import os os.system("ulimit -c 0")
- Check whether the dataset and checkpoint file have used up the storage space of the /cache directory.
- Use the local PyCharm to remotely access notebook for debugging.
Summary and Suggestions
Before creating a training job, use the ModelArts development environment to debug the training code to maximally eliminate errors in code migration.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.