Error Message "write line error" Displayed in Logs
Symptom
While the program is running, a large number of "write line error" messages are written to the logs. The issue recurs each time the program reaches a specific point in its run.
Possible Causes
The possible causes are as follows:
- Core files are generated while the program is running and exhaust the storage space in the / root directory.
- The local data and files stored in the /cache directory use up its 3.5 TB of storage space.
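To confirm the first cause, it can help to look for core files and see how much space they take. The sketch below is only an illustration: the helper name `find_core_files` is hypothetical, and it assumes core files are named `core` or `core.<suffix>`, whereas the actual naming depends on the kernel's core pattern setting in your environment.

```python
import os


def find_core_files(root):
    """Return (path, size_in_bytes) pairs for likely core files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: core files are named "core" or "core.<suffix>".
            # The real pattern is system-specific and may differ.
            if name == "core" or name.startswith("core."):
                path = os.path.join(dirpath, name)
                try:
                    hits.append((path, os.path.getsize(path)))
                except OSError:
                    # The file disappeared or is unreadable; skip it.
                    continue
    # Largest files first, since those matter most for disk usage.
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_core_files(os.getcwd())` from the training job's work directory returns the candidates sorted by size, largest first.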
The disk space available to a training job on the cloud comes from the following directories:
- The / root directory, whose size is set by the base size in Docker. The default value is 10 GB. On the cloud, the value has been changed to 50 GB.
- The /cache directory, which is typically 3.5 TB.
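To see which of the two directories is filling up, you can log their usage from inside the training script. The following is a minimal sketch using the standard library's `shutil.disk_usage`. The helper names `disk_report` and `format_row` are hypothetical, and the default paths simply mirror the two directories listed above.

```python
import shutil


def disk_report(paths=("/", "/cache")):
    """Return (path, total, used, free) in bytes for each path that exists."""
    rows = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # The path does not exist in this environment; skip it.
            continue
        rows.append((path, usage.total, usage.used, usage.free))
    return rows


def format_row(path, total, used, free):
    """Format one report row in GiB with the percentage of space used."""
    gib = 1024 ** 3
    percent = 100 * used / total if total else 0.0
    return (
        f"{path}: {used / gib:.1f} of {total / gib:.1f} GiB used "
        f"({percent:.0f}%), {free / gib:.1f} GiB free"
    )
```

Printing `format_row(*row)` for each row of `disk_report()` at regular intervals during training shows whether the errors begin when one of the directories approaches its limit.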
Solution
- If core files are generated in the training job's work directory, add the following code at the beginning of the boot script to disable core file generation:
import os
os.system("ulimit -c 0")
- Check whether the dataset and checkpoint files have used up the storage space in the /cache directory.
- Use PyCharm on your local machine to remotely access a notebook instance for debugging.
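Note that a limit set with `ulimit` through `os.system` applies to the shell that `os.system` starts, so it may not carry over to the Python process running the boot script. If core files still appear, one alternative is to set the limit in-process with the standard `resource` module (available on Unix-like systems). This is a sketch rather than the documented method, and the helper name `disable_core_dumps` is hypothetical.

```python
import resource


def disable_core_dumps():
    """Set the soft core-file size limit of the current process to 0."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Lowering only the soft limit never requires extra privileges.
    # Child processes started afterwards inherit the new limit.
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


# Call this before the training code starts any worker processes.
disable_core_dumps()
```

Because the hard limit is left unchanged, the soft limit can be raised again later if a core file is needed for debugging.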
Summary and Suggestions
- Use the online notebook environment for debugging. For details, see JupyterLab Overview and Common Operations.
- Use a local IDE (PyCharm or VS Code) to access the cloud environment for debugging. For details, see Operation Process in a Local IDE.