Updated on 2024-04-30 GMT+08:00

Error Message "write line error" Displayed in Logs

Symptom

While the program is running, a large number of "write line error" messages are generated. The issue recurs each time the program reaches a specific point in its progress.

Figure 1 Error log

Possible Causes

The possible causes are as follows:

  • Core files generated while the program is running exhaust the storage space in the / root directory.
  • Local data and files stored in the /cache directory use up its 3.5 TB of storage space.
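
To check the first cause, you can look for core files in the job's directories. The following is a minimal sketch, not a ModelArts tool. It assumes core files use the default Linux naming ("core" or "core.<pid>"); the real name is controlled by /proc/sys/kernel/core_pattern and may differ in your container. The function name find_core_files is hypothetical.

```python
import os
import re

# Assumption: core files follow the default naming, "core" or "core.<pid>".
# The actual name depends on /proc/sys/kernel/core_pattern and may differ.
CORE_NAME = re.compile(r"^core(\.\d+)?$")


def find_core_files(root):
    """Return (path, size_in_bytes) for each likely core file under root, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if CORE_NAME.match(name):
                path = os.path.join(dirpath, name)
                try:
                    found.append((path, os.path.getsize(path)))
                except OSError:
                    # The file may have been removed while the tree was being walked
                    continue
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, calling find_core_files(os.getcwd()) from the boot script lists core files in the working directory together with their sizes, which shows quickly whether they account for the lost space.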

The disk space available to a training job in the cloud consists of the following directories:

  1. The / root directory, whose size is determined by the Docker base size setting. The Docker default is 10 GB; in the cloud environment, it has been increased to 50 GB.
  2. The /cache directory, which is typically 3.5 TB.
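
To see which of these two directories is running out of space, you can print their usage from inside the training job. This is a minimal sketch using the standard shutil.disk_usage function; the function name report_disk_usage is hypothetical, and sizes are reported in units of 1024**3 bytes.

```python
import shutil

# Mount points described in this article; adjust if your job uses others.
MOUNT_POINTS = ("/", "/cache")


def report_disk_usage(paths=MOUNT_POINTS):
    """Return {path: (total_gb, used_gb, free_gb)} for each path that exists."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except FileNotFoundError:
            # For example, /cache does not exist outside the training container
            continue
        report[path] = tuple(
            round(value / 1024 ** 3, 1) for value in (usage.total, usage.used, usage.free)
        )
    return report


if __name__ == "__main__":
    for path, (total, used, free) in report_disk_usage().items():
        print(f"{path}: total {total} GB, used {used} GB, free {free} GB")
```

Logging this periodically, or just before the point where the errors start, shows whether / (about 50 GB) or /cache (about 3.5 TB) is the one filling up.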

Solution

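  1. If core files are generated in the training job's working directory, add the following code at the beginning of the boot script to disable core file generation. (Running os.system("ulimit -c 0") does not work, because the limit applies only to the short-lived shell that os.system starts, not to the training process.)
    import resource
    # Set the soft and hard core file size limits to 0 for this process and the child processes it starts
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))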
  2. Check whether the datasets and checkpoint files have used up the storage space in the /cache directory.
  3. Use PyCharm on your local machine to remotely access a notebook instance for debugging.
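
For step 2, listing the largest files under /cache usually shows whether datasets or checkpoints are consuming the space. The following is a minimal sketch; the function name largest_files and the default of 20 results are hypothetical choices, and symbolic links are skipped so their targets are not counted twice.

```python
import heapq
import os


def largest_files(root="/cache", top_n=20):
    """Return the top_n largest regular files under root as (size_bytes, path), largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so the same data is not counted twice
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file removed during the walk or not readable
    return heapq.nlargest(top_n, sizes)
```

For example, `for size, path in largest_files(): print(size // 1024 ** 2, "MB", path)` prints the biggest files first. If old checkpoints dominate the list, deleting or uploading them before saving new ones frees space in /cache.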

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.