Updated on 2025-06-06 GMT+08:00

Error Message "No space left on device" Is Displayed in Logs

Symptom

When data, code, or a model is copied during a training job, the error message "No space left on device" is displayed in the logs.

Figure 1 Error log

Possible Causes

  • The disk space is insufficient.
  • In distributed jobs, the Docker base size setting does not always take effect on every node. On an affected node, the storage space of the container's root directory (/) defaults to 10 GB instead of the required 50 GB, which causes the training job to fail.
  • When a directory contains many files, the kernel builds an index table for that directory to speed up file retrieval. Creating a large number of files in quick succession can hit the limit of this index, which results in the error even if disk space is still available.

    The index limit is reached sooner in the following cases:

    • Longer file names
    • Smaller block sizes (the available block sizes are 1024, 2048, and 4096 bytes; the default is 4096 bytes)
    • Faster file creation
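
The causes above can be told apart with a quick check from inside the training container. The following is a minimal sketch, not an official ModelArts tool: the helper names (disk_report, root_is_undersized, count_entries) and the 1 GiB tolerance are assumptions made for illustration, and the 50 GB threshold is taken from the description above. It uses only the Python standard library and assumes a Linux container.

```python
import os
import shutil

GIB = 1024 ** 3


def disk_report(path="/"):
    """Collect capacity figures for the file system that holds `path`."""
    usage = shutil.disk_usage(path)   # total, used, free in bytes
    st = os.statvfs(path)             # POSIX only
    return {
        "path": path,
        "total_gib": usage.total / GIB,
        "free_gib": usage.free / GIB,
        "block_size": st.f_bsize,     # block size reported by the file system
        "free_inodes": st.f_favail,   # inodes available to unprivileged users
    }


def root_is_undersized(report, required_gib=50, tolerance_gib=1):
    """Return True if the file system is clearly smaller than required.

    The tolerance (an assumption) allows for file system overhead.
    """
    return report["total_gib"] < required_gib - tolerance_gib


def count_entries(directory):
    """Count the entries directly inside `directory` (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)


if __name__ == "__main__":
    report = disk_report("/")
    print(report)
    if root_is_undersized(report):
        print("Root directory is smaller than 50 GB; "
              "the base size setting may not have taken effect on this node.")
```

If free_gib is close to zero, the disk is simply full. If free space remains but one directory holds a very large number of entries (check it with count_entries), the directory index limit is the more likely cause.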

Solution

  1. Rectify the issue by referring to Error Message "write line error" Is Displayed in Logs.
  2. If the problem persists on specific nodes, submit a service ticket to have those nodes isolated.
  3. If the error is caused by the directory index limit on EulerOS, do as follows:
    • Reduce the number of files in a single directory.
    • Slow down file creation.
    • Disable the dir_index attribute of the ext4 file system. Note that this may degrade file retrieval performance. For details, see https://access.redhat.com/solutions/29894.
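
One way to reduce the number of files in a single directory is to spread them across nested sub-directories chosen by a hash of the file name. The following is a minimal sketch under that assumption. The function names (sharded_path, write_sharded) and the two-level, two-character layout are illustrative choices, not part of ModelArts or EulerOS.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    """Map `filename` to root/<aa>/<bb>/filename using a hash of the name.

    With the defaults there are 256 * 256 leaf directories, so each one
    holds only a small share of the files.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def write_sharded(root, filename, data):
    """Write `data` (bytes) to the sharded location and return the path."""
    path = sharded_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Because the sub-directory is derived from the file name alone, a reader can locate any file again by calling sharded_path with the same root and name, without scanning directories.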

Summary and Suggestions

Before creating a training job, use the ModelArts development environment to debug your training code and minimize migration errors.