Insufficient Container Space for Copying Data
Updated on 2024-04-11 GMT+08:00
Symptom
While a ModelArts training job is running, the following error is printed in the log and the data fails to be copied to the container:
OSError: [Errno 28] No space left on device
Possible Causes
The container space is insufficient for downloading data.
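One way to confirm this cause is to check how much free space the destination file system has before the copy starts. The following is a minimal sketch that uses only the Python standard library; the helper names free_space_gib and has_room_for are hypothetical and are not part of ModelArts.

```python
import shutil


def free_space_gib(path):
    """Return the free space, in GiB, of the file system that contains path."""
    return shutil.disk_usage(path).free / 2**30


def has_room_for(path, required_bytes):
    """Return True if the file system containing path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes
```

For example, calling has_room_for('/cache', 50 * 2**30) before the download indicates whether 50 GiB would fit. The 50 GiB figure is only an illustration; substitute the actual size of the dataset.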
Solution
- Check whether the data is downloaded to the /cache directory. Each GPU node has a /cache directory with 4 TB of storage. Also check whether a large number of files are being created in the directory at the same time. This can exhaust the inodes, in which case the file system reports that no space is left even though storage capacity remains.
- Check whether GPU resources are used. If CPU resources are used, /cache and the code directory share 10 GB of storage space, which may not be enough for the data. In this case, use GPU resources instead.
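- Set the TMPDIR environment variable at the start of the training code so that temporary files are written to /cache:
import os
# os.environ changes the current process; an export run through os.system() would only affect a temporary subshell.
os.environ['TMPDIR'] = '/cache'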
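The inode check and the TMPDIR setting described above can both be verified from inside the training script. The following is a minimal sketch that uses only the Python standard library; the helper names inode_usage and use_tmpdir are hypothetical, and os.statvfs is available on Unix-like systems only.

```python
import os
import tempfile


def inode_usage(path):
    """Return (used, total) inode counts for the file system containing path."""
    stats = os.statvfs(path)
    total = stats.f_files          # total inodes on the file system
    used = total - stats.f_ffree   # total minus free inodes
    return used, total


def use_tmpdir(path):
    """Point Python's tempfile module at path and return the directory in effect."""
    os.environ["TMPDIR"] = path
    # tempfile caches its choice on first use; clear the cache so TMPDIR is re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

If inode_usage('/cache') returns a used count close to the total, the directory is probably running out of inodes rather than capacity. Some file systems report a total of 0, in which case this check is not meaningful. If use_tmpdir('/cache') returns a directory other than /cache, the directory is likely missing or not writable, because tempfile falls back to another location.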