Updated on 2024-04-30 GMT+08:00

Error Message "No such file or directory" Displayed in Training Job Logs

Symptom

If a training job fails, the error message "No such file or directory" is displayed in the logs.

This error is displayed if a training input path is unreachable.

It is also displayed if the training boot file is unavailable.

Figure 1 Example log for an unavailable training boot file

Possible Causes

Checking Whether the Affected Path Is an OBS Path

When using ModelArts, you store data in an OBS bucket. However, the training code cannot read data through an OBS path while it is running.

The reason is as follows:

If the running container accessed OBS directly, training performance would be poor. To avoid this, after a training job is created, the system automatically downloads the training data to a local path in the running container. As a result, if the training code uses an OBS path, the file cannot be found and an error occurs. For example, if the OBS path to the training code is obs://bucket-A/training/, the training code is automatically downloaded to ${MA_JOB_DIR}/training/.

In general, if the OBS path to the training code is obs://bucket-A/XXX/{training-project}/, where {training-project} is the name of the folder that stores the training code, the system automatically downloads the {training-project} folder from OBS to the local path $MA_JOB_DIR/{training-project}/ in the training container during training.
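The download rule described above can be expressed as a small helper that maps an OBS directory to its local counterpart. This is a minimal sketch under stated assumptions: the function name local_path_for is hypothetical, it only reproduces the naming rule (the last-level OBS directory is placed under ${MA_JOB_DIR}) and downloads nothing, and it assumes MA_JOB_DIR is exposed as an environment variable in the container, as the ${MA_JOB_DIR} notation suggests.

```python
import os
import posixpath


def local_path_for(obs_path, ma_job_dir=None):
    """Map an OBS directory to the local directory it is downloaded to.

    Hypothetical helper: applies only the naming rule (last-level directory
    of the OBS path, placed under MA_JOB_DIR). It performs no download.
    """
    if not obs_path.startswith("obs://"):
        raise ValueError("expected an obs:// path, got %r" % obs_path)
    if ma_job_dir is None:
        # Assumption: MA_JOB_DIR is available as an environment variable
        # inside the training container.
        ma_job_dir = os.environ["MA_JOB_DIR"]
    # Drop a trailing slash, then keep only the last-level directory name.
    last_level = obs_path.rstrip("/").split("/")[-1]
    # The container runs Linux, so join with forward slashes.
    return posixpath.join(ma_job_dir, last_level)
```

For example, local_path_for("obs://bucket-A/training/", "/job") returns "/job/training". The training code should open files under that local path, never under the obs:// path.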

If the affected path points to the training data, perform the following operations to resolve the issue (see Parsing Input and Output Paths for details):

  1. When creating an algorithm, set the input path parameter in the input path mapping configuration. The parameter name defaults to data_url.
  2. In the training code, add a hyperparameter with the same name (data_url by default), and use its value as the local path from which the training data is read.
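Step 2 might look like the following in the training code. This is a sketch, not the official ModelArts template: it assumes the platform passes the mapped local input path as a command-line argument of the form --data_url=<local dir>, and the helper names parse_args and resolve_input_dir are hypothetical. The check rejects obs:// values early so that a misconfigured path fails with a clear message instead of a bare "No such file or directory".

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the input-path hyperparameter passed to the training code."""
    parser = argparse.ArgumentParser()
    # Assumption: the mapped local input path arrives as --data_url=<dir>.
    parser.add_argument("--data_url", type=str, required=True,
                        help="Local directory containing the training data")
    # parse_known_args ignores any other hyperparameters the job passes.
    args, _unknown = parser.parse_known_args(argv)
    return args


def resolve_input_dir(data_url):
    """Return data_url if it is a usable local directory, else raise a clear error."""
    if data_url.startswith("obs://"):
        raise ValueError(
            "data_url must be the local path in the container, not an OBS path: %s"
            % data_url)
    if not os.path.isdir(data_url):
        raise FileNotFoundError("Training input directory not found: %s" % data_url)
    return data_url
```

In the boot file, call data_dir = resolve_input_dir(parse_args().data_url) once at startup and build every input file path with os.path.join(data_dir, ...).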

Checking Whether the Affected Path Is Available

Code developed locally must be uploaded to the ModelArts backend, and in the process the path to a dependency file is easily set incorrectly in the training code, for example when a relative path only resolves from your local working directory.

We recommend the following general solution: obtain the absolute path to a dependency file through the os module.

Example:

|---project_root                # Root directory for code
   |---BootfileDirectory        # Directory where the boot file is located
     |---bootfile.py            # Boot file
   |---otherfileDirectory       # Directory where other dependency files are located
     |---otherfile.py           # Other dependency files
    

In the boot file, obtain the path to a dependency file (otherfile_path in this example) as follows:

import os
current_path = os.path.dirname(os.path.realpath(__file__)) # Directory where the boot file is located
project_root = os.path.dirname(current_path) # Root directory of the project, which is the code directory set on the ModelArts training console
otherfile_path = os.path.join(project_root, "otherfileDirectory", "otherfile.py")
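To see why resolving from the boot file's location is more robust than a path relative to the current working directory, the following self-contained sketch recreates the example directory tree in a temporary directory and looks up otherfile.py both ways from a working directory other than the project root. The names dependency_path and compare_lookups are hypothetical, and the sketch does not assume which working directory ModelArts actually uses.

```python
import os
import tempfile


def dependency_path(boot_file, *parts):
    """Resolve a path under the project root (the parent of the boot file's directory)."""
    current_path = os.path.dirname(os.path.realpath(boot_file))  # BootfileDirectory
    project_root = os.path.dirname(current_path)  # project_root
    return os.path.join(project_root, *parts)


def compare_lookups():
    """Build the example tree in a temp dir and compare two ways to locate otherfile.py."""
    with tempfile.TemporaryDirectory() as root:
        boot_dir = os.path.join(root, "BootfileDirectory")
        other_dir = os.path.join(root, "otherfileDirectory")
        os.makedirs(boot_dir)
        os.makedirs(other_dir)
        boot_file = os.path.join(boot_dir, "bootfile.py")
        for path in (boot_file, os.path.join(other_dir, "otherfile.py")):
            with open(path, "w") as f:
                f.write("# placeholder\n")

        old_cwd = os.getcwd()
        try:
            # Simulate a launch whose working directory is not the project root.
            os.chdir(boot_dir)
            cwd_relative = os.path.exists(
                os.path.join("otherfileDirectory", "otherfile.py"))
            boot_relative = os.path.exists(
                dependency_path(boot_file, "otherfileDirectory", "otherfile.py"))
        finally:
            # Restore the working directory before the temp dir is deleted.
            os.chdir(old_cwd)
        return cwd_relative, boot_relative
```

compare_lookups() returns (False, True): the working-directory-relative path misses the file, while the path derived from the boot file finds it.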

Checking the Boot File Path of a Training Job Created Using a Custom Image

Take the OBS path obs://obs-bucket/training-test/demo-code as an example. The training code in this path is automatically downloaded to ${MA_JOB_DIR}/demo-code in the training container, where demo-code is the last-level directory of the OBS path and can be customized.

If you use a custom image to create a training job, the system automatically runs the image boot command after the code directory is downloaded. The boot command must comply with the following rules:

  • If the training boot script is a .py file, for example train.py, the boot command can be python ${MA_JOB_DIR}/demo-code/train.py.
  • If the training boot script is an .sh file, for example main.sh, the boot command can be bash ${MA_JOB_DIR}/demo-code/main.sh.

In both cases, demo-code is the last-level directory of the OBS path and can be customized.
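The rules above can be captured in a small helper that assembles the boot command from the OBS code path and the script name. This is a hypothetical sketch (build_boot_command is not a ModelArts API): it keeps ${MA_JOB_DIR} literal so that the shell in the container expands it, and it handles only the two script types listed above.

```python
import posixpath


def build_boot_command(obs_code_path, script_name):
    """Build a boot command following the rules above (hypothetical helper).

    ${MA_JOB_DIR} is kept as literal text so the container's shell expands it.
    """
    # The code is downloaded into the last-level directory of the OBS path.
    last_level = obs_code_path.rstrip("/").split("/")[-1]
    script_path = posixpath.join("${MA_JOB_DIR}", last_level, script_name)
    if script_name.endswith(".py"):
        return "python " + script_path
    if script_name.endswith(".sh"):
        return "bash " + script_path
    raise ValueError("unsupported boot script type: %s" % script_name)
```

Comparing the output of such a helper with the boot command configured for the image is a quick way to spot a wrong directory name, which would otherwise surface as "No such file or directory" when the job starts.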

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.