Updated on 2024-04-11 GMT+08:00

Failed to Find the Boot File When a Training Job Is Created Using a Custom Image

Symptom

When a custom image is used to create a training job, the error message "no such file or directory" is displayed.

Possible Causes

The boot file path used in the boot command is incorrect, so the file cannot be found in the training container.

Solution

Perform the following operations to check whether the boot file directory is correct:

When using a custom image to create a training job on ModelArts, set Algorithm Type to Custom algorithm and Boot Mode to Custom image.

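For example, if the OBS path to the boot script is obs://bucket-name/app/code/train.py, set the code directory to /bucket-name/app/code/ when creating the job. After the code directory is set, the selected code folder is downloaded to the /home/ma-user/modelarts/user-job-dir directory of the training container, so the boot script is available at /home/ma-user/modelarts/user-job-dir/code/train.py. The boot command must reference the script by this container path, not by its OBS path. The training command calls the run_train.sh script: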

bash /home/ma-user/modelarts/user-job-dir/run_train.sh  # Training command (using custom images)

Append the container path of the boot file and its parameters, so that the complete command is:

bash /home/ma-user/modelarts/user-job-dir/run_train.sh python /home/ma-user/modelarts/user-job-dir/code/train.py {python_file_parameter}  # Training command (using custom images)
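The path mapping described above can be checked before training starts, so that a wrong path produces an explicit message instead of a bare "no such file or directory". The following is a minimal bash sketch, assuming (as in the train.py example) that the code folder lands in /home/ma-user/modelarts/user-job-dir/<last folder of the code directory>/. The function names container_boot_path and check_boot_file are hypothetical and not part of ModelArts.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the code directory selected when creating the job is
# downloaded to $JOB_DIR/<last folder of the code directory>/ in the container.
JOB_DIR=/home/ma-user/modelarts/user-job-dir

# Hypothetical helper: container_boot_path OBS_BOOT_FILE CODE_DIR
# Prints the in-container path of the boot file, or fails if the boot file
# is not located under the code directory.
container_boot_path() {
    local obs_file=${1#obs:/}   # obs://bucket-name/... -> /bucket-name/...
    local code_dir=${2%/}       # drop a trailing slash from the code directory
    case "$obs_file" in
        "$code_dir"/*) ;;       # boot file is inside the code directory
        *) echo "Boot file $1 is not under code directory $2" >&2; return 1 ;;
    esac
    local rel=${obs_file#"$code_dir"/}   # path relative to the code directory
    local folder=${code_dir##*/}         # last folder name, e.g. "code"
    printf '%s/%s/%s\n' "$JOB_DIR" "$folder" "$rel"
}

# Hypothetical helper: check_boot_file PATH
# Fails with a descriptive message if the boot file does not exist.
check_boot_file() {
    if [ ! -f "$1" ]; then
        echo "Boot file not found in the container: $1" >&2
        return 1
    fi
}
```

For example, container_boot_path obs://bucket-name/app/code/train.py /bucket-name/app/code/ prints /home/ma-user/modelarts/user-job-dir/code/train.py, which matches the path in the command above. Calling check_boot_file on that result before the python command (for instance, inside run_train.sh) is one way to surface a wrong path early; it is a suggestion, not a ModelArts requirement.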