Help Center/ ModelArts/ Troubleshooting/ Training Jobs/ In-Cloud Migration Adaptation Issues/ Failed to Find the Boot File When a Training Job Is Created Using a Custom Image
Updated on 2024-06-11 GMT+08:00

Failed to Find the Boot File When a Training Job Is Created Using a Custom Image

Symptom

When a custom image is used to create a training job of the old version, error message "No such file or directory" is displayed.

Possible Causes

The directory of the boot file for running the command is incorrect.

Solution

Perform the following operations to check whether the boot file directory is correct:

On the ModelArts management console, select Custom for Algorithm Source when creating a training job using a custom image.

If the OBS path to the boot script is obs://bucket-name/app/code/train.py, set the code directory to /bucket-name/app/code/ when creating a job.

After the code directory is configured, run the following command to download the selected code folder to the /home/work/user-job-dir directory of the old-version training container:

bash /home/work/run_train.sh  # Training command of the old version. run_train.sh is the training boot script, which is packed in the base image provided by ModelArts.

Run the following command:

bash /home/work/run_train.sh python /home/work/user-job-dir/code/train.py {python_file_parameter}  # Training jobs of the old version