Updated on 2024-10-29 GMT+08:00

Training Job Rescheduling

When a training job encounters a fault and is recovered (for example, through process-level recovery, POD-level rescheduling, or job-level rescheduling), the Fault Recovery Details tab appears on the job details page. This tab records when the training job was stopped and restarted during recovery.

  1. On the ModelArts console, choose Model Training > Training Jobs in the navigation pane.
  2. In the training job list, click the name of the target job to go to the training job details page.
  3. On the training job details page, click the Fault Recovery Details tab to view the fault recovery information.
    Figure 1 Viewing fault recovery details