Updated on 2025-11-18 GMT+08:00

Viewing Training Job Details

  1. Log in to the ModelArts console.
  2. In the navigation pane, choose Model Training > Training Jobs.

    In the job list, click Export to export the training job details for a specified time range as an Excel file. A maximum of 200 rows can be exported.

    In the search box above the training job list, filter jobs by attributes such as status, mode, type, or priority.

  3. In the training job list, click the target job name to switch to the training job details page.
  4. On the left of the training job details page, view basic job settings and algorithm parameters.
    • Basic job settings
      Table 1 Basic job settings

      Parameter

      Description

      Job ID

      Unique ID of the training job.

      Status

      • Status of the training job.
      • The value can be Completed, Pending, Running, Creating, Terminating, Terminated, Failed, Abnormal, or Deleting.

      Experiment

      • Name of the experiment to which the training job belongs.
      • Click to go to the experiment's job list.

      Created

      Time when the training job was created.

      Duration

      Duration of the training job, that is, the total time that its Kubernetes resources exist over the job's entire lifecycle.

      Retries

      • Number of times that the training job automatically restarts upon a fault. This parameter is only available when Auto Restart is enabled during training job creation.
      • The value is displayed in the format Number of restarts/Maximum number of restarts.

      Unconditional Auto Restart

      This parameter is displayed only after auto restart is enabled.

      When Unconditional Auto Restart is enabled during job creation, Open is displayed.

      If it is not configured or is not enabled, Disabled is displayed.

      Restart Upon Suspension

      This parameter is displayed only after auto restart is enabled.

      When Restart Upon Suspension is enabled during job creation, Open is displayed.

      If it is not configured or is not enabled, Disabled is displayed.

      Description

      Description of the training job.

      If no description is set, -- is displayed. Click the icon next to the description to edit it.

      Job Priority

      • Priority of a training job created using a dedicated resource pool. If a training job is created using a public resource pool, this parameter is not displayed.
      • The platform schedules jobs in descending order of priority. Jobs with the same priority are scheduled in the order they were submitted, so when resources become available, the earliest-submitted job is processed first.
      • The priority can be set to 1, 2, or 3. A larger number indicates a higher priority. The default priority is 1, and the highest priority is 3.
      • If a training job is in the Pending state for a long time, you can change the job priority to reduce the queuing duration. For details, see Priority of a Training Job.

      Preemption

      • When using a dedicated resource pool, you can set this parameter. This parameter is not displayed when a public resource pool is used.
      • When enabled, jobs that allow preemption may be terminated and re-queued if resource pool capacity is insufficient. To avoid losing training progress, configure resumable training before enabling this function. For details, see Resumable Training.
      • Disabled is displayed when it is not set.
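The scheduling order described for Job Priority in Table 1 (highest priority first, then earliest submission first among equal priorities) can be illustrated with a short sketch. This is not the platform's scheduler. The PendingJob class, the job names, and the integer timestamps are hypothetical and exist only to show the ordering rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingJob:
    name: str          # hypothetical job name
    priority: int      # 1, 2, or 3; a larger number means a higher priority
    submitted_at: int  # hypothetical submission time, in seconds


def scheduling_order(jobs):
    """Order jobs by priority (highest first), then by submission time (earliest first)."""
    for job in jobs:
        if job.priority not in (1, 2, 3):
            raise ValueError(f"priority must be 1, 2, or 3, got {job.priority}")
    # Negating the priority makes sorted() place larger priorities first.
    return sorted(jobs, key=lambda job: (-job.priority, job.submitted_at))


queue = [
    PendingJob("job-a", priority=1, submitted_at=100),
    PendingJob("job-b", priority=3, submitted_at=300),
    PendingJob("job-c", priority=3, submitted_at=200),
    PendingJob("job-d", priority=2, submitted_at=50),
]
print([job.name for job in scheduling_order(queue)])
# ['job-c', 'job-b', 'job-d', 'job-a']
```

In this example, job-c runs before job-b because both have priority 3 and job-c was submitted earlier. Raising the priority of job-a from 1 to 3 would move it to the front of the queue, which is the effect the table describes for jobs that stay in the Pending state for a long time.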
    • Algorithm parameters
      Table 2 Algorithm parameters

      Parameter

      Description

      Runtime Type

      Job mode, which can be Debug or Production.

      Preset images

      Preset image used by the training job. This parameter is available only for training jobs created using a preset image.

      Custom image

      Custom image used by the training job. This parameter is available only for training jobs created using a custom image.

      Code Directory

      OBS path to the code directory of the training job.

      You can click Edit Code on the right to edit the training script code in OBS Online Editor. OBS Online Editor is not available for a training job in the Pending, Creating, or Running state.

      NOTE:

      This parameter is not supported when you use a subscribed algorithm to create a training job.

      Boot File

      Location where the training boot file is stored.

      NOTE:

      This parameter is not supported when you use a subscribed algorithm to create a training job.

      Boot Command

      Command for booting an image. This parameter appears only when Boot Mode is set to Custom image, not for Preset image. You can view both the parameter and its value on the training job details page.

      User ID

      ID of the user who runs the container.

      Algorithm Name

      • Algorithm used in the training job. You can click the algorithm name to go to the algorithm details page.
      • If this parameter is not configured, -- appears.

      Local Code Directory

      Path to the training code in the training container.

      Work Directory

      Path to the training boot file in the training container.

      Compute Nodes

      Number of instances for the training job.

      Dedicated resource pool

      Dedicated resource pool information. This parameter is available only when a training job uses a dedicated resource pool.

      Compute Node

      Names and IP addresses of the compute nodes used by the training job. This parameter is only displayed when the training job uses a dedicated resource pool.

      Specifications

      • Instance specifications used by the training job. This parameter is only displayed when the training job does not use custom specifications of a dedicated resource pool.
      • This parameter shows both the instance specifications chosen during job creation and the specifications allocated to the training containers. The resources available to the training containers are usually less than those chosen at job creation, because the job's internal containers, which help the training job run smoothly, consume some resources.

      Customized Specifications

      • Instance specifications used by the training job. This parameter is only displayed when the training job uses custom specifications of a dedicated resource pool.
      • This parameter shows the custom resource and instance specifications selected for the training job.

      Job Log Path

      • This parameter is displayed if you select Persistent Log Saving and configure Job Log Path when creating the training job. This parameter is not displayed if Persistent Log Saving is not selected.
      • Click the path to go to the corresponding directory.

      Event Notification

      • Topic and events set for event notification during training job creation.
      • Disabled is displayed when it is not configured.

      Input > Input Path

      OBS path where the input data is stored.

      Input > Parameter Name

      Input path parameter specified in the algorithm code.

      Input > Obtained from

      Method of obtaining the training job input.

      Input > Container Path

      Path for storing the input data in the ModelArts backend container. After training starts, ModelArts downloads the data stored in OBS to the backend container.

      Output > Output Path

      OBS path where the output data is stored.

      Output > Parameter Name

      Output path parameter specified in the algorithm code.

      Output > Obtained from

      Method of obtaining the training job output.

      Output > Container Path

      Path for storing the output data in the ModelArts backend container.

      Hyperparameter

      Hyperparameters used in the training job.

      Environment Variable

      Environment variables for the training job.
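The Input and Output rows in Table 2 refer to path parameters "specified in the algorithm code", alongside hyperparameters and environment variables. As a minimal sketch of what such code might look like, the script below declares one input path parameter, one output path parameter, and one hyperparameter with argparse. It assumes the values reach the training script as command-line arguments, which this page does not confirm. The names data_url, train_url, learning_rate, and MY_ENV_FLAG, and the example paths, are hypothetical; use the names and paths shown on your own job details page.

```python
import argparse
import os


def parse_job_args(argv=None):
    """Parse the (hypothetical) parameters of a training script."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    # Input > Parameter Name: resolves to the container path of the input data.
    parser.add_argument("--data_url", required=True)
    # Output > Parameter Name: resolves to the container path for output data.
    parser.add_argument("--train_url", required=True)
    # A hyperparameter with a default value.
    parser.add_argument("--learning_rate", type=float, default=0.01)
    # parse_known_args ignores arguments this script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example invocation with illustrative values only.
example_argv = [
    "--data_url", "/tmp/example/input",
    "--train_url", "/tmp/example/output",
    "--learning_rate", "0.001",
    "--extra_flag", "1",
]
args = parse_job_args(example_argv)
env_flag = os.environ.get("MY_ENV_FLAG", "unset")  # hypothetical environment variable

print(args.data_url, args.train_url, args.learning_rate, env_flag)
```

Comparing the parsed values with the Input, Output, Hyperparameter, and Environment Variable rows on the details page is a quick way to confirm that a job ran with the configuration you intended.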

  5. On the training job details page, manage event notifications of the training job.
    • Event notifications cannot be configured for training jobs in the Completed, Failed, Abnormal, or Terminated state.
    • To set up event notifications, you need permission to view jobs.
    • Only the updated training status is notified for modification events.

    After event notification is enabled, you will be notified of a specific event, such as a job status change or suspected suspension, through an SMS message or email. Notifications will be billed based on SMN pricing. For details, see Billing.

    • If event notification has been enabled for a training job, you can click the icon next to Enabled to modify or disable event notification.
      Figure 1 Modifying event notification
    • If event notification has not been enabled for a training job, you can click the icon next to Disabled to enable event notification.
      Figure 2 Configuring event notification
    Table 3 Event notification parameters

    Parameter

    Description

    Topic

    Topic name of event notification. You can select a topic from the drop-down list or click Create now to create a topic on the SMN console.

    NOTE:

    To receive notifications, create a topic on the SMN console, add a subscription to it, and confirm the subscription. You are notified of events only after these steps are completed.

    Event

    Select events you want to subscribe to. Examples: JobStarted, JobCompleted, JobFailed, JobTerminated, and JobHanged.

    NOTE:

    Only training jobs using GPUs or NPUs support JobHanged events.
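The restrictions in step 5 and Table 3 can be summarized as a small pre-check. The sketch below is illustrative only and does not call any ModelArts or SMN API. The function name check_notification_request and its parameters are hypothetical. The states and the JobHanged rule are taken from this page.

```python
# States in which event notifications cannot be configured (see step 5).
NON_CONFIGURABLE_STATES = {"Completed", "Failed", "Abnormal", "Terminated"}


def check_notification_request(status, events, uses_gpu_or_npu):
    """Return a list of problems; an empty list means no restriction is violated."""
    problems = []
    if status in NON_CONFIGURABLE_STATES:
        problems.append(
            f"event notifications cannot be configured in the {status} state"
        )
    if "JobHanged" in events and not uses_gpu_or_npu:
        problems.append("JobHanged is only supported for jobs using GPUs or NPUs")
    return problems


print(check_notification_request("Running", ["JobFailed", "JobHanged"], uses_gpu_or_npu=False))
# ['JobHanged is only supported for jobs using GPUs or NPUs']
print(check_notification_request("Completed", ["JobFailed"], uses_gpu_or_npu=True))
# ['event notifications cannot be configured in the Completed state']
print(check_notification_request("Pending", ["JobStarted", "JobCompleted"], uses_gpu_or_npu=False))
# []
```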