Updated on 2025-08-18 GMT+08:00

Viewing Training Job Details

  1. Log in to the ModelArts console.
  2. In the navigation pane, choose Model Training > Training Jobs.

    In the job list, click Export to export the details of training jobs within a specified time range as an Excel file. A maximum of 200 rows can be exported.

    In the search box above the training job list, filter jobs by attributes like status, mode, type, or priority.
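
The filtering and export limit described in this step can be pictured with a minimal sketch. It is not the console's implementation: the job records, the field names (status, priority, created), and the choice to truncate at the limit rather than reject the export are all assumptions made for illustration.

```python
from datetime import datetime

MAX_EXPORT_ROWS = 200  # export limit stated in this step


def select_jobs(jobs, start, end, **filters):
    """Return jobs created within [start, end] that match every attribute filter.

    Assumption: the result is truncated to MAX_EXPORT_ROWS rows.
    """
    selected = [
        job for job in jobs
        if start <= job["created"] <= end
        and all(job.get(key) == value for key, value in filters.items())
    ]
    return selected[:MAX_EXPORT_ROWS]


# Hypothetical job records; the real list comes from the console.
jobs = [
    {
        "name": f"job-{i}",
        "status": "Completed" if i % 2 else "Failed",
        "priority": 1,
        "created": datetime(2025, 8, 1 + i % 10),
    }
    for i in range(500)
]

rows = select_jobs(
    jobs, datetime(2025, 8, 1), datetime(2025, 8, 31), status="Completed"
)
print(len(rows))
```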

  3. In the training job list, click the target job name to switch to the training job details page.
  4. On the left of the training job details page, view basic job settings and algorithm parameters.
    • Basic job settings
      Table 1 Basic job settings

      | Parameter | Description |
      | --- | --- |
      | Job ID | Unique ID of the training job. |
      | Status | Status of the training job. |
      | Created | Time when the training job was created. |
      | Duration | Duration of the training job, that is, the total duration of its Kubernetes resources across the job's entire lifecycle. |
      | Retries | Number of times the training job has automatically restarted after a fault, displayed as Number of restarts/Maximum number of restarts. This parameter is available only when Auto Restart is enabled during training job creation. |
      | Description | Description of the training job. You can click the edit icon to update it. |
      | Job Priority | Priority of the training job. |
      | Preemption | This parameter is displayed only when Preemption is enabled during training job creation. |

    • Algorithm parameters
      Table 2 Algorithm parameters

      | Parameter | Description |
      | --- | --- |
      | Algorithm Name | Algorithm used in the training job. You can click the algorithm name to go to the algorithm details page. |
      | Preset images | Preset image used by the training job. This parameter is available only for training jobs created using a preset image. |
      | Custom image | Custom image used by the training job. This parameter is available only for training jobs created using a custom image. |
      | Code Directory | OBS path to the code directory of the training job. You can click Edit Code on the right to edit the training script in OBS Online Editor. OBS Online Editor is not available for a training job in the Pending, Creating, or Running status. NOTE: This parameter is not supported when you use a subscribed algorithm to create a training job. |
      | Boot File | Location where the training boot file is stored. NOTE: This parameter is not supported when you use a subscribed algorithm to create a training job. |
      | User ID | ID of the user who runs the container. |
      | Local Code Directory | Path to the training code in the training container. |
      | Work Directory | Path to the training boot file in the training container. |
      | Compute Nodes | Number of instances for the training job. |
      | Dedicated resource pool | Dedicated resource pool information. This parameter is available only when the training job uses a dedicated resource pool. |
      | Compute Node ID | Names and IP addresses of the compute nodes used by the training job. This parameter is displayed only when the training job uses a dedicated resource pool. |
      | Specifications | Instance specifications used by the training job. This parameter is displayed only when the training job does not use custom specifications of a dedicated resource pool. The value shows both the specifications chosen during job creation and the resources allocated to the training containers. The allocated resources are usually less than those chosen, because the job's internal containers, which help the training job run smoothly, also consume some resources. |
      | Customized Specifications | Custom resource and instance specifications selected for the training job. This parameter is displayed only when the training job uses custom specifications of a dedicated resource pool. |
      | Input > Input Path | OBS path where the input data is stored. |
      | Input > Parameter Name | Input path parameter specified in the algorithm code. |
      | Input > Obtained from | Method of obtaining the training job input. |
      | Input > Local Path (Training Parameter Value) | Path for storing the input data in the ModelArts backend container. After the training starts, ModelArts downloads the data stored in OBS to the backend container. |
      | Output > Output Path | OBS path where the output data is stored. |
      | Output > Parameter Name | Output path parameter specified in the algorithm code. |
      | Output > Obtained from | Method of obtaining the training job output. |
      | Output > Local Path (Training Parameter Value) | Path for storing the output data in the ModelArts backend container. |
      | Hyperparameter | Hyperparameters used in the training job. |
      | Environment Variable | Environment variables for the training job. |
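
To relate the Input, Output, Hyperparameter, and Environment Variable rows to training code, the following is a minimal sketch of how a boot file might read these values. It assumes that the input and output parameters and the hyperparameters reach the script as command-line arguments. The names data_url, train_url, learning_rate, and LOG_LEVEL are hypothetical; use the names shown on your own job details page.

```python
import argparse
import os


def parse_job_args(argv=None):
    """Parse the values a training script receives from the job settings."""
    parser = argparse.ArgumentParser(description="Training boot file sketch")
    # Hypothetical names: match them to Input > Parameter Name and
    # Output > Parameter Name on the job details page.
    parser.add_argument("--data_url", required=True,
                        help="Local path of the input data in the container")
    parser.add_argument("--train_url", required=True,
                        help="Local path for the output data in the container")
    # Hypothetical hyperparameter.
    parser.add_argument("--learning_rate", type=float, default=0.01)
    # Ignore any additional arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example invocation with made-up values.
args = parse_job_args([
    "--data_url", "/tmp/input",
    "--train_url", "/tmp/output",
    "--learning_rate", "0.001",
])

# Hypothetical environment variable with a fallback default.
log_level = os.environ.get("LOG_LEVEL", "INFO")

print(args.data_url, args.train_url, args.learning_rate, log_level)
```

Comparing the parsed values with the Local Path (Training Parameter Value) column is a quick way to confirm that the script reads the paths the job was actually given.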

  5. On the training details page, manage event notifications of the training job.
    • Event notifications cannot be configured for training jobs in the Completed, Failed, Abnormal, or Terminated state.
    • To set up event notifications, you need permission to view jobs.
    • For modification events, notifications contain only the updated training status.

    After event notification is enabled, you will be notified of a specific event, such as a job status change or suspected suspension, through an SMS message or email. Notifications will be billed based on SMN pricing. For details, see Billing.

    • If event notification has been enabled for a training job, you can click the icon next to Enabled to modify or disable event notification.
      Figure 1 Modifying event notification
    • If event notification has not been enabled for a training job, you can click the icon next to Disabled to enable event notification.
      Figure 2 Configuring event notification
    Table 3 Event notification parameters

    | Parameter | Description |
    | --- | --- |
    | Topic | Topic name of event notification. You can select a topic from the drop-down list or click Create now to create a topic on the SMN console. NOTE: You can create a topic on the SMN console, add a subscription to it, and confirm the subscription status. Once these steps are completed, you will be notified of the event. |
    | Event | Events you want to subscribe to. Examples: JobStarted, JobCompleted, JobFailed, JobTerminated, and JobHanged. NOTE: Only training jobs using GPUs or NPUs support JobHanged events. |
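
The constraints on event notification stated in step 5 can be summarized as a small check. This is a sketch of the documented rules only, not a ModelArts API: the function name check_subscription and the accelerator labels "GPU", "NPU", and "CPU" are hypothetical.

```python
# Job states in which event notifications cannot be configured (from step 5).
LOCKED_STATES = {"Completed", "Failed", "Abnormal", "Terminated"}

# Only jobs using these accelerators support JobHanged events.
HANG_DETECTION_ACCELERATORS = {"GPU", "NPU"}


def check_subscription(job_state, accelerator, events):
    """Return a list of problems; an empty list means the request is allowed."""
    problems = []
    if job_state in LOCKED_STATES:
        problems.append(
            f"event notifications cannot be configured in the {job_state} state"
        )
    if "JobHanged" in events and accelerator not in HANG_DETECTION_ACCELERATORS:
        problems.append("JobHanged events require a job that uses GPUs or NPUs")
    return problems


print(check_subscription("Running", "GPU", ["JobStarted", "JobHanged"]))
print(check_subscription("Failed", "CPU", ["JobHanged"]))
```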