Viewing Training Job Details
- Log in to the ModelArts console.
- In the navigation pane, choose Model Training > Training Jobs.
In the job list, click Export to export the details of training jobs within a specified time range to an Excel file. A maximum of 200 rows can be exported.
In the search box above the training job list, filter jobs by attributes such as status, mode, type, or priority.
- In the training job list, click the target job name to go to the training job details page.
- On the left of the training job details page, view basic job settings and algorithm parameters.
- Basic job settings
Table 1 Basic job settings
Parameter | Description
Job ID | Unique ID of the training job.
Status | Status of the training job.
Created | Time when the training job was created.
Duration | Duration of the training job, that is, the total time Kubernetes resources are occupied over the entire lifecycle of the training job.
Retries | Number of times the training job has automatically restarted after a fault, displayed as Number of restarts/Maximum number of restarts. This parameter is available only if Auto Restart was enabled when the training job was created.
Description | Description of the training job. You can click the edit icon to update it.
Job Priority | Priority of the training job.
Preemption | Displayed only if Preemption was enabled when the training job was created.
- Algorithm parameters
Table 2 Algorithm parameters
Parameter | Description
Algorithm Name | Algorithm used in the training job. You can click the algorithm name to go to the algorithm details page.
Preset images | Preset image used by the training job. This parameter is available only for training jobs created using a preset image.
Custom image | Custom image used by the training job. This parameter is available only for training jobs created using a custom image.
Code Directory | OBS path to the code directory of the training job. You can click Edit Code on the right to edit the training script in OBS Online Editor. OBS Online Editor is not available for training jobs in the Pending, Creating, or Running state. NOTE: This parameter is not supported for training jobs created using a subscribed algorithm.
Boot File | Location where the training boot file is stored. NOTE: This parameter is not supported for training jobs created using a subscribed algorithm.
User ID | ID of the user who runs the container.
Local Code Directory | Path to the training code in the training container.
Work Directory | Path to the training boot file in the training container.
Compute Nodes | Number of instances used by the training job.
Dedicated resource pool | Dedicated resource pool information. This parameter is available only when the training job uses a dedicated resource pool.
Compute Node ID | Names and IP addresses of the compute nodes used by the training job. This parameter is displayed only when the training job uses a dedicated resource pool.
Specifications | Instance specifications used by the training job. This parameter is displayed only when the training job does not use custom specifications of a dedicated resource pool. It shows both the specifications selected at job creation and the resources actually allocated to the training containers. The allocated resources are usually less than the selected specifications because some resources are consumed by the job's internal containers, which keep the training job running smoothly.
Customized Specifications | Instance specifications used by the training job. This parameter is displayed only when the training job uses custom specifications of a dedicated resource pool. It shows the custom resource and instance specifications selected for the training job.
Input > Input Path | OBS path where the input data is stored.
Input > Parameter Name | Input path parameter specified in the algorithm code.
Input > Obtained from | Method of obtaining the training job input.
Input > Local Path (Training Parameter Value) | Path for storing the input data in the ModelArts backend container. After training starts, ModelArts downloads the data from OBS to the backend container.
Output > Output Path | OBS path where the output data is stored.
Output > Parameter Name | Output path parameter specified in the algorithm code.
Output > Obtained from | Method of obtaining the training job output.
Output > Local Path (Training Parameter Value) | Path for storing the output data in the ModelArts backend container.
Hyperparameter | Hyperparameters used in the training job.
Environment Variable | Environment variables configured for the training job.
- On the training job details page, manage event notifications for the training job.
- Event notifications cannot be configured for training jobs in the Completed, Failed, Abnormal, or Terminated state.
- To set up event notifications, you need permission to view jobs.
- For modification events, the notification reports only the updated training status.
After event notification is enabled, you will be notified of specific events, such as a job status change or a suspected job hang, by SMS message or email. Notifications are billed based on SMN pricing. For details, see Billing.
- If event notification has been enabled for a training job, click the icon next to Enabled to modify or disable event notification.
Figure 1 Modifying event notification
- If event notification has not been enabled for a training job, click the icon next to Disabled to enable event notification.
Figure 2 Configuring event notification
Table 3 Event notification parameters
Parameter | Description
Topic | Name of the event notification topic. Select a topic from the drop-down list, or click Create now to create one on the SMN console. NOTE: After you create a topic on the SMN console, add a subscription to it and confirm the subscription. You will then be notified of the subscribed events.
Event | Events to subscribe to, for example, JobStarted, JobCompleted, JobFailed, JobTerminated, and JobHanged. NOTE: Only training jobs that use GPUs or NPUs support JobHanged events.
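The event notification restrictions above (no configuration in terminal states, JobHanged only for GPU or NPU jobs) can be expressed as a simple pre-check. The following is a minimal Python sketch under stated assumptions, not a ModelArts API: JobInfo and validate_subscription are hypothetical names, and the accepted event list is limited to the examples given in the event notification parameters table.

```python
from dataclasses import dataclass
from typing import Iterable, List

# States in which event notifications cannot be configured.
TERMINAL_STATES = {"Completed", "Failed", "Abnormal", "Terminated"}
# For this sketch, only the example events from the table are accepted.
KNOWN_EVENTS = {"JobStarted", "JobCompleted", "JobFailed", "JobTerminated", "JobHanged"}
# Accelerator types whose training jobs support JobHanged events.
HANG_DETECTION_DEVICES = {"GPU", "NPU"}


@dataclass
class JobInfo:
    """Hypothetical, simplified view of a training job."""
    status: str
    device: str  # for example "CPU", "GPU", or "NPU"


def validate_subscription(job: JobInfo, events: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems: List[str] = []
    if job.status in TERMINAL_STATES:
        problems.append(f"notifications cannot be configured in the {job.status} state")
    for event in events:
        if event not in KNOWN_EVENTS:
            problems.append(f"unknown event: {event}")
        elif event == "JobHanged" and job.device not in HANG_DETECTION_DEVICES:
            problems.append("JobHanged requires a GPU or NPU training job")
    return problems
```

For example, a running CPU job that requests JobStarted and JobHanged yields a single problem for JobHanged, while a running NPU job requesting JobHanged yields an empty list.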
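The Parameter Name, Local Path (Training Parameter Value), Hyperparameter, and Environment Variable fields in the algorithm parameters table describe values that reach the training code at run time. A minimal sketch of a boot file that reads them, assuming each input/output parameter and each hyperparameter is passed to the boot file as a --name=value command-line argument. The names data_url, train_url, epochs, lr, and MY_LOG_LEVEL are hypothetical placeholders; use the names defined in your own algorithm.

```python
import argparse
import os
from typing import Optional, Sequence


def parse_training_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse input/output paths and hyperparameters passed to the boot file.

    Assumption: each value arrives as a --name=value argument.
    All parameter names below are hypothetical.
    """
    parser = argparse.ArgumentParser(description="Hypothetical training boot file")
    # Input > Parameter Name: the value is the local input path in the container.
    parser.add_argument("--data_url", required=True, help="local input data path")
    # Output > Parameter Name: the value is the local output path in the container.
    parser.add_argument("--train_url", required=True, help="local output path")
    # Hyperparameters configured for the training job.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=0.001)
    # parse_known_args ignores extra arguments instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


def read_env_settings() -> dict:
    """Read an environment variable configured for the job (hypothetical name)."""
    return {"log_level": os.environ.get("MY_LOG_LEVEL", "INFO")}
```

Calling parse_training_args() with no arguments reads sys.argv, which is how the boot file would normally be invoked. parse_known_args is used instead of parse_args so that unexpected extra arguments do not stop the script.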
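The Export button and the search box on the training job list, described at the start of this procedure, filter jobs by attribute and time range and cap an export at 200 rows. As an offline illustration of the same idea, the sketch below filters job records and writes at most 200 of them. It assumes each job is a dictionary with hypothetical keys such as status and created (an ISO 8601 timestamp), and it writes CSV with the Python standard library rather than Excel.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

MAX_EXPORT_ROWS = 200  # export limit stated for the console


def filter_jobs(jobs: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Keep jobs whose attributes match every criterion, e.g. status="Running"."""
    return [job for job in jobs if all(job.get(k) == v for k, v in criteria.items())]


def filter_by_created(
    jobs: List[Dict[str, str]], start: datetime, end: datetime
) -> List[Dict[str, str]]:
    """Keep jobs whose hypothetical 'created' timestamp falls within [start, end]."""
    return [job for job in jobs if start <= datetime.fromisoformat(job["created"]) <= end]


def export_jobs_csv(jobs: List[Dict[str, str]], fields: List[str]) -> str:
    """Serialize at most MAX_EXPORT_ROWS jobs to CSV text, keeping only `fields`."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in `fields`.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for job in jobs[:MAX_EXPORT_ROWS]:
        writer.writerow(job)
    return buffer.getvalue()
```

Truncating with a slice keeps the sketch simple; a real export would also need to decide which 200 rows to keep, for example the most recently created ones.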