Updated on 2022-11-24 GMT+08:00

Obtaining the Runtime Metrics of a Training Job

Sample Code

In a ModelArts notebook, you do not need to enter authentication parameters for session authentication. For session authentication in other development environments, see Session Authentication.

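  • Method 1: Use the specified job_id.
from modelarts.session import Session
from modelarts.estimatorV2 import Estimator

# In a ModelArts notebook, Session() needs no authentication parameters.
session = Session()
estimator = Estimator(session=session, job_id="your job id")

# Obtain the runtime metrics of the default worker node (worker-0).
info = estimator.get_job_metrics()
print(info)

  • Method 2: Use the training job object created in Creating a Training Job.
# job_instance is the object returned when the training job was created.
info = job_instance.get_job_metrics(task_id="worker-0")
print(info)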

Parameters

Table 1 Parameters for initializing the Estimator

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| session | Yes | Object | Session object. For details about the initialization method, see Session Authentication. |
| job_id | Yes | String | ID of a training job. You can obtain job_id using the training job created in Creating a Training Job, for example, job_instance.job_id, or from the response obtained in Obtaining Training Jobs. |

Table 2 get_job_metrics request parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| task_id | No | String | ID of the worker node whose metrics are obtained. It defaults to worker-0. If train_instance_count is set to 2 when you create a training job, the value of this parameter can be worker-0 or worker-1. |
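
For a multi-node job, the metrics of every worker can be gathered by calling get_job_metrics once per task_id. The following is a minimal sketch, assuming worker IDs follow the worker-N pattern described above. The helpers worker_task_ids and collect_worker_metrics and the worker_count argument are hypothetical names introduced for illustration; they are not part of the SDK.

```python
def worker_task_ids(worker_count):
    """Build task IDs worker-0 .. worker-(N-1) for a job with N worker nodes."""
    return [f"worker-{index}" for index in range(worker_count)]


def collect_worker_metrics(job, worker_count):
    """Call get_job_metrics once per worker and key the results by task_id.

    `job` is any object exposing get_job_metrics(task_id=...), for example
    the Estimator from the sample code. This helper is hypothetical.
    """
    return {
        task_id: job.get_job_metrics(task_id=task_id)
        for task_id in worker_task_ids(worker_count)
    }
```

With train_instance_count set to 2, collect_worker_metrics(estimator, 2) would query worker-0 and worker-1 in turn.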

Table 3 Response parameters

| Parameter | Type | Description |
|---|---|---|
| metrics | Array of objects | Runtime metrics |

Table 4 metrics

| Parameter | Type | Description |
|---|---|---|
| metric | String | Runtime metric. The value can be cpuUsage (CPU usage), memUsage (physical memory usage), gpuUtil (GPU usage), gpuMemUsage (GPU memory usage), npuUtil (NPU usage), or npuMemUsage (NPU memory usage). |
| value | Array of numbers | Values of the runtime metric. One average value is collected every minute. |
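
Because each metric arrives as an array of per-minute averages, a per-metric summary is often more readable than the raw response. The following is a minimal sketch, assuming get_job_metrics returns a dict shaped like Tables 3 and 4. The function summarize_metrics is a hypothetical helper, and the sample values are made up for illustration rather than taken from a real job.

```python
from statistics import mean


def summarize_metrics(info):
    """Return {metric name: {"mean", "max", "samples"}} for a metrics response."""
    summary = {}
    for entry in info.get("metrics", []):
        values = entry.get("value") or []
        if not values:
            # Skip metrics that carry no samples.
            continue
        summary[entry["metric"]] = {
            "mean": mean(values),
            "max": max(values),
            "samples": len(values),
        }
    return summary


# Illustrative response only; the numbers are invented, not real output.
sample = {
    "metrics": [
        {"metric": "cpuUsage", "value": [10, 20, 30]},
        {"metric": "gpuUtil", "value": []},
    ]
}
print(summarize_metrics(sample))
```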

Table 5 Response for the failure to call a training API

| Parameter | Type | Description |
|---|---|---|
| error_msg | String | Error message returned when an API call fails. This parameter is not returned if the call succeeds. |
| error_code | String | Error code returned when an API call fails. For details, see Error Codes. This parameter is not returned if the call succeeds. |
| error_solution | String | Suggested solution to the API call failure. This parameter is not returned if the call succeeds. |
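
A caller may want to turn a failure response into an exception before processing metrics. The following is a minimal sketch, assuming the failure fields in Table 5 are returned in a dict; whether the SDK instead raises its own exception is not confirmed here. TrainingApiError and ensure_success are hypothetical names introduced for illustration.

```python
class TrainingApiError(RuntimeError):
    """Hypothetical exception type carrying the fields from Table 5."""

    def __init__(self, error_code, error_msg, error_solution=None):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg
        self.error_solution = error_solution


def ensure_success(info):
    """Return `info` unchanged, or raise if it contains failure fields."""
    # Assumption: a failed call yields a dict with error_code and/or error_msg.
    if isinstance(info, dict) and ("error_code" in info or "error_msg" in info):
        raise TrainingApiError(
            info.get("error_code"),
            info.get("error_msg"),
            info.get("error_solution"),
        )
    return info
```

For example, info = ensure_success(estimator.get_job_metrics()) would pass a successful response through untouched and raise on a failure response.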