Updated on 2025-08-20 GMT+08:00

Querying the Running Metrics of a Specified Task in a Training Job

Function

This API is used to query the running metrics of a specified task in a training job on ModelArts.

Use this API when you need to view the performance metrics of a specified task in a training job. Before calling it, make sure that you have the training job ID and the task ID and that you have permission to view running metrics. If the call succeeds, the platform returns the running metrics of the task. If the training job ID or task ID does not exist, no metrics have been generated for the task, or you lack the required permission, the API returns an error message.

Debugging

You can debug this API in API Explorer, which supports automatic authentication, or use the SDK sample code that API Explorer generates.

URI

GET /v2/{project_id}/training-jobs/{training_job_id}/metrics/{task_id}

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition: Project ID. For details, see Obtaining a Project ID and Name.

Constraints: The value can contain 1 to 64 characters. Letters, digits, and hyphens (-) are allowed.

Range: N/A

Default Value: N/A

training_job_id

Yes

String

Definition: ID of the training job.

Constraints: The ID can be obtained by querying the training job list. For details, see Querying a Training Job List.

Range: N/A

Default Value: N/A

task_id

Yes

String

Definition: Name of a task in the training job. You can obtain the value from the status.tasks field in the training job details.

Constraints: For a single-node job, the value is worker-0. For a multi-node job, the values are worker-0, worker-1, and so on.

Range: N/A

Default Value: N/A
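The path parameter constraints above can be checked on the client before a request is sent. The following Python sketch is illustrative only: `is_valid_project_id` and `expected_task_ids` are hypothetical helper names, and `expected_task_ids` assumes that the worker-N naming pattern described for task_id holds for your job. The authoritative task IDs remain the ones listed in the status.tasks field of the training job details.

```python
import re
from typing import List

# From the project_id constraint: 1 to 64 characters, letters, digits, and hyphens only.
PROJECT_ID_PATTERN = re.compile(r"[A-Za-z0-9-]{1,64}")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id meets the documented length and character rules."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None


def expected_task_ids(node_count: int) -> List[str]:
    """Return the task IDs a job with node_count nodes would have.

    Assumption: tasks are named worker-0 .. worker-(node_count - 1), following the
    naming pattern described for task_id. Prefer the values in status.tasks.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [f"worker-{i}" for i in range(node_count)]
```

For example, `expected_task_ids(3)` yields `['worker-0', 'worker-1', 'worker-2']`, and `is_valid_project_id("bad_id")` is False because underscores are not allowed.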

Request Parameters

None

Response Parameters

Status code: 200

Table 2 Response body parameters

Parameter

Type

Description

metrics

Array of MetricObject objects

Definition: Running metrics of the task. Each element describes one metric.

Table 3 MetricObject

Parameter

Type

Description

metric

String

Definition: Name of the running metric.

Range:

  • cpuUsage: CPU usage

  • memUsage: physical memory usage

  • gpuUtil: GPU usage

  • gpuMemUsage: GPU memory usage

  • npuUtil: NPU usage

  • npuMemUsage: NPU memory usage

value

Array of doubles

Definition: Values of the running metric. Each element is the average value over one minute.
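To work with the response body in code, the MetricObject structure in Table 3 can be modeled as a small data class. The following Python sketch is an assumption-based illustration, not an official SDK type: `MetricObject` and `parse_metrics` are hypothetical names, and `per_minute` assumes that the i-th element of value corresponds to the i-th one-minute interval in the order returned.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MetricObject:
    """Hypothetical client-side model of one MetricObject entry."""
    metric: str  # metric name, such as "cpuUsage"
    value: List[float] = field(default_factory=list)  # one-minute average values

    def per_minute(self) -> List[Tuple[int, float]]:
        # Assumption: index i of value is the i-th one-minute interval, in returned order.
        return list(enumerate(self.value))


def parse_metrics(body: Dict[str, Any]) -> Dict[str, MetricObject]:
    """Index the "metrics" array of a 200 response body by metric name."""
    return {
        item["metric"]: MetricObject(
            metric=item["metric"],
            value=[float(v) for v in item.get("value", [])],
        )
        for item in body.get("metrics", [])
    }
```

Keying the result by metric name makes lookups such as `parse_metrics(body)["cpuUsage"]` direct, at the cost of keeping only the last entry if a name were ever repeated.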

Example Requests

The following shows how to query the running metrics of the worker-0 task of the training job whose ID is 2cd88daa-31a4-40a8-a58f-d186b0e93e4f.

GET https://endpoint/v2/{project_id}/training-jobs/2cd88daa-31a4-40a8-a58f-d186b0e93e4f/metrics/worker-0
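The example request can be issued from Python with the standard library. This is a minimal sketch under stated assumptions: `build_metrics_request` and `fetch_metrics` are hypothetical helpers, `endpoint` is a placeholder for the ModelArts endpoint of your region, and the request is assumed to use token-based authentication with an IAM token in the X-Auth-Token header. Error responses (for example, a nonexistent job or task ID, or missing permissions) are raised as exceptions.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_metrics_request(endpoint: str, project_id: str, training_job_id: str,
                          task_id: str, token: str) -> urllib.request.Request:
    """Build GET /v2/{project_id}/training-jobs/{training_job_id}/metrics/{task_id}.

    endpoint is a placeholder host; token is assumed to be an IAM token.
    """
    path = "/v2/{}/training-jobs/{}/metrics/{}".format(
        urllib.parse.quote(project_id, safe=""),
        urllib.parse.quote(training_job_id, safe=""),
        urllib.parse.quote(task_id, safe=""),
    )
    return urllib.request.Request(
        url=f"https://{endpoint}{path}",
        method="GET",
        headers={"X-Auth-Token": token},  # assumption: token-based authentication
    )


def fetch_metrics(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body; raise on HTTP errors."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Unknown job/task IDs or missing permissions return an error body.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"HTTP {err.code}: {detail}") from err
```

With a real endpoint and token, `fetch_metrics(build_metrics_request(...))` would return a dict with a top-level "metrics" array, as in the example response for this API.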

Example Responses

Status code: 200

OK

{
  "metrics" : [ {
    "metric" : "cpuUsage",
    "value" : [ -1, -1, 2.43, 4.524, 6.714, 12.422, 9.214, 5.36, 7.5, 10.088, 8.975, 11.423, 11.548, 14.563, 16.833 ]
  }, {
    "metric" : "memUsage",
    "value" : [ -1, -1, 0.04, 0.521, 1.652, 4.252, 6.433, 7.384, 7.982, 8.718, 9.365, 9.881, 10.192, 9.994, 9.005 ]
  }, {
    "metric" : "gpuUtil",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  }, {
    "metric" : "gpuMemUsage",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  }, {
    "metric" : "npuUtil",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  }, {
    "metric" : "npuMemUsage",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  } ]
}
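In the example response, every GPU and NPU value and the first samples of the CPU and memory series are -1. A plausible reading, not stated in this document, is that -1 marks a minute with no collected data (such as GPU metrics for a job that uses no GPUs). Under that assumption, the Python sketch below summarizes each metric while skipping those placeholders; `summarize_metrics` and `NO_DATA` are hypothetical names.

```python
from statistics import mean
from typing import Dict, Optional

# Assumption: -1 marks a minute without a collected sample. This is inferred
# from the example response, not documented behavior.
NO_DATA = -1.0


def summarize_metrics(body: dict) -> Dict[str, Optional[Dict[str, float]]]:
    """Return the latest, mean, and max value per metric, ignoring NO_DATA samples.

    A metric whose samples are all NO_DATA maps to None.
    """
    summary: Dict[str, Optional[Dict[str, float]]] = {}
    for item in body.get("metrics", []):
        samples = [float(v) for v in item["value"] if float(v) != NO_DATA]
        if not samples:
            summary[item["metric"]] = None
            continue
        summary[item["metric"]] = {
            "latest": samples[-1],
            "mean": mean(samples),
            "max": max(samples),
        }
    return summary
```

Applied to the example response, gpuUtil, gpuMemUsage, npuUtil, and npuMemUsage map to None, and cpuUsage reports a latest value of 16.833.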

Status Codes

Status Code

Description

200

OK

Error Codes

See Error Codes.