Updated on 2024-05-30 GMT+08:00

Querying the Running Metrics of a Specified Task in a Training Job

Function

This API is used to query the running metrics of a specified task in a training job.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

GET /v2/{project_id}/training-jobs/{training_job_id}/metrics/{task_id}

Table 1 Path Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For details, see Obtaining a Project ID and Name. |
| training_job_id | Yes | String | ID of the training job. |
| task_id | Yes | String | Name of a task in the training job. You can obtain the value from the status.tasks field in the training job details. |
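
The path parameters above can be combined into a request URL as in the following minimal sketch. The helper name build_metrics_url and the endpoint argument are illustrative assumptions, not part of the ModelArts SDK.

```python
from urllib.parse import quote


def build_metrics_url(endpoint: str, project_id: str,
                      training_job_id: str, task_id: str) -> str:
    """Build the URL for
    GET /v2/{project_id}/training-jobs/{training_job_id}/metrics/{task_id}.

    Hypothetical helper for illustration only.
    """
    # Percent-encode each path parameter so that unexpected characters
    # (for example "/") cannot change the structure of the path.
    parts = [quote(p, safe="") for p in (project_id, training_job_id, task_id)]
    return "{}/v2/{}/training-jobs/{}/metrics/{}".format(
        endpoint.rstrip("/"), *parts
    )
```

For example, build_metrics_url("https://endpoint", "my-project", "2cd88daa-31a4-40a8-a58f-d186b0e93e4f", "worker-0") yields a URL of the same shape as the one in Example Requests, with my-project standing in for a real project ID.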

Request Parameters

None

Response Parameters

Status code: 200

Table 2 Response body parameters

| Parameter | Type | Description |
| --- | --- | --- |
| metrics | Array of metrics objects | Running metrics. |

Table 3 metrics

| Parameter | Type | Description |
| --- | --- | --- |
| metric | String | Running metric. The options are as follows: cpuUsage (CPU usage), memUsage (physical memory usage), gpuUtil (GPU usage), gpuMemUsage (GPU memory usage), npuUtil (NPU usage), npuMemUsage (NPU memory usage). |
| value | Array of doubles | Values of the running metric. One average value is collected every minute. |
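
As an illustration of the response structure described above, the following sketch converts a response body into a dictionary keyed by metric name. The function name parse_metrics is an assumption made for this example.

```python
import json
from typing import Dict, List


def parse_metrics(body: str) -> Dict[str, List[float]]:
    """Map each metric name (for example "cpuUsage") to its list of values.

    Hypothetical helper; it only relies on the "metrics", "metric", and
    "value" fields described in the response tables.
    """
    payload = json.loads(body)
    result: Dict[str, List[float]] = {}
    for entry in payload.get("metrics", []):
        # Each value is a double; integers in the JSON are converted to float.
        result[entry["metric"]] = [float(v) for v in entry.get("value", [])]
    return result
```

Values are kept in the order in which they appear in the response.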

Example Requests

The following shows how to query the running metrics of the worker-0 task of the training job whose UUID is 2cd88daa-31a4-40a8-a58f-d186b0e93e4f.

GET https://endpoint/v2/{project_id}/training-jobs/2cd88daa-31a4-40a8-a58f-d186b0e93e4f/metrics/worker-0
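
A request like the one above could be prepared with the Python standard library as sketched below. This section does not describe authentication, so the X-Auth-Token header is an assumption, as are the function name make_metrics_request and the placeholder endpoint, project ID, and token values.

```python
import urllib.request


def make_metrics_request(endpoint: str, project_id: str,
                         training_job_id: str, task_id: str,
                         token: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for a task's metrics.

    Hypothetical helper for illustration only.
    """
    url = (
        f"{endpoint}/v2/{project_id}/training-jobs/"
        f"{training_job_id}/metrics/{task_id}"
    )
    # Assumption: token-based authentication through the X-Auth-Token header.
    return urllib.request.Request(
        url, method="GET", headers={"X-Auth-Token": token}
    )


# Usage (placeholders must be replaced with real values before sending):
# req = make_metrics_request("https://endpoint", "<project_id>",
#                            "2cd88daa-31a4-40a8-a58f-d186b0e93e4f",
#                            "worker-0", "<token>")
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```

The request is built separately from sending it so that the URL and headers can be inspected before any network call is made.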

Example Responses

Status code: 200

ok

{
  "metrics" : [ {
    "metric" : "cpuUsage",
    "value" : [ -1, -1, 2.43, 4.524, 6.714, 12.422, 9.214, 5.36, 7.5, 10.088, 8.975, 11.423, 11.548, 14.563, 16.833 ]
  }, {
    "metric" : "memUsage",
    "value" : [ -1, -1, 0.04, 0.521, 1.652, 4.252, 6.433, 7.384, 7.982, 8.718, 9.365, 9.881, 10.192, 9.994, 9.005 ]
  }, {
    "metric" : "gpuUtil",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  }, {
    "metric" : "gpuMemUsage",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  }, {
    "metric" : "npuUtil",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  }, {
    "metric" : "npuMemUsage",
    "value" : [ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
  } ]
}
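
A response like the one above can be summarized per metric as in the following sketch. The meaning of -1 is not stated in this section; the code assumes it marks a minute for which no value was collected and skips such entries. The names NO_DATA and summarize are illustrative.

```python
from statistics import mean
from typing import Dict, List, Optional

# Assumption: -1 marks a minute with no collected value.
NO_DATA = -1


def summarize(metrics: List[dict]) -> Dict[str, Optional[dict]]:
    """Return the sample count, mean, and maximum for each metric.

    A metric with no usable values maps to None.
    """
    summary: Dict[str, Optional[dict]] = {}
    for entry in metrics:
        samples = [v for v in entry["value"] if v != NO_DATA]
        if samples:
            summary[entry["metric"]] = {
                "samples": len(samples),
                "mean": mean(samples),
                "max": max(samples),
            }
        else:
            summary[entry["metric"]] = None
    return summary
```

Applied to the "metrics" array of the example response, gpuUtil, gpuMemUsage, npuUtil, and npuMemUsage map to None under this assumption, because all of their values are -1.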

Status Codes

| Status Code | Description |
| --- | --- |
| 200 | ok |

Error Codes

See Error Codes.