Updated on 2023-12-14 GMT+08:00

Obtaining Training Job Versions

Function

This API is used to obtain the versions of a specified training job based on the job ID.

URI

GET /v1/{project_id}/training-jobs/{job_id}/versions

Table 1 describes the required parameters.
Table 1 URI parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.

job_id

Yes

Long

ID of a training job

Table 2 Query parameters

Parameter

Mandatory

Type

Description

per_page

No

Integer

Number of records displayed on each page. The value range is [1, 1000]. Default value: 10

page

No

Integer

Index of the page to be queried

  • If paging is required, set page to 1.
  • The default value is 0, which indicates that the results are not paged.
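
The constraints in Table 2 can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API: the helper name build_version_query is hypothetical, and rejecting negative page values is an assumption, because Table 2 states only the default value of page.

```python
from urllib.parse import urlencode


def build_version_query(per_page=10, page=0):
    """Return the query string for the version list request.

    Hypothetical helper: the name and the client-side checks are
    illustrative and are not defined by the API.
    """
    # Table 2 gives the value range [1, 1000] for per_page.
    if not 1 <= per_page <= 1000:
        raise ValueError("per_page must be in the range [1, 1000]")
    # Assumption: a negative page index is never valid.
    if page < 0:
        raise ValueError("page must not be negative")
    return urlencode({"per_page": per_page, "page": page})
```

For example, build_version_query(per_page=5, page=1) returns per_page=5&page=1, which matches the query string in the sample request.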

Request Body

None

Response Body

Table 3 describes the response parameters.
Table 3 Response parameters

Parameter

Type

Description

is_success

Boolean

Whether the request is successful

error_message

String

Error message of a failed API call.

This parameter is not included when the API call succeeds.

error_code

String

Error code of a failed API call. For details, see Error Codes.

This parameter is not included when the API call succeeds.

job_id

Long

ID of a training job

job_name

String

Name of a training job

job_desc

String

Description of a training job

version_count

Long

Number of versions of a training job

versions

JSON Array

Version parameters of a training job. For details, see the sample response. For details about the attributes, see Table 4.
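
Because error_message and error_code are returned only when a call fails, a client can branch on is_success before reading any other parameter. The following Python sketch illustrates one way to do this. The exception class TrainingJobApiError and the function require_success are hypothetical names introduced for this example only.

```python
class TrainingJobApiError(Exception):
    """Hypothetical exception type for a failed API call."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def require_success(body):
    """Return the parsed response body if is_success is true, else raise."""
    if body.get("is_success"):
        return body
    # error_code and error_message are included only when the call fails.
    raise TrainingJobApiError(body.get("error_code"), body.get("error_message"))
```

A caller would typically pass the decoded JSON response to require_success and then read job_id, version_count, and versions from the returned dictionary.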

Table 4 versions parameters

Parameter

Type

Description

version_id

Long

Version ID of a training job

version_name

String

Version name of a training job

pre_version_id

Long

ID of the previous version of a training job

engine_type

Long

Engine type of a training job

engine_name

String

Name of the engine selected for a training job

engine_id

Long

ID of the engine selected for a training job

engine_version

String

Version of the engine selected for a training job

status

Integer

Status of a training job

app_url

String

Code directory of a training job

boot_file_url

String

Boot file of a training job

create_time

Long

Time when a training job is created

parameter

JSON Array

Running parameters of a training job. This parameter is a container environment variable when a training job uses a custom image. For details, see Table 5.

duration

Long

Training job running duration, in milliseconds

spec_id

Long

ID of the resource specifications selected for a training job

core

String

Number of cores of the resource specifications

cpu

String

CPU memory of the resource specifications

gpu

Boolean

Whether to use GPUs

gpu_num

Integer

Number of GPUs of the resource specifications

gpu_type

String

GPU type of the resource specifications

worker_server_num

Integer

Number of workers in a training job

data_url

String

OBS URL of the dataset of a training job

train_url

String

OBS path of the training job output file

log_url

String

OBS URL of the logs of a training job. By default, this parameter is left blank. Example value: /usr/log/

dataset_version_id

String

Dataset version ID of a training job

dataset_id

String

Dataset ID of a training job

data_source

JSON Array

Dataset of a training job. For details, see Table 6.

model_id

Long

Model ID of a training job

model_metric_list

String

Model metrics of a training job. For details, see Table 7.

system_metric_list

String

System monitoring metrics of a training job. For details, see Table 8.

user_image_url

String

SWR URL of a custom image used by a training job

user_command

String

Boot command used to start the container of a custom image of a training job

resource_id

String

Charged resource ID of a training job

dataset_name

String

Dataset name of a training job

start_time

Long

Training start time

volumes

JSON Array

Storage volume that can be used by a training job. For details, see Table 13.

dataset_version_name

String

Dataset version name of a training job

pool_name

String

Name of a resource pool

pool_id

String

ID of a resource pool

nas_mount_path

String

Local mount path of SFS Turbo (NAS). Example value: /home/work/nas

nas_share_addr

String

Shared path of SFS Turbo (NAS). Example value: 192.168.8.150:/

nas_type

String

Type of the SFS Turbo (NAS) file system. Only NFS is supported. Example value: nfs
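
The time-related parameters in Table 4 are returned as Long values. The following Python sketch converts them into standard library types. It assumes that create_time and start_time are milliseconds since the Unix epoch, which is consistent with the sample response values but is not stated in the table. Only duration is explicitly documented as milliseconds. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_datetime(epoch_ms):
    """Convert a timestamp to an aware UTC datetime."""
    # Assumption: the value is milliseconds since the Unix epoch.
    return EPOCH + timedelta(milliseconds=epoch_ms)


def to_duration(duration_ms):
    """Convert the duration parameter to a timedelta."""
    # Table 4 states that duration is in milliseconds.
    return timedelta(milliseconds=duration_ms)


# Values taken from the sample response.
created = to_utc_datetime(1524189990635)
elapsed = to_duration(532003)
print(created.isoformat())
print(elapsed.total_seconds())
```

Integer arithmetic on timedelta is used instead of dividing by 1000 so that the millisecond part is preserved exactly.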

Table 5 parameter parameters

Parameter

Type

Description

label

String

Parameter name

value

String

Parameter value

Table 6 data_source parameters

Parameter

Type

Description

dataset_id

String

Dataset ID of a training job

dataset_version

String

Dataset version ID of a training job

type

String

Dataset type

  • obs: Data from OBS is used.
  • dataset: Data from a specified dataset is used.

data_url

String

OBS bucket path

Table 7 model_metric_list parameters

Parameter

Type

Description

metric

JSON Array

Validation metrics of a classification of a training job. For details, see Table 9.

total_metric

JSON

Overall validation parameters of a training job. For details, see Table 11.

Table 8 system_metric_list parameters

Parameter

Type

Description

cpuUsage

Array

CPU usage of a training job

memUsage

Array

Memory usage of a training job

gpuUtil

Array

GPU usage of a training job

Table 9 metric parameters

Parameter

Type

Description

metric_values

JSON

Validation metrics of a classification of a training job. For details, see Table 10.

reserved_data

JSON

Reserved parameter

metric_meta

JSON

Classification of a training job, including the classification ID and name

Table 10 metric_values parameters

Parameter

Type

Description

recall

Float

Recall of a classification of a training job

precision

Float

Precision of a classification of a training job

accuracy

Float

Accuracy of a classification of a training job

Table 11 total_metric parameters

Parameter

Type

Description

total_metric_meta

JSON Array

Reserved parameter

total_reserved_data

JSON Array

Reserved parameter

total_metric_values

JSON Array

Overall validation metrics of a training job. For details, see Table 12.

Table 12 total_metric_values parameters

Parameter

Type

Description

f1_score

Float

F1 score of a training job. This parameter is used only by some preset algorithms and is automatically generated. It is for reference only.

recall

Float

Total recall of a training job

precision

Float

Total precision of a training job

accuracy

Float

Total accuracy of a training job

Table 13 volumes parameters

Parameter

Type

Description

nfs

object

Storage volume of the shared file system type. Only the training jobs running in a resource pool with the shared file system network connected support such storage volumes. For details, see Table 14.

host_path

object

Storage volume of the host file system type. Only training jobs running in a dedicated resource pool support such storage volumes. For details, see Table 15.

Table 14 nfs parameters

Parameter

Type

Description

id

String

ID of an SFS Turbo file system

src_path

String

Address of an SFS Turbo file system

dest_path

String

Local path to a training job

read_only

Boolean

Whether dest_path is read-only

  • true: read-only permission
  • false: read/write permission. This is the default value.

Table 15 host_path parameters

Parameter

Type

Description

src_path

String

Local path to a host

dest_path

String

Local path to a training job

read_only

Boolean

Whether dest_path is read-only

  • true: read-only permission
  • false: read/write permission. This is the default value.
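
Each element of the volumes array holds either an nfs object (Table 14) or a host_path object (Table 15). The following Python sketch flattens the array into tuples for easier inspection. The function name describe_volumes is hypothetical, and treating a missing read_only field as false is an assumption based on the documented default value.

```python
def describe_volumes(volumes):
    """Return (kind, src_path, dest_path, read_only) for each volume."""
    described = []
    for volume in volumes:
        # Each entry carries either an "nfs" or a "host_path" object.
        for kind in ("nfs", "host_path"):
            spec = volume.get(kind)
            if spec is None:
                continue
            described.append((
                kind,
                spec.get("src_path"),
                spec.get("dest_path"),
                # Assumption: an absent read_only means the default, false.
                spec.get("read_only", False),
            ))
    return described
```

Applied to the volumes array in the sample response, this would yield one nfs entry mounted at /home/work/nas and one host_path entry mounted at /home/mind, both with read/write permission.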

Sample Request

The following shows how to obtain the job version details on the first page when job_id is set to 10 and five records are displayed on each page.

GET https://endpoint/v1/{project_id}/training-jobs/10/versions?per_page=5&page=1
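
The same request can be prepared in Python with the standard library. This is a sketch under stated assumptions: the function name build_versions_request is hypothetical, the endpoint and project ID are placeholders, and passing a token in the X-Auth-Token header is an assumption about the authentication mode, which this section does not describe.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_versions_request(endpoint, project_id, job_id, token,
                           per_page=5, page=1):
    """Build (but do not send) the GET request for the version list."""
    query = urlencode({"per_page": per_page, "page": page})
    url = (f"https://{endpoint}/v1/{project_id}"
           f"/training-jobs/{job_id}/versions?{query}")
    # Assumption: token-based authentication through X-Auth-Token.
    return Request(url, headers={"X-Auth-Token": token}, method="GET")
```

The returned object can be passed to urllib.request.urlopen, and the response body can then be decoded with json.load.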

Sample Response

  • Successful response
    {
        "is_success": true,
        "job_id": 10,
        "job_name": "testModelArtsJob",
        "job_desc": "testModelArtsJob desc",
        "version_count": 2,
        "versions": [
            {
                "version_id": 10,
                "version_name": "V0004",
                "pre_version_id": 5,
                "engine_type": 1,
                "engine_name": "TensorFlow",
                "engine_id": 1,
                "engine_version": "TF-1.4.0-python2.7",
                "status": 10,
                "app_url": "/usr/app/",
                "boot_file_url": "/usr/app/boot.py",
                "create_time": 1524189990635,
                "parameter": [
                    {
                        "label": "learning_rate",
                        "value": 0.01
                    }
                ],
                "duration": 532003,
                "spec_id": 1,
                "core": 2,
                "cpu": 8,
                "gpu": true,
                "gpu_num": 2,
                "gpu_type": "P100",
                "worker_server_num": 1,
                "data_url": "/usr/data/",
                "train_url": "/usr/train/",
                "log_url": "/usr/log/",
                "dataset_version_id": "2ff0d6ba-c480-45ae-be41-09a8369bfc90",
                "dataset_id": "38277e62-9e59-48f4-8d89-c8cf41622c24",
                "data_source": [
                    {
                        "type": "obs",
                        "data_url": "/qianjiajun-test/minst/data/"
                    }
                ],
                "user_image_url": "100.125.5.235:20202/jobmng/custom-cpu-base:1.0",
                "user_command": "bash -x /home/work/run_train.sh python /home/work/user-job-dir/app/mnist/mnist_softmax.py --data_url /home/work/user-job-dir/app/mnist_data",
                "model_id": 1,
                "model_metric_list": "{\"metric\":[{\"metric_values\":{\"recall\":0.005833,\"precision\":0.000178,\"accuracy\":0.000937},\"reserved_data\":{},\"metric_meta\":{\"class_name\":0,\"class_id\":0}}],\"total_metric\":{\"total_metric_meta\":{},\"total_reserved_data\":{},\"total_metric_values\":{\"recall\":0.005833,\"id\":0,\"precision\":0.000178,\"accuracy\":0.000937}}}",
                "system_metric_list": "{\"cpuUsage\":[\"0\",\"3.10\",\"5.76\",\"0\",\"0\",\"0\",\"0\"],\"memUsage\":[\"0\",\"0.77\",\"2.09\",\"0\",\"0\",\"0\",\"0\"],\"gpuUtil\":[\"0\",\"0.25\",\"0.88\",\"0\",\"0\",\"0\",\"0\"],\"gpuMemUsage\":[\"0\",\"0.65\",\"6.01\",\"0\",\"0\",\"0\",\"0\"],\"diskReadRate\":[\"0\",\"91811.07\",\"38846.63\",\"0\",\"0\",\"0\",\"0\"],\"diskWriteRate\":[\"0\",\"2.23\",\"0.94\",\"0\",\"0\",\"0\",\"0\"],\"recvBytesRate\":[\"0\",\"5770405.50\",\"2980077.75\",\"0\",\"0\",\"0\",\"0\"],\"sendBytesRate\":[\"0\",\"12607.17\",\"10487410.00\",\"0\",\"0\",\"0\",\"0\"],\"interval\":1}",
                "dataset_name": "dataset-test",
                "dataset_version_name": "dataset-version-test",
                "start_time": 1563172362000,
                "volumes": [
                    {
                        "nfs": {
                            "id": "43b37236-9afa-4855-8174-32254b9562e7",
                            "src_path": "192.168.8.150:/",
                            "dest_path": "/home/work/nas",
                            "read_only": false
                        }
                    },
                    {
                        "host_path": {
                            "src_path": "/root/work",
                            "dest_path": "/home/mind",
                            "read_only": false
                        }
                    }
                ],
                "pool_id": "pool9928813f",
                "pool_name": "p100",
                "nas_mount_path": "/home/work/nas",
                "nas_share_addr": "192.168.8.150:/",
                "nas_type": "nfs"
            }
        ]
    }
  • Failed response
    {
        "is_success": false,
        "error_message": "Error string",
        "error_code": "ModelArts.0105"
    }
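
In the successful response, model_metric_list and system_metric_list are of the String type, and each value contains a JSON document. They therefore need a second decoding pass after the response body has been parsed. The following Python sketch shows this with a shortened version entry based on the sample response. The function name decode_metrics is hypothetical, and the sketch assumes that both fields are present.

```python
import json


def decode_metrics(version):
    """Return (model_metrics, system_metrics) as Python objects."""
    # Both fields are strings that contain JSON, so they are decoded
    # separately from the response body itself.
    model = json.loads(version["model_metric_list"])
    system = json.loads(version["system_metric_list"])
    return model, system


# Shortened version entry based on the successful sample response.
version = {
    "model_metric_list": '{"metric":[],"total_metric":{"total_metric_values":'
                         '{"recall":0.005833,"precision":0.000178,"accuracy":0.000937}}}',
    "system_metric_list": '{"cpuUsage":["0","3.10","5.76"],"interval":1}',
}
model, system = decode_metrics(version)
print(model["total_metric"]["total_metric_values"]["recall"])
# The usage samples are strings, so convert them before doing arithmetic.
print(max(float(sample) for sample in system["cpuUsage"]))
```

Note that the usage arrays in the sample response hold numbers encoded as strings, so a conversion such as float() is needed before they can be compared or aggregated.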

Status Code

For details about status codes, see Status Code.