Updated on 2024-06-12 GMT+08:00

Obtaining Training Jobs

Sample Code

In a ModelArts notebook, session authentication requires no authentication parameters. For details about session authentication in other development environments, see Session Authentication.

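from modelarts.session import Session
from modelarts.estimatorV2 import Estimator

# In a ModelArts notebook, Session() needs no authentication parameters.
session = Session()

# Start after the first 10 jobs and return up to 5 jobs whose name is like
# "trainjob", sorted by creation time in ascending order.
job_list = Estimator.get_job_list(session=session, offset=10, limit=5, sort_by="create_time", order="asc",
                                  filters=[{"key": "name", "operator": "like", "value": ["trainjob"]}])

print(job_list)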

Parameters

Table 1 get_job_list request parameters

Parameter

Mandatory

Type

Description

session

Yes

Object

Session object. For details about the initialization method, see Session Authentication.

offset

No

Integer

Offset for obtaining training jobs. The minimum value is 0. For example, if this parameter is set to 1, the query starts from the second training job.

limit

No

Integer

Maximum number of training jobs to be obtained. The value ranges from 1 to 50.

sort_by

No

String

Metric for sorting obtained training jobs. By default, training jobs are sorted by creation time (create_time).

order

No

String

Sort order of the obtained training jobs.

Default value: desc

Options:

  • asc: The query results are displayed in ascending order.
  • desc: The query results are displayed in descending order.

group_by

No

String

Condition for grouping the obtained training jobs.

filters

No

Array of objects

Filter criteria for obtaining training jobs.

Table 2 filters

Parameter

Mandatory

Type

Description

key

No

String

Key of the filter condition.

operator

No

String

Relationship between the key and the value of a filter condition.

Default value: in

Options:

  • like: similar
  • in: included
  • not: not included
  • between: a range

value

No

Array of strings

Values of the filter condition key.
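The entries in Table 2 can be assembled with a small helper before they are passed as the filters argument. The following is a minimal sketch, not part of the SDK: make_filter and ALLOWED_OPERATORS are hypothetical names, and the only facts taken from this page are the three keys, the four operators, and the default operator in.

```python
# Operators listed in Table 2; "in" is the documented default.
ALLOWED_OPERATORS = {"like", "in", "not", "between"}


def make_filter(key, values, operator="in"):
    """Build one entry for the `filters` argument (hypothetical helper)."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if isinstance(values, str):
        values = [values]  # `value` is documented as an array of strings
    return {"key": key, "operator": operator, "value": [str(v) for v in values]}


# Same filter as in the sample code: job names like "trainjob".
filters = [make_filter("name", "trainjob", operator="like")]
print(filters)
```

The resulting list has the same shape as the filters value in the sample code, so it could be passed to get_job_list unchanged.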

Table 3 get_job_list response parameters

Parameter

Type

Description

total

Integer

Total number of training jobs of the current user.

count

Integer

Total number of training jobs that meet the search criteria of the current user.

limit

Integer

Maximum number of training jobs to be obtained. The value ranges from 1 to 50.

offset

Integer

Offset for obtaining training jobs. The minimum value is 0. For example, if this parameter is set to 1, the query starts from the second training job.

sort_by

String

Metric for sorting obtained training jobs. By default, training jobs are sorted by creation time (create_time).

order

String

Order of obtained training jobs. The default value is desc, indicating the descending order. You can also set this parameter to asc, indicating the ascending order.

group_by

String

Condition for grouping the obtained training jobs.

workspace_id

String

Workspace where a training job is deployed. The default value is 0.

ai_project

String

AI project to which a training job belongs. The default value is default-ai-project.

items

Array of JobResponse objects

Details of the training jobs that meet the search criteria of the current user.
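Because limit is capped at 50, listing more jobs takes several calls with an increasing offset. The following is a minimal sketch under stated assumptions: the response is treated as a dict shaped like Table 3, offset is treated as a count of jobs to skip, and fetch_page, iter_jobs, and fake_fetch are hypothetical names. A stand-in fetch function replaces the real service so that the sketch runs anywhere.

```python
def iter_jobs(fetch_page, limit=50):
    """Yield every job by requesting successive pages.

    `fetch_page(offset, limit)` is a hypothetical callable that returns a
    dict shaped like Table 3, for example a wrapper around get_job_list.
    """
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    offset = 0  # assumption: offset counts jobs to skip, not pages
    while True:
        page = fetch_page(offset, limit)
        items = page.get("items") or []
        yield from items
        offset += len(items)
        count = page.get("count")
        # Stop on an empty page, or once all `count` matching jobs were seen.
        if not items or (count is not None and offset >= count):
            break


# Stand-in for the real service: 7 made-up jobs served in slices.
_FAKE_JOBS = [{"metadata": {"name": f"trainjob-{i}"}} for i in range(7)]


def fake_fetch(offset, limit):
    return {"total": 7, "count": 7, "items": _FAKE_JOBS[offset:offset + limit]}


names = [job["metadata"]["name"] for job in iter_jobs(fake_fetch, limit=3)]
print(names)
```

With the SDK, fetch_page could wrap Estimator.get_job_list(session=session, offset=offset, limit=limit), provided the return value can be read as a dict.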

Table 4 JobResponse

Parameter

Type

Description

kind

String

Training job type, which defaults to job.

Options:

  • job: training job
  • hetero_job: heterogeneous job
  • autosearch_job: auto search job
  • mrs_job: MRS job
  • edge_job: edge job

metadata

JobMetadata object

Metadata of a training job.

status

Status object

Status of a training job. When creating a training job, you do not need to set this parameter.

algorithm

JobAlgorithmResponse object

Algorithm used by a training job. The following formats are supported:

  • id: Only the algorithm ID is used.
  • subscription_id and item_version_id: The subscription ID and version ID of the algorithm are used.
  • code_dir and boot_file: The code directory and boot file of the training job are used.

tasks

Array of TaskResponse objects

Tasks of a heterogeneous training job.

spec

spec object

Specifications of a training job.

Table 5 JobMetadata

Parameter

Type

Description

id

String

Training job ID, which is generated and returned by ModelArts after a training job is created.

name

String

Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).

workspace_id

String

Workspace where a training job is deployed. Default value: 0

description

String

Description of a training job, which defaults to NULL. The value must contain 0 to 256 characters.

create_time

Long

Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created.

user_name

String

Username for creating a training job. The username is generated and returned by ModelArts after a training job is created.

annotations

Map<String,String>

Declaration template of a training job. For heterogeneous jobs, the default value of job_template is Template RL. For other jobs, the default value is Template DL.
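Two fields in Table 5 have formats that are easy to check in code: name follows a character rule, and create_time is a millisecond timestamp. The following is a minimal sketch, assuming that "letters" means ASCII letters and that the metadata is available as a dict. is_valid_job_name and created_at are hypothetical helpers, and the sample values are made up.

```python
import re
from datetime import datetime, timezone

# Naming rule from Table 5: 1 to 64 digits, letters, underscores, hyphens.
# Assumption: "letters" means ASCII letters.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_job_name(name):
    """Return True if `name` satisfies the documented naming rule."""
    return _NAME_RE.fullmatch(name) is not None


def created_at(metadata):
    """Convert the millisecond `create_time` to an aware UTC datetime."""
    return datetime.fromtimestamp(metadata["create_time"] / 1000, tz=timezone.utc)


sample = {"name": "trainjob-01", "create_time": 1718150400000}  # made-up values
print(is_valid_job_name(sample["name"]), created_at(sample).isoformat())
```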

Table 6 Status

Parameter

Type

Description

phase

String

Level-1 status of a training job. The value will remain unchanged. Options: Creating, Pending, Running, Failed, Completed, Terminating, Terminated, and Abnormal

secondary_phase

String

Level-2 status of a training job. The value can be changed. Options: Creating, Queuing, Running, Failed, Completed, Terminating, Terminated, CreateFailed, TerminatedFailed, Unknown, and Lost

duration

Long

Running duration of a training job, in milliseconds

node_count_metrics

Array<Array<Integer>>

Node count changes during the runtime of a training job

tasks

Array of strings

Tasks of a training job

start_time

String

Start time of a training job. The value is in timestamp format.

task_statuses

Array of objects

Status of a training job task

Table 7 task_statuses

Parameter

Type

Description

task

String

Task of a training job

exit_code

Integer

Exit code of a training job task

message

String

Error message of a training job task
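The status fields in Tables 6 and 7 lend themselves to a quick health summary of a job list. The following is a minimal sketch, assuming that each job is a dict and that a non-zero exit_code marks a failed task. This page does not state that convention. phase_counts and failed_tasks are hypothetical helpers, and the sample data is made up.

```python
from collections import Counter


def phase_counts(items):
    """Count jobs per level-1 `phase`."""
    return Counter(job.get("status", {}).get("phase", "missing") for job in items)


def failed_tasks(status):
    """Return (task, exit_code, message) for tasks with a non-zero exit code.

    Assumption: a non-zero exit code indicates a failed task.
    """
    return [
        (t.get("task"), t.get("exit_code"), t.get("message", ""))
        for t in status.get("task_statuses") or []
        if t.get("exit_code") not in (0, None)
    ]


# Made-up sample shaped like Tables 6 and 7.
items = [
    {"status": {"phase": "Completed", "duration": 61000}},
    {"status": {"phase": "Failed", "duration": 5000,
                "task_statuses": [{"task": "worker-0", "exit_code": 1,
                                   "message": "example error"}]}},
]
print(phase_counts(items))
print(failed_tasks(items[1]["status"]))
```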

Table 8 JobAlgorithmResponse

Parameter

Type

Description

id

String

Algorithm ID

Options:

  • id: Only the algorithm ID is used.
  • subscription_id and item_version_id: The subscription ID and version ID of the algorithm are used.
  • code_dir and boot_file: The code directory and boot file of the training job are used.

name

String

Algorithm name

subscription_id

String

Subscription ID of the subscribed algorithm, which must be used with item_version_id

item_version_id

String

Version ID of the subscribed algorithm, which must be used with subscription_id

code_dir

String

Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified.

boot_file

String

Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified.

autosearch_config_path

String

YAML configuration path of an auto search job. An OBS URL is required.

autosearch_framework_path

String

Framework code directory of an auto search job. An OBS URL is required.

command

String

Boot command for starting the container of the custom image used for creating a training job. The value of this parameter can be the same as the code_dir value.

parameters

Array of Parameter objects

Running parameters of a training job.

policies

policies object

Policies supported by a training job.

inputs

Array of Input objects

Input of a training job.

outputs

Array of Output objects

Output of a training job.

engine

engine object

Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id and item_version_id of the subscribed algorithm.

environments

Map<String,String>

Environment variables of a training job in the format of "key":"value". Leave this parameter blank.
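Table 8 describes three ways to identify an algorithm: by id, by subscription_id with item_version_id, or by code_dir with boot_file. The following is a minimal sketch of how a client might tell them apart in a returned algorithm dict. algorithm_source and its return labels are hypothetical, and the order in which the formats are checked is an assumption.

```python
def algorithm_source(algorithm):
    """Name which of the three documented formats an algorithm dict uses."""
    if algorithm.get("id"):
        return "id"
    if algorithm.get("subscription_id") and algorithm.get("item_version_id"):
        return "subscription"
    if algorithm.get("code_dir") and algorithm.get("boot_file"):
        return "code"
    return "unknown"  # none of the documented combinations is complete


# The paths are the examples given in Table 8.
print(algorithm_source({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))
```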

Table 9 Parameter

Parameter

Type

Description

name

String

Parameter name

value

String

Parameter value

description

String

Parameter description

constraint

constraint object

Parameter constraint

i18n_description

i18n_description object

Internationalization description

Table 10 constraint

Parameter

Type

Description

type

String

Parameter type

editable

Boolean

Whether the parameter is editable

required

Boolean

Whether the parameter is mandatory

sensitive

Boolean

Whether the parameter is sensitive

valid_type

String

Valid type

valid_range

Array of strings

Valid range

Table 11 i18n_description

Parameter

Type

Description

language

String

Internationalization language

description

String

Description

Table 12 policies

Parameter

Type

Description

auto_search

auto_search object

Hyperparameter search configuration

Table 14 reward_attrs

Parameter

Type

Description

name

String

Metric name

mode

String

Search mode

  • max: A larger metric value is preferred.
  • min: A smaller metric value is preferred.

regex

String

Regular expression of a metric

Table 15 search_params

Parameter

Type

Description

name

String

Hyperparameter name

param_type

String

Parameter type

  • continuous: Parameter values are continuous.
  • discrete: Parameter values are discrete.

lower_bound

String

Lower bound of the hyperparameter

upper_bound

String

Upper bound of the hyperparameter

discrete_points_num

String

Number of discrete points of a hyperparameter with continuous values

discrete_values

Array of strings

Discrete hyperparameter values

Table 16 algo_configs

Parameter

Type

Description

name

String

Name of the search algorithm

params

Array of AutoSearchAlgoConfigParameter objects

Search algorithm parameters

Table 17 AutoSearchAlgoConfigParameter

Parameter

Type

Description

key

String

Parameter key

value

String

Parameter value

type

String

Parameter type

Table 18 Input

Parameter

Type

Description

name

String

Name of the data input channel

description

String

Description of the data input channel

local_dir

String

Local directory of the container to which the data input channel is mapped

remote

InputDataInfo object

Information of the data input

remote_constraint

Array of objects

Data input constraint

Table 19 InputDataInfo

Parameter

Type

Description

dataset

dataset object

Dataset as the data input

obs

obs object

OBS in which data input and output are stored

Table 20 dataset

Parameter

Type

Description

id

String

Dataset ID of a training job

version_id

String

Dataset version ID of a training job

obs_url

String

OBS URL of the dataset for a training job, for example, /usr/data/. ModelArts automatically parses this URL from the dataset ID and dataset version ID.

Table 21 obs

Parameter

Type

Description

obs_url

String

OBS URL of the dataset for a training job, for example, /usr/data/

Table 22 remote_constraint

Parameter

Type

Description

data_type

String

Data input type, including the data storage location and dataset

attributes

String

Attributes when a dataset functions as the data input

Options:

  • data_format: data format
  • data_segmentation: data segmentation
  • dataset_type: data labeling

Table 23 Output

Parameter

Type

Description

name

String

Name of the data output channel

description

String

Description of the data output channel

local_dir

String

Local directory of the container to which the data output channel is mapped

remote

remote object

Information of the data output

Table 24 remote

Parameter

Type

Description

obs

obs object

OBS to which data is exported

Table 25 obs

Parameter

Type

Description

obs_url

String

OBS URL to which data is exported

Table 26 engine

Parameter

Type

Description

engine_id

String

ID of the engine selected for a training job. An engine can be specified using engine_id, engine_name together with engine_version, or image_url.

engine_name

String

Name of the engine selected for a training job. Leave this parameter blank if engine_id is specified.

engine_version

String

Version of the engine selected for a training job. Leave this parameter blank if engine_id is specified.

image_url

String

Custom image URL selected for a training job

Table 27 TaskResponse

Parameter

Type

Description

role

String

Role of a heterogeneous training job task

Options:

  • learner: GPUs or CPUs are supported.
  • worker: CPUs are supported.

algorithm

algorithm object

Algorithm configurations in algorithm management

task_resource

FlavorResponse object

Flavors for a training job or an algorithm

Table 28 algorithm

Parameter

Type

Description

code_dir

String

Absolute path of the directory where the algorithm boot file is stored

boot_file

String

Absolute path of the algorithm boot file

inputs

inputs object

Algorithm input channel

outputs

outputs object

Algorithm output channel

engine

engine object

Engine on which a heterogeneous job depends

Table 29 inputs

Parameter

Type

Description

name

String

Name of the data input channel

local_dir

String

Local path of the container to which the data input and output channels are mapped

remote

remote object

Actual data input, which can only be OBS for heterogeneous jobs

Table 30 remote

Parameter

Type

Description

obs

obs object

OBS in which data input and output are stored

Table 31 obs

Parameter

Type

Description

obs_url

String

OBS URL of the dataset for a training job, for example, /usr/data/

Table 32 outputs

Parameter

Type

Description

name

String

Name of the data output channel

local_dir

String

Local directory of the container to which the data output channel is mapped

remote

remote object

Information of the data output

mode

String

Data transmission mode, which defaults to upload_periodically

period

String

Data transmission period, which defaults to 30s

Table 33 remote

Parameter

Type

Description

obs

obs object

OBS to which data is exported

Table 34 obs

Parameter

Type

Description

obs_url

String

OBS URL to which data is exported

Table 35 engine

Parameter

Type

Description

engine_id

String

Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7

engine_name

String

Engine name of a heterogeneous job, for example, Caffe

engine_version

String

Engine version of a heterogeneous job

v1_compatible

Boolean

Whether the engine is compatible with v1

run_user

String

UID of the user that starts the engine by default

Table 36 FlavorResponse

Parameter

Type

Description

flavor_id

String

ID of the resource flavor

flavor_name

String

Name of the resource flavor

max_num

Integer

Maximum number of nodes with the resource flavor

flavor_type

String

Resource flavor type. Options:

  • CPU
  • GPU
  • Ascend

billing

billing object

Billing information of a resource flavor

flavor_info

flavor_info object

Resource flavor details

attributes

Map<String,String>

Other flavor attributes

Table 37 billing

Parameter

Type

Description

code

String

Billing code

unit_num

Integer

Number of billing units

Table 38 flavor_info

Parameter

Type

Description

max_num

Integer

Maximum number of nodes that can be selected. Value 1 indicates that the distributed mode is not supported.

cpu

cpu object

CPU specifications

gpu

gpu object

GPU specifications

npu

npu object

Ascend specifications

memory

memory object

Memory information

Table 39 cpu

Parameter

Type

Description

arch

String

CPU architecture

core_num

Integer

Number of cores

Table 40 gpu

Parameter

Type

Description

unit_num

Integer

Number of GPUs

product_name

String

Product name

memory

String

Memory

Table 41 npu

Parameter

Type

Description

unit_num

String

Number of NPUs

product_name

String

Product name

memory

String

Memory

Table 42 memory

Parameter

Type

Description

size

Integer

Memory size

unit

String

Unit of the memory size
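The nested flavor_info structure in Tables 38 to 42 can be flattened into a one-line summary for logs or reports. The following is a minimal sketch, assuming that flavor_info is available as a dict. describe_flavor is a hypothetical helper, and the sample values, including the GB unit, are made up.

```python
def describe_flavor(flavor_info):
    """Render a `flavor_info` dict (Table 38) as a short one-line summary."""
    parts = []
    cpu = flavor_info.get("cpu")
    if cpu:
        parts.append(f"CPU {cpu.get('core_num')} cores {cpu.get('arch')}")
    for key, label in (("gpu", "GPU"), ("npu", "NPU")):
        unit = flavor_info.get(key)
        if unit:
            parts.append(f"{label} {unit.get('unit_num')} x {unit.get('product_name')}")
    memory = flavor_info.get("memory")
    if memory:
        parts.append(f"memory {memory.get('size')} {memory.get('unit')}")
    # Table 38: a max_num of 1 means that distributed mode is not supported.
    # Assumption: a missing max_num is treated as 1.
    distributed = flavor_info.get("max_num", 1) > 1
    parts.append("distributed" if distributed else "single-node")
    return ", ".join(parts)


sample = {  # made-up values
    "max_num": 1,
    "cpu": {"arch": "x86", "core_num": 8},
    "memory": {"size": 32, "unit": "GB"},
}
print(describe_flavor(sample))
```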

Table 43 spec

Parameter

Type

Description

resource

Resource object

Resource flavor of a training job, specified using either flavor_id alone, or pool_id together with flavor_id

volumes

Array of objects

Volumes attached for a training job

log_export_path

log_export_path object

Export path of training job logs

Table 44 Resource

Parameter

Type

Description

policy

String

Resource flavor mode of a training job. Options: regular, economic, and turbo

flavor_id

String

Resource flavor ID of a training job

flavor_name

String

Read-only flavor name returned by ModelArts when flavor_id is specified

node_count

Integer

Number of resource replicas selected for a training job

Minimum value: 1

pool_id

String

Resource pool ID selected for a training job

flavor_detail

flavor_detail object

Flavors for a training job or an algorithm

Table 45 flavor_detail

Parameter

Type

Description

flavor_type

String

Resource flavor type. Options:

  • CPU
  • GPU
  • Ascend

billing

billing object

Billing information of a resource flavor

flavor_info

flavor_info object

Resource flavor details

Table 46 billing

Parameter

Type

Description

code

String

Billing code

unit_num

Integer

Number of billing units

Table 47 flavor_info

Parameter

Type

Description

max_num

Integer

Maximum number of nodes that can be selected. Value 1 indicates that the distributed mode is not supported.

cpu

cpu object

CPU specifications

gpu

gpu object

GPU specifications

npu

npu object

Ascend specifications

memory

memory object

Memory information

disk

disk object

Disk information

Table 48 cpu

Parameter

Type

Description

arch

String

CPU architecture

core_num

Integer

Number of cores

Table 49 gpu

Parameter

Type

Description

unit_num

Integer

Number of GPUs

product_name

String

Product name

memory

String

Memory

Table 50 npu

Parameter

Type

Description

unit_num

String

Number of NPUs

product_name

String

Product name

memory

String

Memory

Table 51 memory

Parameter

Type

Description

size

Integer

Memory size

unit

String

Unit of the memory size

Table 52 disk

Parameter

Type

Description

size

String

Disk size

unit

String

Unit of the disk size, which is GB generally

Table 53 volumes

Parameter

Type

Description

nfs

nfs object

Disks attached in NFS mode

Table 54 nfs

Parameter

Type

Description

nfs_server_path

String

NFS server path

local_path

String

Path for attaching disks to the training container

read_only

Boolean

Whether the disks attached to the container in NFS mode are read-only

Table 55 log_export_path

Parameter

Type

Description

obs_url

String

OBS URL for storing training job logs

host_path

String

Path of the host where training job logs are stored

Table 56 Response for the failure to call a training API

Parameter

Type

Description

error_msg

String

Error message returned when an API call fails. This parameter is not returned if the call succeeds.

error_code

String

Error code returned when an API call fails. For details, see "Error Codes" in ModelArts API Reference. This parameter is not returned if the call succeeds.

error_solution

String

Suggested solution for a failed API call. This parameter is not returned if the call succeeds.
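Since the error fields appear only when a call fails, a caller can check for error_code before reading the job list. The following is a minimal sketch, assuming that a failure is delivered as a dict carrying the fields in Table 56 rather than as a raised exception. This page does not confirm that behavior. TrainingApiError and raise_for_error are hypothetical names, and the error code shown is a placeholder.

```python
class TrainingApiError(Exception):
    """Hypothetical exception carrying the fields from Table 56."""

    def __init__(self, code, message, solution=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.solution = solution


def raise_for_error(response):
    """Raise if the response dict carries `error_code`; otherwise return it."""
    if isinstance(response, dict) and response.get("error_code"):
        raise TrainingApiError(
            response["error_code"],
            response.get("error_msg", ""),
            response.get("error_solution"),
        )
    return response


# Made-up failure response; the code is a placeholder, not a real error code.
failure = {"error_code": "EXAMPLE.0001", "error_msg": "example failure",
           "error_solution": "example solution"}
try:
    raise_for_error(failure)
except TrainingApiError as err:
    print(err, "|", err.solution)
```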
