Updated on 2024-06-12 GMT+08:00

Obtaining the Details About a Training Job

Sample Code

In a ModelArts notebook, session authentication requires no authentication parameters. For session authentication in other development environments, see Session Authentication.

  • Method 1: Use the specified job_id.
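    from modelarts.session import Session
    from modelarts.estimatorV2 import Estimator
    # In a ModelArts notebook, no authentication parameters are needed here.
    session = Session()
    # Bind the estimator to an existing training job by its ID.
    estimator = Estimator(session=session, job_id="618222c4-dc2f-4cfe-bc49-72b075b7552f")
    # Query the job details and print them.
    job_info = estimator.get_job_info()
    print(job_info)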
  • Method 2: Use the training job created in Creating a Training Job.
    # job_instance is the training job object created in Creating a Training Job.
    job_info = job_instance.get_job_info()
    print(job_info)

Parameters

Table 1 Estimator request parameters

Parameter

Mandatory

Type

Description

session

Yes

Object

Session object. For details about the initialization method, see Session Authentication.

job_id

Yes

String

ID of a training job. You can obtain job_id using the training job created in Creating a Training Job, for example, job_instance.job_id, or from the response obtained in Obtaining Training Jobs.

Table 2 get_job_info response parameters

Parameter

Type

Description

kind

String

Training job type, which defaults to job.

Options:

  • job: training job
  • hetero_job: heterogeneous job
  • autosearch_job: auto search job
  • mrs_job: MRS job
  • edge_job: edge job

metadata

JobMetadata object

Metadata of a training job.

status

Status object

Status of a training job. When creating a training job, you do not need to set this parameter.

algorithm

JobAlgorithmResponse object

Algorithm used by a training job. The following formats are supported:

  • id: Only the algorithm ID is used.
  • subscription_id and item_version_id: The subscription ID and version ID of the algorithm are used.
  • code_dir and boot_file: The code directory and boot file of the training job are used.

tasks

Array of TaskResponse objects

Tasks of a heterogeneous training job.

spec

spec object

Specifications of a training job.
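
As a rough sketch of how the top-level fields in Table 2 might be read, the snippet below assumes that get_job_info() returns a Python dict laid out as described in these tables. The sample dictionary, its job name, and the summarize_job helper are hypothetical and are not part of the SDK.

```python
# Hypothetical sample shaped like Table 2; the name and status values are made up.
sample_job_info = {
    "kind": "job",
    "metadata": {"id": "618222c4-dc2f-4cfe-bc49-72b075b7552f", "name": "demo-job"},
    "status": {"phase": "Running", "secondary_phase": "Running"},
}


def summarize_job(job_info):
    """Return a one-line summary built from kind, metadata, and status."""
    metadata = job_info.get("metadata") or {}
    status = job_info.get("status") or {}
    return "{kind} {name} ({id}): {phase}/{secondary}".format(
        kind=job_info.get("kind", "job"),  # Table 2: kind defaults to job
        name=metadata.get("name", ""),
        id=metadata.get("id", ""),
        phase=status.get("phase", "n/a"),
        secondary=status.get("secondary_phase", "n/a"),
    )


print(summarize_job(sample_job_info))
```

Using .get() with fallbacks keeps the helper tolerant of fields that are absent from a particular response.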

Table 3 JobMetadata

Parameter

Type

Description

id

String

Training job ID, which is generated and returned by ModelArts after a training job is created.

name

String

Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).

workspace_id

String

Workspace where a training job is deployed. Default value: 0

description

String

Description of a training job, which defaults to NULL. The value must contain 0 to 256 characters.

create_time

Long

Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created.

user_name

String

Name of the user who created the training job. The username is generated and returned by ModelArts after a training job is created.

annotations

Map<String,String>

Declaration template of a training job. For heterogeneous jobs, the default value of job_template is Template RL. For other jobs, the default value is Template DL.
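
Because create_time is reported in milliseconds, it usually needs converting before display. The sketch below assumes the value is a Unix epoch timestamp in milliseconds, which Table 3 does not state explicitly. The ms_to_utc helper is hypothetical.

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert a millisecond timestamp (such as create_time) to an aware UTC datetime.

    Assumption: the value counts milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# 1718150400000 is an illustrative value, not taken from a real job.
print(ms_to_utc(1718150400000).isoformat())
```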

Table 4 Status

Parameter

Type

Description

phase

String

Level-1 status of a training job. The set of possible values is stable. Options: Creating, Pending, Running, Failed, Completed, Terminating, Terminated, and Abnormal

secondary_phase

String

Level-2 status of a training job. The set of possible values may change. Options: Creating, Queuing, Running, Failed, Completed, Terminating, Terminated, CreateFailed, TerminatedFailed, Unknown, and Lost

duration

Long

Running duration of a training job, in milliseconds

node_count_metrics

Array<Array<Integer>>

Node count changes during the runtime of a training job

tasks

Array of strings

Tasks of a training job

start_time

String

Start time of a training job. The value is in timestamp format.

task_statuses

Array of objects

Status of a training job task
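
A common use of the Status fields is deciding whether a job has stopped and reporting how long it ran. The following is a minimal sketch that assumes the response is a dict and that Failed, Completed, Terminated, and Abnormal are terminal phases. That grouping is an assumption, not something Table 4 states. Both helpers are hypothetical.

```python
# Assumption: these level-1 phases mean the job has stopped for good.
TERMINAL_PHASES = frozenset({"Failed", "Completed", "Terminated", "Abnormal"})


def is_finished(job_info):
    """Return True if status.phase is one of the assumed terminal phases."""
    return (job_info.get("status") or {}).get("phase") in TERMINAL_PHASES


def format_duration(ms):
    """Render a millisecond duration (status.duration) as H:MM:SS."""
    hours, remainder = divmod(int(ms) // 1000, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


# Hypothetical status values for illustration only.
sample = {"status": {"phase": "Completed", "duration": 3723000}}
print(is_finished(sample), format_duration(sample["status"]["duration"]))
```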

Table 5 task_statuses

Parameter

Type

Description

task

String

Task of a training job

exit_code

Integer

Exit code of a training job task

message

String

Error message of a training job task
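
When a job fails, the task_statuses entries are usually the first place to look. This sketch collects the tasks with a non-zero exit code. It assumes the response is a dict and that exit code 0 means success, which is the usual process convention but is not stated in Table 5. The failed_tasks helper and the sample data are hypothetical.

```python
def failed_tasks(job_info):
    """Return (task, exit_code, message) for each task with a non-zero exit code."""
    statuses = (job_info.get("status") or {}).get("task_statuses") or []
    return [
        (item.get("task"), item.get("exit_code"), item.get("message", ""))
        for item in statuses
        if item.get("exit_code") not in (0, None)  # assumption: 0 means success
    ]


# Hypothetical task statuses for illustration only.
sample = {
    "status": {
        "task_statuses": [
            {"task": "worker-0", "exit_code": 0, "message": ""},
            {"task": "worker-1", "exit_code": 1, "message": "placeholder error"},
        ]
    }
}
print(failed_tasks(sample))
```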

Table 6 JobAlgorithmResponse

Parameter

Type

Description

id

String

Algorithm ID

The algorithm can be specified in one of the following formats:

  • id: Only the algorithm ID is used.
  • subscription_id and item_version_id: The subscription ID and version ID of the algorithm are used.
  • code_dir and boot_file: The code directory and boot file of the training job are used.

name

String

Algorithm name

subscription_id

String

Subscription ID of the subscribed algorithm, which must be used with item_version_id

item_version_id

String

Version ID of the subscribed algorithm, which must be used with subscription_id

code_dir

String

Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified.

boot_file

String

Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified.

autosearch_config_path

String

YAML configuration path of an auto search job. An OBS URL is required.

autosearch_framework_path

String

Framework code directory of an auto search job. An OBS URL is required.

command

String

Boot command for starting the container of the custom image used for creating a training job. The value of this parameter can be the same as the code_dir value.

parameters

Array of Parameter objects

Running parameters of a training job.

policies

policies object

Policies supported by a training job.

inputs

Array of Input objects

Input of a training job.

outputs

Array of Output objects

Output of a training job.

engine

engine object

Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id and item_version_id of the subscribed algorithm.

environments

Map<String,String>

Environment variables of a training job in the format of "key":"value". Leave this parameter blank.
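
The three algorithm formats described for Table 6 can be told apart by checking which fields are populated. The sketch below assumes the algorithm object is available as a dict. The algorithm_source helper and its return labels are hypothetical.

```python
def algorithm_source(algorithm):
    """Classify a JobAlgorithmResponse-shaped dict by the fields that are set."""
    if algorithm.get("id"):
        return "id"
    if algorithm.get("subscription_id") and algorithm.get("item_version_id"):
        return "subscription"
    if algorithm.get("code_dir") and algorithm.get("boot_file"):
        return "code"
    return "unknown"


# The paths reuse the examples given in the code_dir and boot_file descriptions.
print(algorithm_source({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))
```

The checks follow the order in which the formats are listed. A subscription_id without item_version_id, or a code_dir without boot_file, is reported as unknown because the table says each pair must be used together.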

Table 7 Parameter

Parameter

Type

Description

name

String

Parameter name

value

String

Parameter value

description

String

Parameter description

constraint

constraint object

Parameter constraint

i18n_description

i18n_description object

Internationalization description

Table 8 constraint

Parameter

Type

Description

type

String

Parameter type

editable

Boolean

Whether the parameter is editable

required

Boolean

Whether the parameter is mandatory

sensitive

Boolean

Whether the parameter is sensitive

valid_type

String

Valid type

valid_range

Array of strings

Valid range

Table 9 i18n_description

Parameter

Type

Description

language

String

Internationalization language

description

String

Description

Table 10 policies

Parameter

Type

Description

auto_search

auto_search object

Hyperparameter search configuration

Table 12 reward_attrs

Parameter

Type

Description

name

String

Metric name

mode

String

Search mode

  • max: A larger metric value is preferred.
  • min: A smaller metric value is preferred.

regex

String

Regular expression of a metric

Table 13 search_params

Parameter

Type

Description

name

String

Hyperparameter name

param_type

String

Parameter type

  • continuous: Parameter values are continuous.
  • discrete: Parameter values are discrete.

lower_bound

String

Lower bound of the hyperparameter

upper_bound

String

Upper bound of the hyperparameter

discrete_points_num

String

Number of discrete points of a hyperparameter with continuous values

discrete_values

Array of strings

Discrete hyperparameter values

Table 14 algo_configs

Parameter

Type

Description

name

String

Name of the search algorithm

params

Array of AutoSearchAlgoConfigParameter objects

Search algorithm parameters

Table 15 AutoSearchAlgoConfigParameter

Parameter

Type

Description

key

String

Parameter key

value

String

Parameter value

type

String

Parameter type

Table 16 Input

Parameter

Type

Description

name

String

Name of the data input channel

description

String

Description of the data input channel

local_dir

String

Local directory of the container to which the data input channel is mapped

remote

InputDataInfo object

Information about the data input

remote_constraint

Array of objects

Data input constraint

Table 17 InputDataInfo

Parameter

Type

Description

dataset

dataset object

Dataset as the data input

obs

obs object

OBS in which data input and output are stored

Table 18 dataset

Parameter

Type

Description

id

String

Dataset ID of a training job

version_id

String

Dataset version ID of a training job

obs_url

String

OBS URL of the dataset for a training job, which is automatically parsed by ModelArts based on the dataset ID and dataset version ID, for example, /usr/data/

Table 19 obs

Parameter

Type

Description

obs_url

String

OBS URL of the dataset for a training job, for example, /usr/data/
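
Tables 16 to 19 nest the input location several levels deep: each Input has a remote object, which holds either a dataset or an obs object. As an illustration, the sketch below maps each input channel to its OBS URL. It assumes the algorithm object is a dict and, as a further assumption, prefers the dataset URL when both are present. The input_obs_urls helper and the channel name are hypothetical.

```python
def input_obs_urls(algorithm):
    """Map each input channel name to its OBS URL, or None if no URL is present."""
    urls = {}
    for channel in algorithm.get("inputs") or []:
        remote = channel.get("remote") or {}
        dataset = remote.get("dataset") or {}
        obs = remote.get("obs") or {}
        # Assumption: prefer the dataset URL, then fall back to the plain OBS URL.
        urls[channel.get("name")] = dataset.get("obs_url") or obs.get("obs_url")
    return urls


# Hypothetical channel; the URL reuses the example value from the tables.
sample_algorithm = {
    "inputs": [{"name": "data_url", "remote": {"obs": {"obs_url": "/usr/data/"}}}]
}
print(input_obs_urls(sample_algorithm))
```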

Table 20 remote_constraint

Parameter

Type

Description

data_type

String

Data input type, including the data storage location and dataset

attributes

String

Attributes when a dataset functions as the data input

Options:

  • data_format: data format
  • data_segmentation: data segmentation
  • dataset_type: data labeling

Table 21 Output

Parameter

Type

Description

name

String

Name of the data output channel

description

String

Description of the data output channel

local_dir

String

Local directory of the container to which the data output channel is mapped

remote

remote object

Information about the data output

Table 22 remote

Parameter

Type

Description

obs

obs object

OBS to which data is exported

Table 23 obs

Parameter

Type

Description

obs_url

String

OBS URL to which data is exported

Table 24 engine

Parameter

Type

Description

engine_id

String

ID of the engine selected for a training job. An engine can be specified by engine_id, by engine_name and engine_version, or by image_url.

engine_name

String

Name of the engine selected for a training job. Leave this parameter blank if engine_id is specified.

engine_version

String

Version of the engine selected for a training job. Leave this parameter blank if engine_id is specified.

image_url

String

Custom image URL selected for a training job

Table 25 TaskResponse

Parameter

Type

Description

role

String

Role of a heterogeneous training job task

Options:

  • learner: GPUs or CPUs are supported.
  • worker: CPUs are supported.

algorithm

algorithm object

Algorithm configurations in algorithm management

task_resource

FlavorResponse object

Flavors for a training job or an algorithm

Table 26 algorithm

Parameter

Type

Description

code_dir

String

Absolute path of the directory where the algorithm boot file is stored

boot_file

String

Absolute path of the algorithm boot file

inputs

inputs object

Algorithm input channel

outputs

outputs object

Algorithm output channel

engine

engine object

Engine on which a heterogeneous job depends

Table 27 inputs

Parameter

Type

Description

name

String

Name of the data input channel

local_dir

String

Local path of the container to which the data input and output channels are mapped

remote

remote object

Actual data input, which can only be OBS for heterogeneous jobs

Table 28 remote

Parameter

Type

Description

obs

obs object

OBS in which data input and output are stored

Table 29 obs

Parameter

Type

Description

obs_url

String

OBS URL of the dataset for a training job, for example, /usr/data/

Table 30 outputs

Parameter

Type

Description

name

String

Name of the data output channel

local_dir

String

Local directory of the container to which the data output channel is mapped

remote

remote object

Information about the data output

mode

String

Data transmission mode, which defaults to upload_periodically

period

String

Data transmission period, which defaults to 30s

Table 31 remote

Parameter

Type

Description

obs

obs object

OBS to which data is exported

Table 32 obs

Parameter

Type

Description

obs_url

String

OBS URL to which data is exported

Table 33 engine

Parameter

Type

Description

engine_id

String

Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7

engine_name

String

Engine name of a heterogeneous job, for example, Caffe

engine_version

String

Engine version of a heterogeneous job

v1_compatible

Boolean

Whether the engine is compatible with v1

run_user

String

Default UID of the user that starts the engine

Table 34 FlavorResponse

Parameter

Type

Description

flavor_id

String

ID of the resource flavor

flavor_name

String

Name of the resource flavor

max_num

Integer

Maximum number of nodes with the resource flavor

flavor_type

String

Resource flavor type. Options:

  • CPU
  • GPU
  • Ascend

billing

billing object

Billing information of a resource flavor

flavor_info

flavor_info object

Resource flavor details

attributes

Map<String,String>

Other flavor attributes

Table 35 billing

Parameter

Type

Description

code

String

Billing code

unit_num

Integer

Number of billing units

Table 36 flavor_info

Parameter

Type

Description

max_num

Integer

Maximum number of nodes that can be selected. A value of 1 indicates that distributed mode is not supported.

cpu

cpu object

CPU specifications

gpu

gpu object

GPU specifications

npu

npu object

Ascend specifications

memory

memory object

Memory information

Table 37 cpu

Parameter

Type

Description

arch

String

CPU architecture

core_num

Integer

Number of cores

Table 38 gpu

Parameter

Type

Description

unit_num

Integer

Number of GPUs

product_name

String

Product name

memory

String

Memory

Table 39 npu

Parameter

Type

Description

unit_num

String

Number of NPUs

product_name

String

Product name

memory

String

Memory

Table 40 memory

Parameter

Type

Description

size

Integer

Memory size

unit

String

Unit of the memory size
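
The flavor tables (Tables 34 to 40) can be condensed into a single readable line. The sketch below assumes a FlavorResponse-shaped dict and treats memory.unit as a unit string such as GB, which is an assumption. The describe_flavor helper and all sample values are hypothetical.

```python
def describe_flavor(flavor):
    """Build a short description from a FlavorResponse-shaped dict."""
    info = flavor.get("flavor_info") or {}
    parts = []
    cpu = info.get("cpu") or {}
    if cpu.get("core_num"):
        parts.append("{} {} cores".format(cpu["core_num"], cpu.get("arch", "CPU")))
    for key in ("gpu", "npu"):
        device = info.get(key) or {}
        if device.get("unit_num"):
            name = device.get("product_name", key.upper())
            parts.append("{} x {}".format(device["unit_num"], name))
    memory = info.get("memory") or {}
    if memory.get("size"):
        # Assumption: unit holds a unit string such as "GB".
        fields = (memory["size"], memory.get("unit"), "memory")
        parts.append(" ".join(str(field) for field in fields if field))
    return "{} [{}]: {}".format(
        flavor.get("flavor_name", ""), flavor.get("flavor_type", ""), ", ".join(parts)
    )


# Hypothetical flavor for illustration only.
sample_flavor = {
    "flavor_name": "demo-flavor",
    "flavor_type": "GPU",
    "flavor_info": {
        "cpu": {"arch": "x86", "core_num": 8},
        "gpu": {"unit_num": 1, "product_name": "demo-gpu", "memory": "16GB"},
        "memory": {"size": 64, "unit": "GB"},
    },
}
print(describe_flavor(sample_flavor))
```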

Table 41 spec

Parameter

Type

Description

resource

Resource object

Resource flavor of a training job, specified either by flavor_id alone or by pool_id together with flavor_id

volumes

Array of objects

Volumes attached for a training job

log_export_path

log_export_path object

Export path of training job logs

Table 42 Resource

Parameter

Type

Description

policy

String

Resource flavor mode of a training job. Options: regular, economic, and turbo

flavor_id

String

Resource flavor ID of a training job

flavor_name

String

Read-only flavor name returned by ModelArts when flavor_id is specified

node_count

Integer

Number of resource replicas selected for a training job

Minimum value: 1

pool_id

String

Resource pool ID selected for a training job

flavor_detail

flavor_detail object

Flavors for a training job or an algorithm

Table 43 flavor_detail

Parameter

Type

Description

flavor_type

String

Resource flavor type. Options:

  • CPU
  • GPU
  • Ascend

billing

billing object

Billing information of a resource flavor

flavor_info

flavor_info object

Resource flavor details

Table 44 billing

Parameter

Type

Description

code

String

Billing code

unit_num

Integer

Number of billing units

Table 45 flavor_info

Parameter

Type

Description

max_num

Integer

Maximum number of nodes that can be selected. A value of 1 indicates that distributed mode is not supported.

cpu

cpu object

CPU specifications

gpu

gpu object

GPU specifications

npu

npu object

Ascend specifications

memory

memory object

Memory information

disk

disk object

Disk information

Table 46 cpu

Parameter

Type

Description

arch

String

CPU architecture

core_num

Integer

Number of cores

Table 47 gpu

Parameter

Type

Description

unit_num

Integer

Number of GPUs

product_name

String

Product name

memory

String

Memory

Table 48 npu

Parameter

Type

Description

unit_num

String

Number of NPUs

product_name

String

Product name

memory

String

Memory

Table 49 memory

Parameter

Type

Description

size

Integer

Memory size

unit

String

Unit of the memory size

Table 50 disk

Parameter

Type

Description

size

String

Disk size

unit

String

Unit of the disk size, generally GB

Table 51 volumes

Parameter

Type

Description

nfs

nfs object

Disks attached in NFS mode

Table 52 nfs

Parameter

Type

Description

nfs_server_path

String

NFS server path

local_path

String

Path for attaching disks to the training container

read_only

Boolean

Whether the disks attached to the container in NFS mode are read-only

Table 53 log_export_path

Parameter

Type

Description

obs_url

String

OBS URL for storing training job logs

host_path

String

Path of the host where training job logs are stored

Table 54 Response for the failure to call a training API

Parameter

Type

Description

error_msg

String

Error message returned when an API call fails. This parameter is not returned if the call succeeds.

error_code

String

Error code returned when an API call fails. For details, see "Error Codes" in ModelArts API Reference. This parameter is not returned if the call succeeds.

error_solution

String

Suggested solution when an API call fails. This parameter is not returned if the call succeeds.
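
One way to handle the failure fields in Table 54 is to turn them into an exception before reading any job details. This is only a sketch: it assumes that a failed call returns a dict containing error_code, error_msg, and error_solution, and the SDK may instead report failures differently. TrainingApiError, raise_for_error, and the placeholder error code are hypothetical.

```python
class TrainingApiError(Exception):
    """Hypothetical exception carrying the fields from Table 54."""

    def __init__(self, code, message, solution=None):
        super().__init__("{}: {}".format(code, message))
        self.code = code
        self.message = message
        self.solution = solution


def raise_for_error(response):
    """Raise TrainingApiError if the response carries an error_code; otherwise return it."""
    if isinstance(response, dict) and response.get("error_code"):
        raise TrainingApiError(
            response["error_code"],
            response.get("error_msg", ""),
            response.get("error_solution"),
        )
    return response


# "EXAMPLE.0001" is a placeholder, not a real ModelArts error code.
try:
    raise_for_error({"error_code": "EXAMPLE.0001", "error_msg": "placeholder"})
except TrainingApiError as exc:
    print("call failed:", exc)
```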