Updated on 2025-12-05 GMT+08:00

Creating a Training Job

Function

This API is used to create a training job on ModelArts.

This API applies to the following scenarios: When you need to perform machine learning training based on specific datasets and algorithm models, you can use this API to create and configure a training job. Before using this API, ensure that you have uploaded datasets and model code to ModelArts and have the permission to create training jobs. After a training job is created, the platform starts the training job based on the configured resource specifications. You can monitor the training progress and status by using the job ID. If the dataset or model code does not exist, the resource specifications are incorrectly configured, or you do not have the required permission, the API will return an error message.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer. A CLI example is also available: hcloud ModelArts CreateTrainingJob.

Authorization Information

Each account has all the permissions required to call all APIs, but IAM users must be assigned the required permissions.

If you are using role/policy-based authorization, see Permissions Policies and Supported Actions for details on the required permissions.

If you are using identity policy-based authorization, the following identity policy-based permissions are required.

Action	Access Level	Resource Type (*: required)	Condition Key	Alias	Dependencies
modelarts:trainJob:logExport	Write	trainJob *	-	-	-

URI

POST /v2/{project_id}/training-jobs

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
project_id	Yes	String	Definition: Project ID. For details, see Obtaining a Project ID and Name. Constraints: The value can contain 1 to 64 characters. Letters, digits, and hyphens (-) are allowed. Range: N/A Default Value: N/A
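
The path parameter constraint in Table 1 can be checked before a request is sent. The following is a minimal sketch, not an official client: the function name is hypothetical, and the endpoint host in the example is a placeholder assumption that does not come from this reference.

```python
import re

# Table 1: project_id contains 1 to 64 characters; letters, digits, and hyphens.
_PROJECT_ID_RE = re.compile(r"[A-Za-z0-9-]{1,64}")


def training_jobs_uri(endpoint: str, project_id: str) -> str:
    """Return the URL for POST /v2/{project_id}/training-jobs.

    `endpoint` is a placeholder host (assumption), for example
    "https://modelarts.example.com".
    """
    if not _PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError("project_id must be 1-64 letters, digits, or hyphens")
    return f"{endpoint.rstrip('/')}/v2/{project_id}/training-jobs"


print(training_jobs_uri("https://modelarts.example.com", "0123abcd-demo"))
```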

Request Parameters

**Table 2** Request body parameters
Parameter	Mandatory	Type	Description
kind	Yes	String	Definition: Type of a training job. Constraints: N/A Range: job: common job; federated_pool_job: resource pool federated job; edge_job: edge job; hetero_job: heterogeneous job; mrs_job: MRS job; autosearch_job: auto search job; diag_job: diagnosis job; visualization_job: visualization job. Default Value: job
metadata	Yes	JobMetadata object	Definition: Training job metadata. Constraints: N/A
algorithm	No	JobAlgorithm object	Definition: Training job algorithm. Constraints: The options are as follows. id: Only the algorithm ID is used. subscription_id+item_version_id: The subscription ID and version ID of the algorithm are used. code_dir+boot_file: The code directory and boot file of the training job are used.
tasks	No	Array of Task objects	Definition: Task list. This function is not implemented. Constraints: N/A
spec	No	Spec object	Definition: Training job specifications. If this parameter is specified, leave the tasks parameter blank. Constraints: N/A
endpoints	No	JobEndpointsReq object	Definition: Configurations required for remotely accessing a training job. Constraints: N/A
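
To show how the top-level parameters in Table 2 fit together, here is a minimal sketch that assembles a request body as a Python dict. The helper name `build_request_body` is hypothetical, and the engine name, engine version, and flavor ID are placeholders rather than real values; only the field names, the kind values, and the rule that tasks is left blank when spec is specified come from the table.

```python
import json

# Values listed for "kind" in Table 2.
JOB_KINDS = {
    "job", "federated_pool_job", "edge_job", "hetero_job",
    "mrs_job", "autosearch_job", "diag_job", "visualization_job",
}


def build_request_body(metadata, kind="job", algorithm=None, spec=None,
                       tasks=None, endpoints=None):
    """Assemble the top-level request body described in Table 2."""
    if kind not in JOB_KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if not metadata.get("name"):
        raise ValueError("metadata.name is mandatory")
    if spec is not None and tasks:
        raise ValueError("leave tasks blank when spec is specified")
    body = {"kind": kind, "metadata": metadata}
    optional = {"algorithm": algorithm, "tasks": tasks,
                "spec": spec, "endpoints": endpoints}
    # Only send the optional objects that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_request_body(
    metadata={"name": "demo-job"},
    algorithm={
        "code_dir": "/usr/app/",
        "boot_file": "/usr/app/boot.py",
        # Placeholder engine values (assumptions, not real engine names).
        "engine": {"engine_name": "ExampleEngine", "engine_version": "example-version"},
    },
    # Placeholder flavor ID (assumption).
    spec={"resource": {"flavor_id": "example-flavor-id", "node_count": 1}},
)
print(json.dumps(body, indent=2))
```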

**Table 3** JobMetadata
Parameter	Mandatory	Type	Description
name	Yes	String	Definition: Name of a training job. Constraints: N/A Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). Default Value: N/A
workspace_id	No	String	Definition: Workspace where a specified job is located. Constraints: N/A Range: N/A Default Value: 0
description	No	String	Definition: Description of a training job. Constraints: The value must contain 0 to 256 characters. Range: N/A Default Value: NULL
annotations	No	Map<String,String>	Definition: Advanced functions of a training job. Constraints: The value can be: job_template: Template RL (heterogeneous job); fault-tolerance/job-retry-num: 3 (number of retries upon a fault); fault-tolerance/job-unconditional-retry: true (unconditional restart); fault-tolerance/hang-retry: true (restart upon suspension); jupyter-lab/enable: true (JupyterLab training application); tensorboard/enable: true (TensorBoard training application); mindstudio-insight/enable: true (MindStudio Insight training application); fault-tolerance/hccl_op_retry: true (operator retry).
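
The JobMetadata limits in Table 3 lend themselves to a client-side check. The sketch below is illustrative only: `build_metadata` is a hypothetical helper, and the job name and description are made-up sample values. Because annotations is a Map<String,String>, the sketch converts annotation values to strings.

```python
import re

# Table 3: name contains 1 to 64 digits, letters, underscores, or hyphens.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_metadata(name, description=None, workspace_id=None, annotations=None):
    """Assemble a JobMetadata dict, checking the documented limits."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError("name must be 1-64 digits, letters, underscores, or hyphens")
    metadata = {"name": name}
    if description is not None:
        if len(description) > 256:
            raise ValueError("description must contain 0 to 256 characters")
        metadata["description"] = description
    if workspace_id is not None:
        metadata["workspace_id"] = workspace_id
    if annotations:
        # Map<String,String>: pass flags as the string "true", not a Python bool.
        metadata["annotations"] = {key: str(value) for key, value in annotations.items()}
    return metadata


example = build_metadata(
    "demo-job_01",
    description="illustrative job",
    annotations={"fault-tolerance/job-retry-num": 3, "tensorboard/enable": "true"},
)
print(example)
```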

**Table 4** JobAlgorithm
Parameter	Mandatory	Type	Description
id	No	String	Definition: Algorithm ID in algorithm management. Constraints: N/A Range: N/A Default Value: N/A
name	No	String	Definition: Algorithm name. Leave it blank. Constraints: N/A Range: N/A Default Value: N/A
subscription_id	No	String	Definition: Subscription ID of a subscription algorithm. Constraints: This parameter must be used with item_version_id. Range: N/A Default Value: N/A
item_version_id	No	String	Definition: Version of a subscription algorithm. Constraints: This parameter must be used with subscription_id. Range: N/A Default Value: N/A
code_dir	No	String	Definition: Code directory of a training job, for example, /usr/app/. Constraints: This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A Default Value: N/A
boot_file	No	String	Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. Constraints: This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A Default Value: N/A
autosearch_config_path	No	String	Definition: YAML configuration path of an auto search job. An OBS URL is required. Constraints: N/A Range: N/A Default Value: N/A
autosearch_framework_path	No	String	Definition: Framework code directory of an auto search job. An OBS URL is required. Constraints: N/A Range: N/A Default Value: N/A
command	No	String	Definition: Command for starting the custom image container of a training job. Constraints: N/A Range: N/A Default Value: N/A
parameters	No	Array of Parameters objects	Definition: Running parameters of the training job. Constraints: N/A
policies	No	JobPolicies object	Definition: Policies supported by jobs, which are used for hyperparameter search. Constraints: N/A
inputs	No	Array of Input objects	Definition: Data input of a training job. Constraints: N/A
outputs	No	Array of Output objects	Definition: Output of the training job. Constraints: N/A
engine	No	JobEngine object	Definition: Engine of a training job. Constraints: Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm.
local_code_dir	No	String	Definition: Local directory of the training container to which the algorithm code directory is downloaded. Constraints: The directory must be under /home. In v1 compatibility mode, the current field does not take effect. When code_dir is prefixed with file://, the current field does not take effect. The directory cannot be set to /home/ma-user/modelarts, /home/ma-user/modelarts-dev, /home/ma-user/infer, or their subdirectories, and cannot be set to /home/ma-user. Range: N/A Default Value: N/A
working_dir	No	String	Definition: Work directory where an algorithm is executed. Constraints: In v1 compatibility mode, the current field does not take effect. Range: N/A Default Value: N/A
environments	No	Map<String,String>	Definition: Environment variables of a training job. Format: "key":"value" Constraints: The key can contain a maximum of 8,192 characters, and the value can contain a maximum of 4,096 characters. A maximum of 100 key-value pairs are allowed. The variable name can contain only letters, digits, and underscores (_), and must start with a letter or underscore (_). Note: Variables cannot contain $.
summary	No	Summary object	Definition: Visualization log summary. Constraints: N/A
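
Table 4 describes three ways to identify the algorithm (id, subscription_id with item_version_id, or code_dir with boot_file) together with pairing rules. A minimal sketch of those rules follows; `algorithm_mode` is a hypothetical helper, and the table does not say what happens when none of the three is given, so the sketch reports that case instead of rejecting it.

```python
def algorithm_mode(algorithm: dict) -> str:
    """Return which of the documented options a JobAlgorithm dict uses."""
    has_id = bool(algorithm.get("id"))
    has_sub = bool(algorithm.get("subscription_id"))
    has_ver = bool(algorithm.get("item_version_id"))
    has_dir = bool(algorithm.get("code_dir"))
    has_boot = bool(algorithm.get("boot_file"))

    if has_sub != has_ver:
        raise ValueError("subscription_id and item_version_id must be used together")
    if has_dir != has_boot:
        raise ValueError("code_dir and boot_file must be used together")
    if has_dir and (has_id or has_sub):
        raise ValueError(
            "leave code_dir and boot_file blank when id, or subscription_id "
            "and item_version_id, are specified"
        )
    if has_id:
        return "id"
    if has_sub:
        return "subscription_id+item_version_id"
    if has_dir:
        return "code_dir+boot_file"
    # Not covered by Table 4; reported rather than rejected (assumption).
    return "unspecified"


print(algorithm_mode({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))
```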

**Table 5** Parameters
Parameter	Mandatory	Type	Description
name	No	String	Definition: Parameter name. Constraints: N/A Range: N/A Default Value: N/A
value	No	String	Definition: Parameter value. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Parameter description. Constraints: N/A Range: N/A Default Value: N/A
constraint	No	ParametersConstraint object	Definition: Parameter attribute. Constraints: N/A
i18n_description	No	I18nDescription object	Definition: Internationalization description. Constraints: N/A

**Table 6** ParametersConstraint
Parameter	Mandatory	Type	Description
type	No	String	Definition: Parameter type. Constraints: N/A Range: N/A Default Value: N/A
editable	No	Boolean	Definition: Whether the parameter can be edited. Constraints: N/A Range: true: editable false: not editable Default Value: N/A
required	No	Boolean	Definition: Whether the parameter is mandatory. Constraints: N/A Range: true: mandatory false: optional Default Value: N/A
sensitive	No	Boolean	Definition: Whether the parameter is sensitive. This function is unavailable currently. Constraints: N/A Range: true: sensitive false: insensitive Default Value: N/A
valid_type	No	String	Definition: Valid type. Constraints: N/A Range: N/A Default Value: N/A
valid_range	No	Array of strings	Definition: Valid range. Constraints: N/A

**Table 7** I18nDescription
Parameter	Mandatory	Type	Description
language	No	String	Definition: Internationalization language. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Description. Constraints: N/A Range: N/A Default Value: N/A

**Table 8** JobPolicies
Parameter	Mandatory	Type	Description
auto_search	No	AutoSearch object	Definition: Hyperparameter search configuration. Constraints: N/A

**Table 9** AutoSearch
Parameter	Mandatory	Type	Description
skip_search_params	No	String	Definition: Hyperparameter parameters that need to be skipped. Constraints: N/A Range: N/A Default Value: N/A
reward_attrs	No	Array of RewardAttrs objects	Definition: Search metrics. Constraints: N/A
search_params	No	Array of SearchParams objects	Definition: Search parameters. Constraints: N/A
algo_configs	No	Array of AlgoConfigs objects	Definition: Search algorithm configurations. Constraints: N/A

**Table 10** RewardAttrs
Parameter	Mandatory	Type	Description
name	No	String	Definition: Metric name. Constraints: N/A Range: N/A Default Value: N/A
mode	No	String	Definition: Search mode. Constraints: N/A Range: max: A larger metric value is preferred. min: A smaller metric value is preferred. Default Value: N/A
regex	No	String	Definition: Regular expression of a metric. Constraints: N/A Range: N/A Default Value: N/A

**Table 11** SearchParams
Parameter	Mandatory	Type	Description
name	No	String	Definition: Hyperparameter name. Constraints: N/A Range: N/A Default Value: N/A
param_type	No	String	Definition: Parameter type. Constraints: N/A Range: continuous: The hyperparameter is of the continuous type. When an algorithm is used in a training job, continuous hyperparameters are displayed as text boxes on the console. discrete: The hyperparameter is of the discrete type. When an algorithm is used in a training job, discrete hyperparameters are displayed as drop-down lists on the console. Default Value: N/A
lower_bound	No	String	Definition: Lower bound of the hyperparameter. Constraints: N/A Range: N/A Default Value: N/A
upper_bound	No	String	Definition: Upper bound of the hyperparameter. Constraints: N/A Range: N/A Default Value: N/A
discrete_points_num	No	String	Definition: Number of discrete points of a hyperparameter with continuous values. Constraints: N/A Range: N/A Default Value: N/A
discrete_values	No	Array of strings	Definition: Discrete hyperparameter values. Constraints: N/A

**Table 12** AlgoConfigs
Parameter	Mandatory	Type	Description
name	No	String	Definition: Search algorithm name. Constraints: N/A Range: N/A Default Value: N/A
params	No	Array of AutoSearchAlgoConfigParameter objects	Definition: Search algorithm parameters. Constraints: N/A

**Table 13** AutoSearchAlgoConfigParameter
Parameter	Mandatory	Type	Description
key	No	String	Definition: Parameter key. Constraints: N/A Range: N/A Default Value: N/A
value	No	String	Definition: Parameter value. Constraints: N/A Range: N/A Default Value: N/A
type	No	String	Definition: Parameter type. Constraints: N/A Range: N/A Default Value: N/A
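
The hyperparameter search objects in Tables 8 to 13 nest several levels deep, so a small example may help. The sketch below builds a policies object; the helper names, the metric name and regular expression, the hyperparameter names, and the search algorithm name and parameter are all hypothetical placeholders. Bounds and discrete values are converted to strings because the tables declare those fields as String.

```python
import json


def continuous_param(name, lower, upper, points=None):
    """SearchParams entry for a continuous hyperparameter (string-typed bounds)."""
    param = {
        "name": name,
        "param_type": "continuous",
        "lower_bound": str(lower),
        "upper_bound": str(upper),
    }
    if points is not None:
        param["discrete_points_num"] = str(points)
    return param


def discrete_param(name, values):
    """SearchParams entry for a discrete hyperparameter."""
    return {
        "name": name,
        "param_type": "discrete",
        "discrete_values": [str(value) for value in values],
    }


policies = {
    "auto_search": {
        # Hypothetical metric: prefer larger values of "accuracy".
        "reward_attrs": [
            {"name": "accuracy", "mode": "max", "regex": r"accuracy=(\d+\.\d+)"}
        ],
        "search_params": [
            continuous_param("learning_rate", 0.001, 0.1, points=10),
            discrete_param("batch_size", [32, 64]),
        ],
        # Hypothetical search algorithm name and parameter.
        "algo_configs": [
            {
                "name": "example_search_algo",
                "params": [{"key": "max_trials", "value": "8", "type": "int"}],
            }
        ],
    }
}
print(json.dumps(policies, indent=2))
```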

**Table 14** JobEngine
Parameter	Mandatory	Type	Description
engine_id	No	String	Definition: Engine ID selected for a training job. Constraints: The value can be engine_id, engine_name + engine_version, or image_url. Range: N/A Default Value: N/A
engine_name	No	String	Definition: Engine name selected for a training job. Constraints: If engine_id has been set, you do not need to set this parameter. If you use a preset framework and custom image to create a training job, you must set both this parameter and image_url. Range: N/A Default Value: N/A
engine_version	No	String	Definition: Engine version selected for a training job. Constraints: If engine_id has been set, you do not need to set this parameter. Range: N/A Default Value: N/A
image_url	No	String	Definition: Custom image URL selected for a training job. The URL is obtained from SWR. Constraints: The format is organization_name/image_name:tag. Range: N/A Default Value: N/A
install_sys_packages	No	Boolean	Definition: Specifies whether to install the MoXing version specified by the training platform. Constraints: This parameter is available only when engine_name, engine_version, and image_url are set. Range: true: yes false: no Default Value: N/A
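
The JobEngine constraints in Table 14 can be summarized as a short check. This is a sketch under the rules stated in the table, with a hypothetical function name and placeholder values; it is not an official validator.

```python
def check_engine(engine: dict) -> str:
    """Return which documented way of selecting an engine the dict uses."""
    has_id = bool(engine.get("engine_id"))
    has_name = bool(engine.get("engine_name"))
    has_version = bool(engine.get("engine_version"))
    has_image = bool(engine.get("image_url"))

    # install_sys_packages is available only with all three fields set.
    if "install_sys_packages" in engine and not (has_name and has_version and has_image):
        raise ValueError(
            "install_sys_packages requires engine_name, engine_version, and image_url"
        )
    if has_id:
        return "engine_id"
    if has_name and has_image:
        # Preset framework combined with a custom image.
        return "engine_name+image_url"
    if has_name and has_version:
        return "engine_name+engine_version"
    if has_image:
        return "image_url"
    raise ValueError("specify engine_id, engine_name + engine_version, or image_url")


# Placeholder image URL in the documented organization_name/image_name:tag format.
print(check_engine({"image_url": "example-org/example-image:1.0"}))
```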

**Table 15** Summary
Parameter	Mandatory	Type	Description
log_type	No	String	Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job. Constraints: N/A Range: tensorboard: TensorBoard; mindstudio-insight: MindStudio Insight. Default Value: N/A
log_dir	No	LogDir object	Definition: Visualization log output of a training job. Constraints: This parameter is mandatory when log_type is not left empty.
data_sources	No	Array of DataSource objects	Definition: Visualization log input of the visualization job or training job debugging mode. Constraints: This parameter is mandatory when the advanced function "tensorboard/enable": "true" or "mindstudio-insight/enable": "true" is enabled for the training job.

**Table 16** LogDir
Parameter	Mandatory	Type	Description
pfs	Yes	PFSSummary object	Definition: Output of an OBS parallel file system. Constraints: N/A

**Table 17** PFSSummary
Parameter	Mandatory	Type	Description
pfs_path	Yes	String	Definition: URL of the OBS parallel file system. Constraints: N/A Range: N/A Default Value: N/A

**Table 18** DataSource
Parameter	Mandatory	Type	Description
job	Yes	JobSummary object	Definition: Job data source. Constraints: N/A

**Table 19** JobSummary
Parameter	Mandatory	Type	Description
job_id	Yes	String	Definition: ID of a training job. Constraints: N/A Range: N/A Default Value: N/A
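
Tables 15 to 19 define the Summary object and its nested LogDir and DataSource objects. The following sketch shows one way to assemble it while enforcing that log_dir is present whenever log_type is set. The helper name, the parallel file system path, and the job ID are hypothetical placeholders.

```python
# Range of log_type from Table 15.
LOG_TYPES = {"tensorboard", "mindstudio-insight"}


def build_summary(log_type=None, pfs_path=None, source_job_ids=()):
    """Assemble a Summary dict from Tables 15 to 19."""
    summary = {}
    if log_type is not None:
        if log_type not in LOG_TYPES:
            raise ValueError(f"log_type must be one of {sorted(LOG_TYPES)}")
        if not pfs_path:
            raise ValueError("log_dir is mandatory when log_type is set")
        summary["log_type"] = log_type
    if pfs_path:
        summary["log_dir"] = {"pfs": {"pfs_path": pfs_path}}
    if source_job_ids:
        summary["data_sources"] = [{"job": {"job_id": job_id}} for job_id in source_job_ids]
    return summary


# Placeholder path and job ID (assumptions).
print(build_summary("tensorboard", "/example-bucket/logs/", ["example-job-id"]))
```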

**Table 20** Task
Parameter	Mandatory	Type	Description
role	No	String	Definition: Task role. This function is not supported currently. Constraints: N/A Range: N/A Default Value: N/A
algorithm	No	algorithm object	Definition: Algorithm configurations for algorithm management. Constraints: N/A
task_resource	No	task_resource object	Definition: Resource flavor of a training job. Constraints: N/A
log_export_path	No	log_export_path object	Definition: Saved information about training job logs. Constraints: N/A

**Table 21** algorithm
Parameter	Mandatory	Type	Description
job_config	No	job_config object	Definition: Algorithm configuration, such as the boot file. Constraints: N/A
code_dir	No	String	Definition: Algorithm code directory, for example, /usr/app/. Constraints: This parameter must be used with boot_file. Range: N/A Default Value: N/A
boot_file	No	String	Definition: Code boot file of the algorithm, which must be stored in the code directory, for example, /usr/app/boot.py. Constraints: This parameter must be used with code_dir. Range: N/A Default Value: N/A
engine	No	engine object	Definition: Algorithm engine of a heterogeneous job. Constraints: N/A
inputs	No	Array of inputs objects	Definition: Data input of an algorithm. Constraints: N/A
outputs	No	Array of outputs objects	Definition: Data output of an algorithm. Constraints: N/A
local_code_dir	No	String	Definition: Local directory of the training container to which the algorithm code directory is downloaded. Constraints: The directory must be under /home. In v1 compatibility mode, the current field does not take effect. When code_dir is prefixed with file://, the current field does not take effect. Range: N/A Default Value: N/A
working_dir	No	String	Definition: Work directory where an algorithm is executed. Constraints: In v1 compatibility mode, the current field does not take effect. Range: N/A Default Value: N/A
environments	No	Map<String,String>	Definition: Environment variables of a training job. Constraints: N/A Range: N/A Default Value: N/A

**Table 22** job_config
Parameter	Mandatory	Type	Description
parameters	No	Array of Parameter objects	Definition: Running parameters of an algorithm. Constraints: N/A
inputs	No	Array of Input objects	Definition: Data input of an algorithm. Constraints: N/A
outputs	No	Array of Output objects	Definition: Data output of an algorithm. Constraints: N/A
engine	No	engine object	Definition: Algorithm engine. Constraints: N/A

**Table 23** Parameter
Parameter	Mandatory	Type	Description
name	No	String	Definition: Parameter name. Constraints: N/A Range: N/A Default Value: N/A
value	No	String	Definition: Parameter value. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Parameter description. Constraints: N/A Range: N/A Default Value: N/A
constraint	No	constraint object	Definition: Parameter attribute. Constraints: N/A
i18n_description	No	i18n_description object	Definition: Internationalization description. Constraints: N/A

**Table 24** constraint
Parameter	Mandatory	Type	Description
type	No	String	Definition: Parameter type. Constraints: N/A Range: N/A Default Value: N/A
editable	No	Boolean	Definition: Whether the parameter can be edited. Constraints: N/A Range: true: editable false: not editable Default Value: N/A
required	No	Boolean	Definition: Whether the parameter is mandatory. Constraints: N/A Range: true: mandatory false: optional Default Value: N/A
sensitive	No	Boolean	Definition: Whether the parameter is sensitive. Constraints: This function is unavailable currently. Range: true: sensitive false: insensitive Default Value: N/A
valid_type	No	String	Definition: Valid type. Constraints: N/A Range: N/A Default Value: N/A
valid_range	No	Array of strings	Definition: Valid range. Constraints: N/A

**Table 25** i18n_description
Parameter	Mandatory	Type	Description
language	No	String	Definition: Internationalization language. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Internationalization language description. Constraints: N/A Range: N/A Default Value: N/A

**Table 26** Input
Parameter	Mandatory	Type	Description
name	Yes	String	Definition: Name of the data input channel. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Description of the data input channel. Constraints: N/A Range: N/A Default Value: N/A
local_dir	No	String	Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0 Constraints: N/A Range: N/A Default Value: N/A
access_method	No	String	Definition: Access method of the input data channel path (local_dir). Constraints: N/A Range: parameter: hyperparameters env: environment variables Default Value: parameter
remote	Yes	InputDataInfo object	Definition: Description of the actual data input. Constraints: The options are as follows. dataset: The data input is a dataset. obs: The data input is an OBS path.
remote_constraint	No	Array of remote_constraint objects	Definition: Data input constraint. Constraints: N/A

**Table 27** InputDataInfo
Parameter	Mandatory	Type	Description
dataset	No	dataset object	Definition: The input is a dataset. Constraints: N/A
obs	No	obs object	Definition: OBS in which data input and output are stored. Constraints: N/A

**Table 28** dataset
Parameter	Mandatory	Type	Description
id	Yes	String	Definition: Dataset ID of a training job. Constraints: N/A Range: N/A Default Value: N/A
version_id	Yes	String	Definition: Dataset version ID of a training job. Constraints: N/A Range: N/A Default Value: N/A

**Table 29** obs
Parameter	Mandatory	Type	Description
obs_url	Yes	String	Definition: OBS URL of the dataset for a training job, for example, /usr/data/. Constraints: N/A Range: N/A Default Value: N/A

**Table 30** remote_constraint
Parameter	Mandatory	Type	Description
data_type	No	String	Definition: Data input type, including the data storage location and dataset. Constraints: N/A Range: N/A Default Value: N/A
attributes	No	String	Definition: Related attributes. Constraints: N/A Range: If the input is a dataset: data_format: data format data_segmentation: data segmentation method dataset_type: data labeling type Default Value: N/A

**Table 31** Output
Parameter	Mandatory	Type	Description
name	Yes	String	Definition: Name of the data output channel. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Description of the data output channel. Constraints: N/A Range: N/A Default Value: N/A
local_dir	No	String	Definition: Local path of the container to which the data output channels are mapped. Constraints: N/A Range: N/A Default Value: N/A
access_method	No	String	Definition: Access method of the output data channel path (local_dir). Constraints: N/A Range: parameter: hyperparameters env: environment variables Default Value: parameter
remote	Yes	Remote object	Definition: Description of the actual data output. Constraints: N/A

**Table 32** Remote
Parameter	Mandatory	Type	Description
obs	Yes	RemoteObs object	Definition: Data actually output to OBS. Constraints: N/A

**Table 33** RemoteObs
Parameter	Mandatory	Type	Description
obs_url	Yes	String	Definition: Path of the data output to OBS. Constraints: N/A Range: N/A Default Value: N/A
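
The Input and Output objects in Tables 26 to 33 share the same channel shape and differ mainly in their remote object. As a rough sketch, the helpers below (all hypothetical names) build both kinds of channel; the channel names and the output path are placeholders, while /usr/data/ and the local_dir value reuse the examples given in the tables.

```python
# Range of access_method from Tables 26 and 31.
ACCESS_METHODS = {"parameter", "env"}


def _channel(name, remote, local_dir, access_method):
    """Shared shape of an input or output data channel."""
    if not name:
        raise ValueError("name is mandatory")
    if access_method not in ACCESS_METHODS:
        raise ValueError("access_method must be 'parameter' or 'env'")
    channel = {"name": name, "remote": remote, "access_method": access_method}
    if local_dir:
        channel["local_dir"] = local_dir
    return channel


def obs_input(name, obs_url, local_dir=None, access_method="parameter"):
    """Input channel whose data comes from an OBS path."""
    return _channel(name, {"obs": {"obs_url": obs_url}}, local_dir, access_method)


def dataset_input(name, dataset_id, version_id, local_dir=None, access_method="parameter"):
    """Input channel whose data comes from a dataset version."""
    remote = {"dataset": {"id": dataset_id, "version_id": version_id}}
    return _channel(name, remote, local_dir, access_method)


def obs_output(name, obs_url, local_dir=None, access_method="parameter"):
    """Output channel written to an OBS path."""
    return _channel(name, {"obs": {"obs_url": obs_url}}, local_dir, access_method)


inputs = [obs_input("data_url", "/usr/data/",
                    local_dir="/home/ma-user/modelarts/inputs/data_url_0")]
outputs = [obs_output("train_url", "/example-bucket/output/")]
print(inputs, outputs)
```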

**Table 34** engine
Parameter	Mandatory	Type	Description
engine_id	No	String	Definition: Engine ID selected for an algorithm. Constraints: N/A Range: N/A Default Value: N/A
engine_name	No	String	Definition: Engine name selected for an algorithm. Constraints: If engine_id is specified, leave this parameter blank. Range: N/A Default Value: N/A
engine_version	No	String	Definition: Engine version selected for an algorithm. Constraints: If engine_id is specified, leave this parameter blank. Range: N/A Default Value: N/A
image_url	No	String	Definition: Custom image URL selected for an algorithm. Constraints: N/A Range: N/A Default Value: N/A

**Table 35** engine
Parameter	Mandatory	Type	Description
engine_id	No	String	Definition: ID of the engine flavor of a heterogeneous job, for example, caffe-1.0.0-python2.7. Constraints: N/A Range: N/A Default Value: N/A
engine_name	No	String	Definition: Name of the engine flavor of a heterogeneous job, for example, Caffe. Constraints: N/A Range: N/A Default Value: N/A
engine_version	No	String	Definition: Version of the engine flavor of a heterogeneous job. Constraints: N/A Range: N/A Default Value: N/A
image_url	No	String	Definition: Custom image URL selected for an algorithm. Constraints: N/A Range: N/A Default Value: N/A
run_user	No	String	Definition: Container image startup user. The default value is 1000. This parameter can be configured only when a custom image is used. Constraints: N/A Range: N/A Default Value: N/A

**Table 36** inputs
Parameter	Mandatory	Type	Description
name	Yes	String	Definition: Name of the data input channel. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Description of the data input channel. Constraints: N/A Range: N/A Default Value: N/A
local_dir	No	String	Definition: Local path of the container to which the data input channels are mapped. Constraints: N/A Range: N/A Default Value: N/A
remote	Yes	remote object	Definition: Description of the actual data input. Constraints: The options are as follows: dataset: The data input is a dataset. obs: The data input is an OBS path.

**Table 37** remote
Parameter	Mandatory	Type	Description
obs	No	obs object	Definition: OBS in which data input and output are stored. Constraints: N/A

**Table 38** obs
Parameter	Mandatory	Type	Description
obs_url	Yes	String	Definition: OBS URL of the dataset for a training job, for example, /usr/data/. Constraints: N/A Range: N/A Default Value: N/A

**Table 39** outputs
Parameter	Mandatory	Type	Description
name	Yes	String	Definition: Name of the data output channel. Constraints: N/A Range: N/A Default Value: N/A
description	No	String	Definition: Description of the data output channel. Constraints: N/A Range: N/A Default Value: N/A
local_dir	No	String	Definition: Local path of the container to which the data output channels are mapped. Constraints: N/A Range: N/A Default Value: N/A
remote	Yes	remote object	Definition: Description of the actual data output. Constraints: N/A

**Table 40** remote
Parameter	Mandatory	Type	Description
obs	Yes	obs object	Definition: Data actually output to OBS. Constraints: N/A

**Table 41** obs
Parameter	Mandatory	Type	Description
obs_url	Yes	String	Definition: Path of the data output to OBS. Constraints: N/A Range: N/A Default Value: N/A

**Table 42** task_resource
Parameter	Mandatory	Type	Description
flavor_id	No	String	Definition: ID of the resource flavor selected for a training job. Constraints: N/A Range: N/A Default Value: N/A
node_count	Yes	Integer	Definition: Number of resource replicas selected for a training job. Constraints: N/A Range: N/A Default Value: N/A
pool_id	No	String	Definition: ID of the resource pool selected for a training task. Constraints: N/A Range: N/A Default Value: N/A

**Table 43** log_export_path
Parameter	Mandatory	Type	Description
obs_url	No	String	Definition: OBS path for storing training job logs. Constraints: N/A

**Table 44** Spec
Parameter	Mandatory	Type	Description
resource	No	SpecResource object	Definition: Resource flavor of a training job. Constraints: Specify flavor_id alone, or pool_id together with flavor_id. If you select a public resource pool, only flavor_id is needed. Select the number of PUs and memory your training job needs. If the public resource pool has enough idle resources, your job will be scheduled. If you select a dedicated resource pool, both pool_id and flavor_id are needed. Select the smallest number of PUs that meet your training needs to save resources and boost efficiency.
volumes	No	Array of SpecVolumes objects	Definition: Mounting volume information of a training job. Constraints: N/A
log_export_path	No	LogExportPath object	Definition: Log output of a training job. Constraints: N/A
auto_stop	No	AutoStop object	Definition: Auto stop configuration of a training job. Constraints: N/A
schedule_policy	No	SchedulePolicy object	Definition: Scheduling policy of a training job. Constraints: N/A
notification	No	Notification object	Definition: Message notification of a training event. Constraints: N/A
custom_metrics	No	Array of CustomMetrics objects	Definition: Metric collection configuration.

**Table 45** SpecResource
Parameter	Mandatory	Type	Description
flavor_id	No	String	Definition: ID of the resource flavor of a training job. Constraints: N/A Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows: modelarts.pool.visual.xlarge (1 PU) modelarts.pool.visual.2xlarge (2 PUs) modelarts.pool.visual.4xlarge (4 PUs) modelarts.pool.visual.8xlarge (8 PUs) modelarts.pool.visual.16xlarge (16 cards, only for the Snt9b23 supernode resource pool) Default Value: N/A
node_count	No	Integer	Definition: Number of nodes used to create a training job in a resource pool. Constraints: N/A Range: N/A Default Value: single node
pool_id	No	String	Definition: Dedicated resource pool ID. Constraints: N/A Range: N/A Default Value: N/A
pool_group_id	No	String	Definition: Resource pool federation ID. Constraints: This parameter is mandatory when kind is set to federated_pool_job. Range: N/A Default Value: N/A
main_container_customized_flavor	No	MainContainerCustomizedFlavor object	Definition: Custom flavor. Constraints: N/A Range: The number of CPU cores and memory size must be greater than 0, and the number of accelerator PUs must be greater than or equal to 0. Default Value: N/A

**Table 46** MainContainerCustomizedFlavor
Parameter	Mandatory	Type	Description
cpu_core_num	No	Float	Definition: Number of CPU cores. Range: greater than 0
mem_size	No	Float	Definition: Memory size. Range: greater than 0
accelerator_num	No	Float	Definition: Number of accelerator cards. Range: greater than or equal to 0
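
Some of the SpecResource rules in Tables 45 and 46 can be checked mechanically. The sketch below covers only the rules that are stated unambiguously: pool_group_id is mandatory for federated_pool_job, and the custom flavor values have lower bounds. The function name is hypothetical, and classifying a job as "dedicated" whenever pool_id is present is an interpretation, not something the tables state in those words.

```python
def check_resource(kind: str, resource: dict) -> str:
    """Check a SpecResource dict and return the kind of pool it targets."""
    if kind == "federated_pool_job":
        if not resource.get("pool_group_id"):
            raise ValueError("pool_group_id is mandatory when kind is federated_pool_job")
        pool = "federated"
    elif resource.get("pool_id"):
        pool = "dedicated"  # interpretation: pool_id is the dedicated pool ID
    else:
        pool = "public"

    custom = resource.get("main_container_customized_flavor") or {}
    for field in ("cpu_core_num", "mem_size"):
        if field in custom and custom[field] <= 0:
            raise ValueError(f"{field} must be greater than 0")
    if custom.get("accelerator_num", 0) < 0:
        raise ValueError("accelerator_num must be greater than or equal to 0")
    return pool


# Placeholder pool ID (assumption); the flavor ID is one listed in Table 45.
print(check_resource("job", {"pool_id": "example-pool",
                             "flavor_id": "modelarts.pool.visual.xlarge"}))
```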

**Table 47** SpecVolumes
Parameter	Mandatory	Type	Description
nfs	No	Nfs object	Definition: NFS mounting volume information of a training job. Constraints: N/A
pfs	No	Pfs object	Definition: obsfs mounting volume information of a training job. Constraints: N/A
obs	No	Obs object	Definition: OBS mounting volume information of a training job. Constraints: N/A

**Table 48** Nfs
Parameter	Mandatory	Type	Description
nfs_server_path	No	String	Definition: NFS server path, for example, 10.10.10.10:/example/path. Constraints: N/A Range: N/A Default Value: N/A
local_path	No	String	Definition: Path for attaching volumes to the training container, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A
read_only	No	Boolean	Definition: Specifies whether the disks attached to the container in NFS mode are read-only. Constraints: N/A Range: true: read only false: non-read-only Default Value: N/A

**Table 49** Pfs
Parameter	Mandatory	Type	Description
pfs_path	No	String	Definition: Address of obsfs. For example, /test-bucket/path. Constraints: N/A Range: N/A Default Value: N/A
local_path	No	String	Definition: Path for attaching volumes to the training container, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A

**Table 50** Obs
Parameter	Mandatory	Type	Description
obs_path	No	String	Definition: OBS path to be mounted. For example, /test-bucket/path. Constraints: N/A Range: N/A Default Value: N/A
local_path	No	String	Definition: Path for attaching volumes to the training container, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A

**Table 51** LogExportPath
Parameter	Mandatory	Type	Description
obs_url	No	String	Definition: OBS path for storing training job logs, for example, obs://example/path. Constraints: N/A Range: N/A Default Value: N/A
host_path	No	String	Definition: Path of the host where training job logs are stored, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A

**Table 52** AutoStop
Parameter	Mandatory	Type	Description
time_unit	Yes	String	Definition: Time unit. Constraints: N/A Range: HOURS: hour Default Value: N/A
duration	Yes	Integer	Definition: Runtime. Constraints: N/A Range: The minimum value is 1. Default Value: N/A
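
The AutoStop object in Table 52 has two mandatory fields, a single permitted time unit, and a minimum duration of 1. A minimal sketch with a hypothetical helper name:

```python
def auto_stop(hours: int) -> dict:
    """Build an AutoStop dict; HOURS is the only time unit listed in Table 52."""
    if not isinstance(hours, int) or hours < 1:
        raise ValueError("duration must be an integer of at least 1")
    return {"time_unit": "HOURS", "duration": hours}


print(auto_stop(8))
```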

**Table 53** SchedulePolicy
Parameter	Mandatory	Type	Description
required_affinity	No	RequiredAffinity object	Definition: Affinity requirements of a training job. Constraints: N/A
priority	No	Integer	Definition: Priority of a training job. Constraints: The priority can be set for a training job only when a dedicated resource pool is used. The value ranges from 1 to 3. The default priority is 1, and the highest priority is 3. By default, the job priority can be set to 1 or 2. After the permission to set the highest job priority is configured, the priority can be set to 1 to 3. Range: 0 to 3 Default Value: N/A
preemptible	No	Boolean	Definition: Whether the resource can be preempted. Constraints: N/A Range: true: The resource can be preempted. false: The resource cannot be preempted. Default Value: N/A

**Table 54** RequiredAffinity
Parameter	Mandatory	Type	Description
affinity_type	No	String	Definition: Affinity scheduling policy. Constraints: N/A Range: cabinet: strong cabinet scheduling hyperinstance: supernode affinity scheduling Default Value: N/A
affinity_group_size	No	Integer	Definition: Size of an affinity group. Constraints: This parameter is mandatory when affinity_type is set to hyperinstance. In this case, the system schedules tasks specified by affinity_group_size to a supernode to form an affinity group. When a user delivers a training job to the supernode resource pool, if the affinity group size is not set, the system sets the value to 1 by default. Range: N/A Default Value: 1
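
The affinity rules above can be sketched as a small client-side check. This is only an illustration: build_required_affinity is a hypothetical helper, and rejecting a missing affinity_group_size locally is an assumption based on the "mandatory" wording (the service may instead apply its default of 1).

```python
from typing import Optional

# Values listed in the Range column of the RequiredAffinity table.
ALLOWED_AFFINITY_TYPES = {"cabinet", "hyperinstance"}


def build_required_affinity(affinity_type: str,
                            group_size: Optional[int] = None) -> dict:
    """Build a RequiredAffinity object (hypothetical helper)."""
    if affinity_type not in ALLOWED_AFFINITY_TYPES:
        raise ValueError(f"unsupported affinity_type: {affinity_type!r}")
    if affinity_type == "hyperinstance" and group_size is None:
        # The table marks affinity_group_size as mandatory for hyperinstance.
        raise ValueError("affinity_group_size is required for hyperinstance")
    affinity = {"affinity_type": affinity_type}
    if group_size is not None:
        affinity["affinity_group_size"] = group_size
    return affinity


# A SchedulePolicy object that embeds the affinity requirement.
schedule_policy = {
    "required_affinity": build_required_affinity("hyperinstance", 4),
    "preemptible": False,
}
print(schedule_policy)
```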

**Table 55** Notification
Parameter	Mandatory	Type	Description
topic_urn	No	String	Definition: URN of the selected topic in SMN. Constraints: N/A Range: N/A Default Value: N/A
events	No	Array of strings	Definition: Training event that triggers a notification. Constraints: The options are as follows: JobStarted: The job is started. JobCompleted: The job is completed. JobFailed: The job failed. JobTerminated: The job is terminated. JobRestarted: The job is restarted. JobHanged: The job is suspended. JobPreempted: The job is preempted.
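
A Notification object might be built as follows, rejecting event names that are not listed above. The helper name and the topic URN are placeholders chosen for illustration; a real URN must be copied from SMN.

```python
# Event names listed in the Notification table.
NOTIFICATION_EVENTS = {
    "JobStarted", "JobCompleted", "JobFailed", "JobTerminated",
    "JobRestarted", "JobHanged", "JobPreempted",
}


def build_notification(topic_urn: str, events: list) -> dict:
    """Build a Notification object (hypothetical helper)."""
    unknown = sorted(set(events) - NOTIFICATION_EVENTS)
    if unknown:
        raise ValueError(f"unsupported events: {unknown}")
    return {"topic_urn": topic_urn, "events": list(events)}


# "urn:example:placeholder" is a made-up value, not a valid SMN topic URN.
print(build_notification("urn:example:placeholder", ["JobFailed", "JobCompleted"]))
```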

**Table 56** CustomMetrics
Parameter	Mandatory	Type	Description
exec	No	Exec object	Definition: Metrics are collected in CLI mode.
http_get	No	HttpGet object	Definition: Metrics are collected in HTTP mode.

**Table 57** Exec
Parameter	Mandatory	Type	Description
command	No	Array of strings	Definition: Command executed to collect metrics in CLI mode.

**Table 58** HttpGet
Parameter	Mandatory	Type	Description
path	No	String	Definition: URL for obtaining metrics over HTTP. Both the URL and the port below must either be configured together or remain empty. Range: N/A
port	No	Integer	Definition: Port for obtaining metrics over HTTP. This parameter and the URL above must be set or left blank at the same time. Range: N/A
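
The "set together or left blank together" rule for path and port can be checked before a request is sent. The sketch below assumes a hypothetical helper build_http_get; the /metrics path and port 9090 are example values only.

```python
from typing import Optional


def build_http_get(path: Optional[str] = None,
                   port: Optional[int] = None) -> dict:
    """Build an HttpGet object, enforcing that path and port come as a pair."""
    if (path is None) != (port is None):
        raise ValueError("path and port must be set together or both omitted")
    if path is None:
        return {}
    return {"path": path, "port": port}


# One CustomMetrics entry that collects metrics over HTTP.
custom_metric = {"http_get": build_http_get("/metrics", 9090)}
print(custom_metric)
```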

**Table 59** JobEndpointsReq
Parameter	Mandatory	Type	Description
ssh	No	SSHReq object	Definition: SSH connection information. Constraints: N/A

**Table 60** SSHReq
Parameter	Mandatory	Type	Description
key_pair_names	No	Array of strings	Definition: Name of the SSH key pair, which can be created and viewed on the Key Pair page of the Elastic Cloud Server (ECS) console. Constraints: N/A

Response Parameters

Status code: 201

**Table 61** Response body parameters
Parameter	Type	Description
kind	String	Definition: Type of a training job. Range: job: common job federated_pool_job: resource pool federated job edge_job: edge job hetero_job: heterogeneous job mrs_job: MRS job autosearch_job: auto search job diag_job: diagnosis job visualization_job: visualization job
metadata	JobMetadataResponse object	Definition: Training job metadata.
status	Status object	Definition: Training job status information.
algorithm	JobAlgorithmResponse object	Definition: Training job algorithm.
tasks	Array of TaskResponse objects	Definition: Heterogeneous training tasks.
spec	SpecResponse object	Definition: Training job specifications.
endpoints	JobEndpointsResp object	Definition: Configurations required for remotely accessing a training job.

**Table 62** JobMetadataResponse
Parameter	Type	Description
id	String	Definition: Training job ID, which is generated and returned by ModelArts after a training job is created. Range: N/A
name	String	Definition: Name of a training job. Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).
workspace_id	String	Definition: Workspace where a specified job is located. Range: N/A
description	String	Definition: Description of a training job. Range: N/A
create_time	Long	Definition: Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created. Range: N/A
user_name	String	Definition: Username for creating a training job. The username is generated and returned by ModelArts after a training job is created. Range: N/A
annotations	Map<String,String>	Definition: Advanced functions of a training job.

**Table 63** Status
Parameter	Type	Description
phase	String	Definition: Level-1 status of a training job. Range: Creating: The job is being created. Pending: The job is pending. Running: The job is running. Failed: The job failed to run. Completed: The job is complete. Terminating: The job is being stopped. Terminated: The job has been stopped. Abnormal: The job is abnormal.
secondary_phase	String	Definition: Level-2 status of a training job. The values are internal detailed statuses and may be added, changed, or deleted. Dependency on the status is not recommended. Range: Creating: The job is being created. Queuing: The job is queuing. Running: The job is running. Failed: The job failed to run. Completed: The job is complete. Terminating: The job is being stopped. Terminated: The job has been stopped. CreateFailed: The job failed to be created. TerminatedFailed: The job failed to be stopped. Unknown: The job is in an unknown state. Lost: The job is abnormal.
duration	Long	Definition: Running duration of a training job, in ms. Range: N/A
node_count_metrics	Array<Array<Integer>>	Definition: Node quantity change metric during a training job runtime.
tasks	Array of strings	Definition: Training job subtask name.
start_time	Long	Definition: Timestamp when a training job is started. Range: N/A
task_statuses	Array of TaskStatuses objects	Definition: Status of the first failed subtask of a training job.
running_records	Array of RunningRecord objects	Definition: Running and fault recovery records of a training job.
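
When monitoring a job, a client typically reads phase and duration from the Status object. The following sketch relies only on the level-1 phase, since the table advises against depending on secondary_phase. Treating Completed, Failed, Terminated, and Abnormal as final states is an assumption made for this example; the table does not say which phases are final.

```python
# Assumption: these level-1 phases are treated as final for polling purposes.
FINAL_PHASES = {"Completed", "Failed", "Terminated", "Abnormal"}


def summarize_status(job: dict) -> dict:
    """Extract the level-1 phase and the running duration from a job response."""
    status = job.get("status") or {}
    phase = status.get("phase")
    return {
        "phase": phase,
        "final": phase in FINAL_PHASES,
        # duration is reported in milliseconds; convert to seconds.
        "duration_s": (status.get("duration") or 0) / 1000,
    }


print(summarize_status({"status": {"phase": "Running", "duration": 1500}}))
```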

**Table 64** TaskStatuses
Parameter	Type	Description
task	String	Definition: Training job subtask name. Range: N/A
exit_code	Integer	Definition: Exit code of a training job subtask. Range: N/A
message	String	Definition: Error message of a training job subtask. Range: N/A

**Table 65** RunningRecord
Parameter	Type	Description
start_at	Integer	Definition: Unix timestamp of the start time in the current running record, in seconds. Range: N/A
end_at	Integer	Definition: Unix timestamp of the end time in the current running record, in seconds. Range: N/A
xpu_start_at	Integer	Definition: Unix timestamp of the accelerator card startup time in the current running record, in seconds. Range: N/A
start_type	String	Definition: Startup mode of the current execution. Range: init_or_rescheduled: This startup is the first run after scheduling, including the initial startup and a run after scheduling recovery. restarted: This startup is not the first run after scheduling but a run after a process restart.
end_reason	String	Definition: Reason why the running ends. Range: N/A
end_related_task	String	Definition: ID of the task worker (for example, worker-0) that ends the running. Range: N/A
end_recover	String	Definition: Fault tolerance policy adopted when the execution ends abnormally. Range: npu_proc_restart: NPU in-place hot recovery proc_restart: in-place process recovery npu_step_retry: step recomputation pod_reschedule: pod-level rescheduling job_reschedule: job-level rescheduling job_reschedule_with_taint: isolated job-level rescheduling
end_recover_before_downgrade	String	Definition: There is a downgrade relationship between policies. If a policy fails to be executed, it will be downgraded to another specified policy. end_recover_before_downgrade indicates the tolerance policy used before end_recover is downgraded. Range: same as that of end_recover.
recover_records	Array of RecoverRecord objects	Definition: Details about all fault tolerance policies adopted when the execution ends abnormally.
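
Because start_at and end_at are Unix timestamps in seconds, the time a job actually spent running across restarts can be summed from running_records. This is a sketch with made-up sample values; skipping records whose end_at is missing or 0 is an assumption about how a still-open record is reported.

```python
def total_running_seconds(records: list) -> int:
    """Sum end_at - start_at over all closed running records."""
    total = 0
    for record in records:
        start, end = record.get("start_at"), record.get("end_at")
        if not start or not end:
            # Assumption: an open record has no (or a zero) end_at.
            continue
        total += end - start
    return total


# Sample values invented for illustration.
sample_records = [
    {"start_at": 1700000000, "end_at": 1700000600, "start_type": "init_or_rescheduled"},
    {"start_at": 1700000700, "end_at": 1700001000, "start_type": "restarted"},
    {"start_at": 1700001100, "start_type": "restarted"},
]
print(total_running_seconds(sample_records))
```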

**Table 66** RecoverRecord
Parameter	Type	Description
recover_start_at	Integer	Unix timestamp of the start time of the fault tolerance policy, in seconds. The timestamp is also the fault occurrence time.
recover_end_at	Integer	Unix timestamp of the end time of the fault tolerance policy, in seconds.
recover	String	Fault tolerance policy. Options: npu_step_retry: step recomputation npu_proc_restart: NPU in-place hot recovery proc_restart: in-place process recovery pod_reschedule: pod-level rescheduling job_reschedule: job-level rescheduling job_reschedule_with_taint: isolated job-level rescheduling
fault_scenario	String	Fault scenario. Options: chip_fault: chip fault node_fault: node fault job_failed: job exit upon a failure job_hanged: job suspension job_subhealth: job subhealth error_in_log: log exception
reason	String	Cause of the fault.
related_task	String	ID of the task worker that causes the end of the current running record, for example, worker-0.
recover_result	String	Execution result of the fault tolerance policy. Options: recovering: executing success: successful failed: failed downgrade: policy downgrade

**Table 67** JobAlgorithmResponse
Parameter	Type	Description
id	String	Definition: Training job algorithm. Range: id: Only the algorithm ID is used. subscription_id+item_version_id: The subscription ID and version ID of the algorithm are used. code_dir+boot_file: The code directory and boot file of the training job are used.
name	String	Definition: Algorithm name. Range: N/A
subscription_id	String	Definition: Subscription ID of a subscription algorithm, which must be used with item_version_id. Range: N/A
item_version_id	String	Definition: Version of a subscription algorithm, which must be used with subscription_id. Range: N/A
code_dir	String	Definition: Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A
boot_file	String	Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A
autosearch_config_path	String	Definition: YAML configuration path of an auto search job. An OBS URL is required. For example, obs://bucket/file.yaml. Range: N/A
autosearch_framework_path	String	Definition: Framework code directory of an auto search job. An OBS URL is required. For example, obs://bucket/files/. Range: N/A
command	String	Definition: Boot command for starting the container of a custom image for a training job. For example, python train.py. Range: N/A
parameters	Array of ParameterResp objects	Definition: Running parameters of the training job.
policies	policies object	Definition: Policy supported by a job.
inputs	Array of InputResp objects	Definition: Data input of a training job.
outputs	Array of OutputResp objects	Definition: Output of the training job.
engine	JobEngineResp object	Definition: Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm.
local_code_dir	String	Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows: The directory must be under /home. In v1 compatibility mode, the current field does not take effect. When code_dir is prefixed with file://, the current field does not take effect. Range: N/A
working_dir	String	Definition: Work directory where an algorithm is executed. Rules: In v1 compatibility mode, this parameter does not take effect. Range: N/A
environments	Array of Map<String,String> objects	Definition: Environment variables of a training job. The format is key:value. Leave this parameter blank.
summary	SummaryResp object	Definition: Visualization log summary.
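
The table describes three ways in which an algorithm can be identified: by id, by subscription_id plus item_version_id, or by code_dir plus boot_file. A response can be classified along those lines as sketched below. The function name and the precedence order (id first) are assumptions for illustration, since a response may contain several of these fields at once.

```python
def algorithm_source(algorithm: dict) -> str:
    """Guess which of the three documented combinations identifies the algorithm."""
    if algorithm.get("id"):
        return "id"
    if algorithm.get("subscription_id") and algorithm.get("item_version_id"):
        return "subscription_id+item_version_id"
    if algorithm.get("code_dir") and algorithm.get("boot_file"):
        return "code_dir+boot_file"
    return "unknown"


print(algorithm_source({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))
```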

**Table 68** ParameterResp
Parameter	Type	Description
name	String	Definition: Parameter name. Range: N/A
value	String	Definition: Parameter value. Range: N/A
description	String	Definition: Parameter description. Range: N/A
constraint	constraint object	Definition: Parameter attribute.
i18n_description	i18n_description object	Definition: Internationalization description.

**Table 69** constraint
Parameter	Type	Description
type	String	Definition: Parameter type. Range: N/A
editable	Boolean	Definition: Whether the parameter can be edited. Range: true: editable false: not editable
required	Boolean	Definition: Whether the parameter is mandatory. Range: true: mandatory false: optional
sensitive	Boolean	Definition: Whether the parameter is sensitive. This function is unavailable currently. Range: true: sensitive false: insensitive
valid_type	String	Definition: Valid type. Range: N/A
valid_range	Array of strings	Definition: Valid range.

**Table 70** i18n_description
Parameter	Type	Description
language	String	Definition: Internationalization language. The options are as follows: zh-cn: Chinese en-us: English Range: N/A
description	String	Definition: Internationalization language description. Range: N/A

**Table 71** policies
Parameter	Type	Description
auto_search	auto_search object	Definition: Hyperparameter search configuration.

**Table 72** auto_search
Parameter	Type	Description
skip_search_params	String	Definition: Hyperparameters that need to be skipped during the search. Range: N/A
reward_attrs	Array of reward_attrs objects	Definition: Search metrics.
search_params	Array of search_params objects	Definition: Search parameters.
algo_configs	Array of algo_configs objects	Definition: Search algorithm configurations.

**Table 73** reward_attrs
Parameter	Type	Description
name	String	Definition: Metric name. Range: N/A
mode	String	Definition: Search mode. Range: max: A larger metric value is preferred. min: A smaller metric value is preferred.
regex	String	Definition: Regular expression of a metric. Range: N/A

**Table 74** search_params
Parameter	Type	Description
name	String	Definition: Hyperparameter name. Range: N/A
param_type	String	Definition: Parameter type. Range: continuous: The hyperparameter is of the continuous type. When an algorithm is used in a training job, continuous hyperparameters are displayed as text boxes on the console. discrete: The hyperparameter is of the discrete type. When an algorithm is used in a training job, discrete hyperparameters are displayed as drop-down lists on the console.
lower_bound	String	Definition: Lower bound of the hyperparameter. Range: N/A
upper_bound	String	Definition: Upper bound of the hyperparameter. Range: N/A
discrete_points_num	String	Definition: Number of discrete points of a hyperparameter with continuous values. Range: N/A
discrete_values	Array of strings	Definition: Discrete hyperparameter values.

**Table 75** algo_configs
Parameter	Type	Description
name	String	Definition: Search algorithm name. Range: N/A
params	Array of AutoSearchAlgoConfigParameterResp objects	Definition: Search algorithm parameters.

**Table 76** AutoSearchAlgoConfigParameterResp
Parameter	Type	Description
key	String	Definition: Parameter key. Range: N/A
value	String	Definition: Parameter value. Range: N/A
type	String	Definition: Parameter type. Range: N/A

**Table 77** InputResp
Parameter	Type	Description
name	String	Definition: Name of the data input channel. Range: N/A
description	String	Definition: Description of the data input channel. Range: N/A
local_dir	String	Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0 Range: N/A
access_method	String	Definition: Access method of the input data channel path (local_dir). Range: parameter: hyperparameters env: environment variables
remote	InputDataInfoResp object	Definition: Description of the actual data input.
remote_constraint	Array of remote_constraint objects	Definition: Data input constraint.

**Table 78** InputDataInfoResp
Parameter	Type	Description
dataset	dataset object	Definition: The input is a dataset.
obs	obs object	Definition: OBS in which data input and output are stored.

**Table 79** dataset
Parameter	Type	Description
id	String	Definition: Dataset ID of a training job. Range: N/A
version_id	String	Definition: Dataset version ID of a training job. Range: N/A
obs_url	String	Definition: OBS URL of the dataset for a training job. It is automatically parsed by ModelArts based on the dataset ID and dataset version ID. For example, /usr/data/. Range: N/A

**Table 80** obs
Parameter	Type	Description
obs_url	String	Definition: OBS URL of the dataset for a training job. For example, /usr/data/. Range: N/A

**Table 81** remote_constraint
Parameter	Type	Description
data_type	String	Definition: Data input type, including the data storage location and dataset. Constraints: N/A Range: N/A Default Value: N/A
attributes	String	Definition: Related attributes. Constraints: N/A Range: If the input is a dataset: data_format: data format data_segmentation: data segmentation method dataset_type: data labeling type Default Value: N/A

**Table 82** OutputResp
Parameter	Type	Description
name	String	Definition: Name of the data output channel. Range: N/A
description	String	Definition: Description of the data output channel. Range: N/A
local_dir	String	Definition: Local path of the container to which the data output channels are mapped. Range: N/A
access_method	String	Definition: Access method of the output data channel path (local_dir). Range: parameter: hyperparameters env: environment variables
remote	RemoteResp object	Definition: Description of the actual data output.

**Table 83** JobEngineResp
Parameter	Type	Description
engine_id	String	Definition: Engine ID selected for a training job. Range: N/A
engine_name	String	Definition: Engine name selected for a training job. Range: N/A
engine_version	String	Definition: Engine version selected for a training job. Range: N/A
image_url	String	Definition: Custom image URL selected for a training job. The URL is obtained from SWR. Range: N/A
install_sys_packages	Boolean	Definition: Specifies whether to install the MoXing version specified by the training platform. Range: true: yes false: no

**Table 84** SummaryResp
Parameter	Type	Description
log_type	String	Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job. Range: tensorboard: TensorBoard mindstudio-insight: MindStudio Insight
log_dir	LogDirResp object	Definition: Visualization log output of a training job.
data_sources	Array of DataSourceResp objects	Definition: Visualization log input of the visualization job or training job debugging mode.

**Table 85** LogDirResp
Parameter	Type	Description
pfs	PFSSummaryResp object	Definition: Output of an OBS parallel file system.

**Table 86** PFSSummaryResp
Parameter	Type	Description
pfs_path	String	Definition: URL of the OBS parallel file system. Range: N/A

**Table 87** DataSourceResp
Parameter	Type	Description
job	JobSummaryResp object	Definition: Job data source.

**Table 88** JobSummaryResp
Parameter	Type	Description
job_id	String	Definition: ID of a training job. Range: N/A

**Table 89** TaskResponse
Parameter	Type	Description
role	String	Definition: Task role. This function is not supported currently. Range: N/A
algorithm	TaskResponseAlgorithm object	Definition: Algorithm configurations for algorithm management.
task_resource	FlavorResponse object	Definition: Specifications of a training job or algorithm.
log_export_path	log_export_path object	Definition: Saved information about training job logs.

**Table 90** TaskResponseAlgorithm
Parameter	Type	Description
code_dir	String	Definition: Absolute path of the directory where the algorithm boot file is stored. Range: N/A
boot_file	String	Definition: Absolute path of an algorithm boot file. Range: N/A
inputs	AlgorithmInput object	Definition: Information about the algorithm input channel.
outputs	AlgorithmOutput object	Definition: Information about the algorithm output channel.
engine	AlgorithmEngine object	Definition: Engine that a heterogeneous job depends on.
local_code_dir	String	Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows: The directory must be under /home. In v1 compatibility mode, the current field does not take effect. When code_dir is prefixed with file://, the current field does not take effect. Range: N/A
working_dir	String	Definition: Work directory where an algorithm is executed. Note that this parameter does not take effect in v1 compatibility mode. Range: N/A
environments	Map<String,String>	Definition: Environment variables related to a training job. Range: N/A

**Table 91** AlgorithmInput
Parameter	Type	Description
name	String	Definition: Name of the data input channel. Range: N/A
local_dir	String	Definition: Local path of the container to which the data input and output channels are mapped. Range: N/A
remote	AlgorithmRemote object	Definition: Actual data input, which can only be OBS for heterogeneous jobs.

**Table 92** AlgorithmRemote
Parameter	Type	Description
obs	RemoteObsResp object	Definition: OBS in which data input and output are stored.

**Table 93** AlgorithmOutput
Parameter	Type	Description
name	String	Definition: Name of the data output channel. Range: N/A
local_dir	String	Definition: Local path of the container to which the data output channels are mapped. Range: N/A
remote	RemoteResp object	Definition: Description of the actual data output.
mode	String	Definition: Data transmission mode. The default value is upload_periodically. Range: N/A
period	String	Definition: Data transmission period. The default value is 30s. Range: N/A

**Table 94** RemoteResp
Parameter	Type	Description
obs	RemoteObsResp object	Definition: Data actually output to OBS.

**Table 95** RemoteObsResp
Parameter	Type	Description
obs_url	String	Definition: Path of the data output to OBS. Range: N/A

**Table 96** AlgorithmEngine
Parameter	Type	Description
engine_id	String	Definition: Engine flavor ID, for example, caffe-1.0.0-python2.7. Range: N/A
engine_name	String	Definition: Engine flavor name, for example, Caffe. Range: N/A
engine_version	String	Definition: Engine flavor version. Engines with the same name have multiple versions, for example, Caffe-1.0.0-python2.7 of Python 2.7. Range: N/A
v1_compatible	Boolean	Definition: Specifies whether the v1 compatibility mode is used. Range: true: The v1 compatibility mode is used. false: The v1 compatibility mode is not used.
run_user	String	Definition: Default UID for the engine startup. Range: N/A
image_url	String	Definition: Custom image URL selected for an algorithm. Range: N/A

**Table 97** FlavorResponse
Parameter	Type	Description
pool_id	String	Definition: ID of the resource pool selected for a training job. Range: N/A
flavor_id	String	Definition: Resource flavor ID. Range: N/A
flavor_name	String	Definition: Resource flavor name. Range: N/A
max_num	Integer	Definition: Maximum number of nodes supported by a flavor. Range: N/A
flavor_type	String	Definition: Resource flavor type. Range: CPU GPU Ascend
billing	BillingInfo object	Definition: Billing information of a resource flavor.
flavor_info	FlavorInfoResponse object	Definition: Resource flavor details.
attributes	Map<String,String>	Definition: Other flavor attributes. Range: N/A

**Table 98** FlavorInfoResponse
Parameter	Type	Description
max_num	Integer	Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. Range: N/A
cpu	Cpu object	Definition: CPU specifications.
gpu	Gpu object	Definition: GPU specifications.
npu	Npu object	Definition: Ascend specifications.
memory	Memory object	Definition: Memory information.
disk	DiskResponse object	Definition: Disk information.

**Table 99** DiskResponse
Parameter	Type	Description
size	Integer	Definition: Disk size. Range: N/A
unit	String	Definition: Unit of the disk size. Range: N/A

**Table 100** log_export_path
Parameter	Type	Description
obs_url	String	Definition: OBS path for storing training job logs.

**Table 101** SpecResponse
Parameter	Type	Description
resource	Resource object	Definition: Resource flavor of a training job. Select either flavor_id or pool_id and flavor_id.
volumes	Array of JobVolumeResp objects	Definition: Mounting volume information of a training job.
log_export_path	LogExportPathResp object	Definition: Log output of a training job.
schedule_policy	SchedulePolicyResp object	Definition: Scheduling policy of a training job.
custom_metrics	Array of CustomMetrics objects	Definition: Metric collection configuration.

**Table 102** Resource
Parameter	Type	Description
policy	String	Definition: Resource flavor mode of a training job. Range: regular: standard mode
flavor_id	String	Definition: ID of the resource flavor of a training job. Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows: modelarts.pool.visual.xlarge (1 PU) modelarts.pool.visual.2xlarge (2 PUs) modelarts.pool.visual.4xlarge (4 PUs) modelarts.pool.visual.8xlarge (8 PUs)
flavor_name	String	Definition: Read-only flavor name returned by ModelArts when flavor_id is used. Range: N/A
node_count	Integer	Definition: Number of resource replicas selected for a training job. Range: N/A
pool_id	String	Definition: ID of the resource pool selected for a training job. Range: N/A
pool_group_id	String	Definition: ID of the resource pool federation selected for a training job. Range: N/A
flavor_detail	FlavorDetail object	Definition: Flavor details of a training job or algorithm. This parameter is available only for public resource pools.
main_container_allocated_resources	MainContainerAllocatedResources object	Definition: Resource specifications actually obtained by the training container of a training job.
main_container_customized_flavor	MainContainerCustomizedFlavor object	Definition: Custom flavor of a training job. Range: The number of CPU cores and memory size must be greater than 0, and the number of accelerator PUs must be greater than or equal to 0.

**Table 103** FlavorDetail
Parameter	Type	Description
flavor_type	String	Definition: Resource flavor type. Range: CPU GPU Ascend
billing	BillingInfo object	Definition: Billing information of a resource flavor.
flavor_info	FlavorInfo object	Definition: Resource flavor details.

**Table 104** BillingInfo
Parameter	Type	Description
code	String	Definition: Billing code. Range: N/A
unit_num	Integer	Definition: Billing unit. Range: N/A

**Table 105** FlavorInfo
Parameter	Type	Description
max_num	Integer	Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. Range: N/A
cpu	Cpu object	Definition: CPU specifications.
gpu	Gpu object	Definition: GPU specifications.
npu	Npu object	Definition: Ascend specifications.
memory	Memory object	Definition: Memory information.
disk	Disk object	Definition: Disk information.

**Table 106** Cpu
Parameter	Type	Description
arch	String	Definition: CPU architecture. Range: N/A
core_num	Integer	Definition: Number of cores. Range: N/A

**Table 107** Gpu
Parameter	Type	Description
unit_num	Integer	Definition: Number of GPUs. Range: N/A
product_name	String	Definition: Product name. Range: N/A
memory	String	Definition: Memory. Range: N/A

**Table 108** Npu
Parameter	Type	Description
unit_num	String	Definition: Number of NPUs. Range: N/A
product_name	String	Definition: Product name. Range: N/A
memory	String	Definition: Memory. Range: N/A

**Table 109** Memory
Parameter	Type	Description
size	Integer	Definition: Memory size. Range: N/A
unit	String	Definition: Unit of the memory size. Range: N/A

**Table 110** Disk
Parameter	Type	Description
size	String	Definition: Disk size. Range: N/A
unit	String	Definition: Unit of the disk size. Generally, the unit is GB. Range: N/A

**Table 111** MainContainerAllocatedResources
Parameter	Type	Description
cpu_arch	String	Definition: CPU architecture. Range: N/A
cpu_core_num	Float	Definition: Number of cores. Range: N/A
mem_size	Float	Definition: Memory information. Range: N/A
accelerator_num	Float	Definition: Number of accelerator cards. Range: N/A
accelerator_type	String	Definition: Type of accelerator cards. Range: N/A

**Table 112** MainContainerCustomizedFlavor
Parameter	Type	Description
cpu_core_num	Float	Definition: Number of CPU cores. Range: greater than 0
mem_size	Float	Definition: Memory size. Range: greater than 0
accelerator_num	Float	Definition: Number of accelerator cards. Range: greater than or equal to 0
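
The three range rules above translate directly into a client-side check. The sketch below uses a hypothetical function name; the unit of mem_size is not stated in the table, so the sample value carries no unit.

```python
def validate_customized_flavor(flavor: dict) -> list:
    """Return a list of rule violations; an empty list means the flavor passes."""
    errors = []
    if flavor.get("cpu_core_num", 0) <= 0:
        errors.append("cpu_core_num must be greater than 0")
    if flavor.get("mem_size", 0) <= 0:
        errors.append("mem_size must be greater than 0")
    if flavor.get("accelerator_num", 0) < 0:
        errors.append("accelerator_num must be greater than or equal to 0")
    return errors


print(validate_customized_flavor(
    {"cpu_core_num": 2.0, "mem_size": 8.0, "accelerator_num": 0}
))
```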

**Table 113** JobVolumeResp
Parameter	Type	Description
nfs	NfsResp object	Definition: Volumes attached in NFS mode.

**Table 114** NfsResp
Parameter	Type	Description
nfs_server_path	String	Definition: NFS server path, for example, 10.10.10.10:/example/path. Range: N/A
local_path	String	Definition: Path for attaching volumes to the training container, for example, /example/path. Range: N/A
read_only	Boolean	Definition: Specifies whether the disks attached to the container in NFS mode are read-only. Range: true: read only false: non-read-only

**Table 115** LogExportPathResp
Parameter	Type	Description
obs_url	String	Definition: OBS path for storing training job logs, for example, obs://example/path. Range: N/A
host_path	String	Definition: Path of the host where training job logs are stored, for example, /example/path. Range: N/A

**Table 116** SchedulePolicyResp
Parameter	Type	Description
required_affinity	RequiredAffinityResp object	Definition: Affinity requirements of a training job.
priority	Integer	Definition: Priority of a training job. Range: 0 to 3
preemptible	Boolean	Definition: Whether the resource can be preempted. Range: true: The resource can be preempted. false: The resource cannot be preempted.

**Table 117** RequiredAffinityResp
Parameter	Type	Description
affinity_type	String	Definition: Affinity scheduling policy. Range: cabinet: strong cabinet scheduling hyperinstance: supernode affinity scheduling
affinity_group_size	Integer	Definition: Size of an affinity group. Range: N/A

**Table 118** CustomMetrics
Parameter	Type	Description
exec	Exec object	Definition: Metrics are collected in CLI mode.
http_get	HttpGet object	Definition: Metrics are collected in HTTP mode.

**Table 119** Exec
Parameter	Type	Description
command	Array of strings	Definition: Command executed to collect metrics in CLI mode.

**Table 120** HttpGet
Parameter	Type	Description
path	String	Definition: URL for obtaining metrics over HTTP. Both the URL and the port below must either be configured together or remain empty. Range: N/A
port	Integer	Definition: Port for obtaining metrics over HTTP. This parameter and the URL above must be set or left blank at the same time. Range: N/A

**Table 121** JobEndpointsResp
Parameter	Type	Description
ssh	SSHResp object	Definition: SSH connection information.
jupyter_lab	JupyterLab object	Definition: JupyterLab connection information.
tensorboard	Tensorboard object	Definition: TensorBoard connection information.
mindstudio_insight	MindStudioInsight object	Definition: MindStudio Insight connection information.

**Table 122** SSHResp
Parameter	Type	Description
key_pair_names	Array of strings	Definition: Name of the SSH key pair, which can be created and viewed on the Key Pair page of the Elastic Cloud Server (ECS) console. Range: N/A
task_urls	Array of TaskUrls objects	Definition: SSH connection address.

**Table 123** TaskUrls
Parameter	Type	Description
task	String	Definition: Task ID of a training job. Range: N/A
url	String	Definition: SSH connection address of a training job. Range: N/A
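
To find the SSH address of each task, a client can walk endpoints.ssh.task_urls in the response. This is a minimal sketch; the function name, the key pair name, and the sample URL are placeholders, as the exact format of the returned address is not described here.

```python
def ssh_urls(job: dict) -> dict:
    """Map each task ID to its SSH connection address."""
    ssh = (job.get("endpoints") or {}).get("ssh") or {}
    return {item["task"]: item["url"] for item in ssh.get("task_urls") or []}


# Sample response fragment with placeholder values.
sample_job = {
    "endpoints": {
        "ssh": {
            "key_pair_names": ["example-keypair"],
            "task_urls": [{"task": "worker-0", "url": "ssh://example.invalid:22"}],
        }
    }
}
print(ssh_urls(sample_job))
```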

**Table 124** JupyterLab
Parameter	Type	Description
url	String	Definition: JupyterLab address of a training job. Range: N/A
token	String	Definition: JupyterLab token of a training job. Range: N/A

**Table 125** Tensorboard
Parameter	Type	Description
url	String	Definition: TensorBoard address of a training job. Range: N/A
token	String	Definition: TensorBoard token of a training job. Range: N/A

**Table 126** MindStudioInsight
Parameter	Type	Description
url	String	Definition: MindStudio Insight address of a training job. Range: N/A
token	String	Definition: MindStudio Insight token of a training job. Range: N/A
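The endpoint structures in the preceding tables share a similar shape, so a client can flatten them into one list of addresses. The following is a minimal sketch that assumes a JobEndpointsResp object has already been parsed into a Python dict. The function name list_endpoints and the label format are hypothetical. Tokens are deliberately not returned, so that the result can be logged safely.

```python
from typing import Any, Dict, List, Tuple


def list_endpoints(endpoints: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Flatten a JobEndpointsResp-shaped dict into (label, url) pairs."""
    found = []

    # SSH: one address per task, listed under ssh.task_urls.
    ssh = endpoints.get("ssh") or {}
    for entry in ssh.get("task_urls") or []:
        if entry.get("url"):
            found.append((f"ssh:{entry.get('task')}", entry["url"]))

    # JupyterLab, TensorBoard, and MindStudio Insight each carry a single url.
    for key in ("jupyter_lab", "tensorboard", "mindstudio_insight"):
        service = endpoints.get(key) or {}
        if service.get("url"):
            found.append((key, service["url"]))

    return found
```

Missing or null sections are skipped, so the function also works when only some connection types are present.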

Status code: 400

**Table 127** Response body parameters
Parameter	Type	Description
error_msg	String	Error message
error_code	String	Error code
error_solution	String	Solution

Example Requests

The following example creates a training job that uses a free flavor. The job is named TestModelArtsJob and its description is This is a ModelArts job. The ID of the algorithm used is 3f5d6706-7b67-408d-8ba0-ec08048c45ed. No inputs or outputs are defined for the algorithm.

POST https://endpoint/v2/{project_id}/training-jobs

{
  "kind" : "job",
  "metadata" : {
    "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job",
    "create_time" : 1637045545982,
    "workspace_id" : "0",
    "user_name" : ""
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "name" : "ttt-obs-gpu",
    "code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
    "boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
    "parameters" : [ {
      "name" : "input_dir",
      "description" : "",
      "i18n_description" : null,
      "value" : "s://cn-north-4-rse/test/moxingtest-dir/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "input_file",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://cn-north-4-rse/test/moxingtest/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "large_file_method",
      "description" : "",
      "i18n_description" : null,
      "value" : "1",
      "constraint" : {
        "type" : "Integer",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    } ],
    "engine" : {
      "engine_id" : "horovod-cp36-tf-1.16.2",
      "engine_name" : "Horovod",
      "engine_version" : "0.16.2-TF-1.13.1-python3.6"
    },
    "policies" : { }
  },
  "spec" : {
    "resource" : {
      "flavor_id" : "modelarts.p3.large.public.free",
      "node_count" : 1
    },
    "log_export_path" : { },
    "custom_metrics" : [ {
      "http_get" : {
        "path" : "/raw_text",
        "port" : 10001
      }
    } ]
  }
}
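A request like the preceding one can be prepared with the Python standard library. The following is a minimal sketch under stated assumptions: the function name build_create_job_request is hypothetical, the endpoint and project ID are placeholders, and token authentication through the X-Auth-Token header is assumed because this section does not describe authentication. The function only builds the request object and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_job_request(
    endpoint: str, project_id: str, token: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the training-jobs URI."""
    url = f"https://{endpoint}/v2/{project_id}/training-jobs"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication. Replace if you sign requests differently.
            "X-Auth-Token": token,
        },
    )


# A body that reuses a subset of the fields from the example above.
# Which fields are mandatory is defined by the request parameter tables, not by this sketch.
example_body = {
    "kind": "job",
    "metadata": {
        "name": "TestModelArtsJob",
        "description": "This is a ModelArts job",
    },
    "algorithm": {"id": "3f5d6706-7b67-408d-8ba0-ec08048c45ed"},
    "spec": {
        "resource": {"flavor_id": "modelarts.p3.large.public.free", "node_count": 1}
    },
}
```

To send the request, pass the returned object to urllib.request.urlopen. Keeping construction separate from sending makes the URL, headers, and body easy to inspect first.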

The following example uses a custom image to create a training job named TestModelArtsJob2 with the description This is a ModelArts job2. The job runs in a dedicated resource pool and mounts an NFS file system.

POST https://endpoint/v2/{project_id}/training-jobs

{
  "kind" : "job",
  "metadata" : {
    "name" : "TestModelArtsJob2",
    "description" : "This is a ModelArts job2"
  },
  "algorithm" : {
    "engine" : {
      "image_url" : "xxxxxxxx/fastseq:1.2"
    },
    "command" : "cd /home/ma-user/ddp_demo && sh run_ddp.sh",
    "parameters" : [ ],
    "policies" : {
      "auto_search" : null
    },
    "environments" : {
      "NCCL_DEBUG" : "INFO",
      "NCCL_IB_DISABLE" : "0"
    }
  },
  "spec" : {
    "resource" : {
      "flavor_id" : "modelarts.pool.visual.xlarge",
      "node_count" : 1,
      "pool_id" : "poolfaf38d76"
    },
    "log_export_path" : {
      "obs_url" : "/cn-north-4-training-test/limou/ddp-demo-log/"
    },
    "volumes" : [ {
      "nfs" : {
        "nfs_server_path" : "192.168.0.82:/",
        "local_path" : "/home/ma-user/nfs/",
        "read_only" : false
      }
    } ]
  }
}
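When several custom-image jobs differ only in a few values, the body can be assembled by a small helper. The following is a minimal sketch that reproduces the structure of the preceding example. The function name custom_image_job_body and its parameter names are hypothetical, and only the fields shown in the example are covered.

```python
from typing import Any, Dict, Optional


def custom_image_job_body(
    name: str,
    description: str,
    image_url: str,
    command: str,
    flavor_id: str,
    pool_id: str,
    nfs_server_path: str,
    local_path: str,
    node_count: int = 1,
    read_only: bool = False,
    environments: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body for a custom-image job with one NFS volume."""
    return {
        "kind": "job",
        "metadata": {"name": name, "description": description},
        "algorithm": {
            # A custom image is selected through engine.image_url.
            "engine": {"image_url": image_url},
            "command": command,
            "parameters": [],
            "environments": dict(environments or {}),
        },
        "spec": {
            # pool_id selects the dedicated resource pool.
            "resource": {
                "flavor_id": flavor_id,
                "node_count": node_count,
                "pool_id": pool_id,
            },
            "volumes": [
                {
                    "nfs": {
                        "nfs_server_path": nfs_server_path,
                        "local_path": local_path,
                        "read_only": read_only,
                    }
                }
            ],
        },
    }
```

The optional log_export_path object from the example is omitted here and can be added under spec in the same way.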

Example Responses

Status code: 201

{
  "kind" : "job",
  "metadata" : {
    "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job",
    "create_time" : 1637045545982,
    "workspace_id" : "0",
    "user_name" : ""
  },
  "status" : {
    "phase" : "Creating",
    "secondary_phase" : "Creating",
    "duration" : 0,
    "start_time" : 0,
    "node_count_metrics" : null,
    "tasks" : [ "worker-0", "server-0" ]
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "name" : "ttt-obs-gpu",
    "code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
    "boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
    "parameters" : [ {
      "name" : "input_dir",
      "description" : "",
      "i18n_description" : null,
      "value" : "s://cn-north-4-rse/test/moxingtest-dir/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "input_file",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://cn-north-4-rse/test/moxingtest/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "large_file_method",
      "description" : "",
      "i18n_description" : null,
      "value" : "1",
      "constraint" : {
        "type" : "Integer",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    } ],
    "engine" : {
      "engine_id" : "horovod-cp36-tf-1.16.2",
      "engine_name" : "Horovod",
      "engine_version" : "0.16.2-TF-1.13.1-python3.6"
    },
    "policies" : { }
  },
  "spec" : {
    "resource" : {
      "policy" : "regular",
      "flavor_id" : "modelarts.p3.large.public.free",
      "flavor_name" : "Computing GPU(Vnt1) instance",
      "node_count" : 1,
      "flavor_detail" : {
        "flavor_type" : "GPU",
        "billing" : {
          "code" : "modelarts.vm.gpu.free",
          "unit_num" : 1
        },
        "flavor_info" : {
          "cpu" : {
            "arch" : "x86",
            "core_num" : 8
          },
          "gpu" : {
            "unit_num" : 1,
            "product_name" : "GP-Vnt1",
            "memory" : "32GB"
          },
          "memory" : {
            "size" : 64,
            "unit" : "GB"
          }
        }
      },
      "main_container_allocated_resources" : {
        "cpu_arch" : "x86",
        "cpu_core_num" : 5,
        "mem_size" : 44,
        "accelerator_num" : 1,
        "accelerator_type" : "nvidia-v100-pcie32"
      }
    },
    "log_export_path" : { },
    "custom_metrics" : [ {
      "exec" : {
        "command" : [ "cat", "/a/b/c.prom" ]
      }
    }, {
      "http_get" : {
        "path" : "/raw_text",
        "port" : 10001
      }
    } ]
  }
}
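After a 201 response is received, a client usually needs only a few fields, such as the job ID for later status queries. The following is a minimal sketch that assumes the response body has already been parsed into a Python dict. The function name summarize_job and the keys of the returned dict are hypothetical.

```python
from typing import Any, Dict


def summarize_job(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields most often needed after creating a training job."""
    metadata = response.get("metadata") or {}
    status = response.get("status") or {}
    resource = (response.get("spec") or {}).get("resource") or {}
    return {
        "id": metadata.get("id"),          # used to query the job later
        "name": metadata.get("name"),
        "phase": status.get("phase"),      # for example, "Creating"
        "tasks": status.get("tasks") or [],
        "flavor_id": resource.get("flavor_id"),
        "node_count": resource.get("node_count"),
    }
```

Fields that are absent or null in the response are returned as None (or an empty list for tasks) instead of raising an error.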

Status code: 400

The response body uses the common error format. The following example shows the body returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee is not found.

{
  "error_msg" : "algorithm not found.",
  "error_code" : "ModelArts.2755",
  "error_solution" : "Check whether the training project information in the request is valid."
}
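A client can turn this error body into an exception that carries all three fields. The following is a minimal sketch: the class name TrainingJobError and the function name parse_create_response are hypothetical, and treating every status code other than 201 as an error is an assumption of this sketch.

```python
import json
from typing import Any, Dict


class TrainingJobError(Exception):
    """Carries the fields of the common error response body."""

    def __init__(self, status_code: int, payload: Dict[str, Any]):
        self.status_code = status_code
        self.error_code = payload.get("error_code")
        self.error_msg = payload.get("error_msg")
        self.error_solution = payload.get("error_solution")
        super().__init__(f"{self.error_code}: {self.error_msg}")


def parse_create_response(status_code: int, body_text: str) -> Dict[str, Any]:
    """Return the parsed job on 201; raise TrainingJobError otherwise."""
    payload = json.loads(body_text) if body_text else {}
    if status_code == 201:
        return payload
    raise TrainingJobError(status_code, payload)
```

Callers can catch TrainingJobError and show error_solution to the user, because it describes how to correct the request.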

Status Codes

Status Code	Description
201	OK. The training job is created.
400	Bad request. The response body uses the common error format (error_msg, error_code, and error_solution).