Updated on 2025-08-20 GMT+08:00

Creating a Training Job

Function

This API is used to create a training job on ModelArts.

Use this API to create and configure a training job when you need to run machine learning training on specific datasets and algorithm models. Before calling it, ensure that you have uploaded the datasets and model code to ModelArts and that you have permission to create training jobs. After the job is created, the platform starts it on the configured resource specifications, and you can monitor its progress and status by using the job ID. The API returns an error message if the dataset or model code does not exist, the resource specifications are incorrectly configured, or you lack the required permission.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/training-jobs

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition: Project ID. For details, see Obtaining a Project ID and Name.

Constraints: The value can contain 1 to 64 characters. Letters, digits, and hyphens (-) are allowed.

Range: N/A

Default Value: N/A
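The path parameter rules above can be checked on the client before the request is sent. The Python sketch below builds the request URL and rejects a project_id that breaks the documented rule (1 to 64 letters, digits, and hyphens). The endpoint argument is an assumption you supply yourself, because the host name is not given in this section, and build_create_job_url is a hypothetical helper.

```python
import re
from urllib.parse import quote

# Documented constraint: 1 to 64 letters, digits, and hyphens (-).
_PROJECT_ID_RE = re.compile(r"[A-Za-z0-9-]{1,64}")


def build_create_job_url(endpoint: str, project_id: str) -> str:
    """Return the URL for POST /v2/{project_id}/training-jobs.

    `endpoint` is a placeholder for your region's ModelArts endpoint
    (assumption: not defined on this page), e.g. "https://<endpoint>".
    """
    if not _PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError("project_id must be 1-64 letters, digits, or hyphens")
    # quote() leaves valid IDs unchanged; it only guards the path segment.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id)}/training-jobs"
```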

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

kind

Yes

String

Definition: Type of a training job.

Constraints: N/A

Range:

  • job: common job

  • edge_job: edge job

  • hetero_job: heterogeneous job

  • mrs_job: MRS job

  • autosearch_job: auto search job

  • diag_job: diagnosis job

  • visualization_job: visualization job

Default Value: job

metadata

Yes

JobMetadata object

Definition: Training job metadata.

Constraints: N/A

algorithm

No

JobAlgorithm object

Definition: Training job algorithm.

Constraints: The options are as follows.

  • id: Only the algorithm ID is used.

  • subscription_id+item_version_id: The subscription ID and version ID of the algorithm are used.

  • code_dir+boot_file: The code directory and boot file of the training job are used.

tasks

No

Array of Task objects

Definition: Task list. This function is not implemented.

Constraints: N/A

spec

No

Spec object

Definition: Training job specifications. If this parameter is specified, leave the tasks parameter blank.

Constraints: N/A

endpoints

No

JobEndpointsReq object

Definition: Configurations required for remotely accessing a training job.

Constraints: N/A
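As a minimal sketch of how the top-level fields fit together, the hypothetical helper build_request_body below fills in the documented default for kind, checks it against the listed range, and never sends tasks, because that function is not implemented. The table marks kind as mandatory while also listing a default, so the sketch always sends an explicit value.

```python
from typing import Dict, Optional

# Values listed in the Range of the `kind` parameter.
JOB_KINDS = frozenset({
    "job", "edge_job", "hetero_job", "mrs_job",
    "autosearch_job", "diag_job", "visualization_job",
})


def build_request_body(metadata: Dict, kind: Optional[str] = None,
                       algorithm: Optional[Dict] = None,
                       spec: Optional[Dict] = None) -> Dict:
    """Assemble the top-level body of POST /v2/{project_id}/training-jobs."""
    kind = "job" if kind is None else kind  # documented default value
    if kind not in JOB_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    if not metadata.get("name"):
        raise ValueError("metadata.name is mandatory")
    body: Dict = {"kind": kind, "metadata": metadata}
    if algorithm is not None:
        body["algorithm"] = algorithm
    if spec is not None:
        body["spec"] = spec  # `tasks` is left out: it is not implemented
    return body
```

The resulting dict can be serialized with json.dumps before it is sent as the request body.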

Table 3 JobMetadata

Parameter

Mandatory

Type

Description

name

Yes

String

Definition: Name of a training job.

Constraints: N/A

Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).

Default Value: N/A

workspace_id

No

String

Definition: Workspace where a specified job is located.

Constraints: N/A

Range: N/A

Default Value: 0

description

No

String

Definition: Description of a training job.

Constraints: The value must contain 0 to 256 characters.

Range: N/A

Default Value: NULL

annotations

No

Map<String,String>

Definition: Advanced functions of a training job.

Constraints: The options are as follows.

  • job_template: Template RL (heterogeneous job)

  • fault-tolerance/job-retry-num: 3 (number of retries upon a fault)

  • fault-tolerance/job-unconditional-retry: true (unconditional restart)

  • fault-tolerance/hang-retry: true (restart upon a suspension)

  • jupyter-lab/enable: true (JupyterLab training application)

  • tensorboard/enable: true (TensorBoard training application)

  • mindstudio-insight/enable: true (MindStudio Insight training application)
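The name and description limits in Table 3, together with the string-valued annotations map, can be enforced when the metadata is built. build_metadata is a hypothetical helper that covers only three of the listed annotation keys; sending the retry count as the string "3" follows from the Map<String,String> type rather than from a documented example.

```python
import re
from typing import Dict, Optional

# Range of metadata.name: 1-64 digits, letters, underscores, or hyphens.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_metadata(name: str, description: str = "", workspace_id: str = "0",
                   retry_num: Optional[int] = None, hang_retry: bool = False,
                   tensorboard: bool = False) -> Dict:
    """Build a JobMetadata dict, enforcing the documented limits."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError("name must be 1-64 letters, digits, '_' or '-'")
    if len(description) > 256:
        raise ValueError("description must be at most 256 characters")
    # annotations is Map<String,String>, so every value is a string.
    annotations: Dict[str, str] = {}
    if retry_num is not None:
        annotations["fault-tolerance/job-retry-num"] = str(retry_num)
    if hang_retry:
        annotations["fault-tolerance/hang-retry"] = "true"
    if tensorboard:
        annotations["tensorboard/enable"] = "true"
    metadata: Dict = {"name": name, "workspace_id": workspace_id}
    if description:
        metadata["description"] = description
    if annotations:
        metadata["annotations"] = annotations
    return metadata
```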

Table 4 JobAlgorithm

Parameter

Mandatory

Type

Description

id

No

String

Definition: Algorithm ID in algorithm management.

Constraints: N/A

Range: N/A

Default Value: N/A

name

No

String

Definition: Algorithm name. Leave it blank.

Constraints: N/A

Range: N/A

Default Value: N/A

subscription_id

No

String

Definition: Subscription ID of a subscription algorithm.

Constraints: This parameter must be used with item_version_id.

Range: N/A

Default Value: N/A

item_version_id

No

String

Definition: Version of a subscription algorithm.

Constraints: This parameter must be used with subscription_id.

Range: N/A

Default Value: N/A

code_dir

No

String

Definition: Code directory of a training job, for example, /usr/app/.

Constraints: This parameter must be used with boot_file. Leave this parameter blank if id or subscription_id + item_version_id is specified.

Range: N/A

Default Value: N/A

boot_file

No

String

Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py.

Constraints: This parameter must be used with code_dir. Leave this parameter blank if id or subscription_id + item_version_id is specified.

Range: N/A

Default Value: N/A

autosearch_config_path

No

String

Definition: YAML configuration path of an auto search job. An OBS URL is required.

Constraints: N/A

Range: N/A

Default Value: N/A

autosearch_framework_path

No

String

Definition: Framework code directory of an auto search job. An OBS URL is required.

Constraints: N/A

Range: N/A

Default Value: N/A

command

No

String

Definition: Command for starting the custom image container of a training job.

Constraints: N/A

Range: N/A

Default Value: N/A

parameters

No

Array of Parameters objects

Definition: Running parameters of the training job.

Constraints: N/A

policies

No

JobPolicies object

Definition: Policies supported by jobs, which are used for hyperparameter search.

Constraints: N/A

inputs

No

Array of Input objects

Definition: Data input of a training job.

Constraints: N/A

outputs

No

Array of Output objects

Definition: Output of the training job.

Constraints: N/A

engine

No

JobEngine object

Definition: Engine of a training job.

Constraints: Leave this parameter blank if the job is created using the id of an algorithm in algorithm management or the subscription_id + item_version_id of a subscribed algorithm.

local_code_dir

No

String

Definition: Local directory of the training container to which the algorithm code directory is downloaded.

Constraints:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

Range: N/A

Default Value: N/A

working_dir

No

String

Definition: Work directory where an algorithm is executed.

Constraints: In v1 compatibility mode, the current field does not take effect.

Range: N/A

Default Value: N/A

environments

No

Map<String,String>

Definition: Environment variables of a training job. Format: "key":"value"

Constraints: The key can contain a maximum of 8,192 characters, and the value can contain a maximum of 4,096 characters. A maximum of 100 key-value pairs are allowed. The variable name can contain only letters, digits, and underscores (_), and must start with a letter or an underscore (_).

Note: Variables cannot contain $.

summary

No

Summary object

Definition: Visualization log summary.

Constraints: N/A
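The JobAlgorithm rules above can be collected into one client-side check. check_algorithm is a hypothetical validator that assumes exactly one of the three algorithm options is used per request, and it treats "stored in the code directory" and "under /home" as plain POSIX string checks; the server's own validation may differ.

```python
import posixpath
import re
from typing import Dict, List

# Environment variable names: letters, digits, underscores; no leading digit.
_ENV_KEY_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _inside(base: str, path: str) -> bool:
    """True if `path` lies strictly inside directory `base` (string-level check)."""
    base, path = posixpath.normpath(base), posixpath.normpath(path)
    try:
        return path != base and posixpath.commonpath([base, path]) == base
    except ValueError:  # raised when mixing absolute and relative paths
        return False


def check_algorithm(algo: Dict, v1_compat: bool = False) -> List[str]:
    """Return the problems found in a JobAlgorithm dict (empty list if none)."""
    problems: List[str] = []
    # Paired fields must be set together.
    for a, b in (("subscription_id", "item_version_id"), ("code_dir", "boot_file")):
        if bool(algo.get(a)) != bool(algo.get(b)):
            problems.append(f"{a} and {b} must be used together")
    # Assumption: exactly one of the three documented options per request.
    chosen = [bool(algo.get(k)) for k in ("id", "subscription_id", "code_dir")]
    if sum(chosen) != 1:
        problems.append("use exactly one of: id, subscription_id + item_version_id, "
                        "code_dir + boot_file")
    code_dir, boot_file = algo.get("code_dir"), algo.get("boot_file")
    if code_dir and boot_file and not _inside(code_dir, boot_file):
        problems.append("boot_file must be stored in code_dir")
    # local_code_dir is ignored in v1 compatibility mode or for file:// code dirs.
    local = algo.get("local_code_dir")
    ignored = v1_compat or str(code_dir or "").startswith("file://")
    if local and not ignored and not _inside("/home", local):
        problems.append("local_code_dir must be under /home")
    env = algo.get("environments") or {}
    if len(env) > 100:
        problems.append("at most 100 environment variables are allowed")
    for key, value in env.items():
        if len(key) > 8192 or not _ENV_KEY_RE.fullmatch(key):
            problems.append(f"invalid environment variable name: {key!r}")
        if len(value) > 4096 or "$" in value:
            problems.append(f"invalid value for {key}: too long or contains '$'")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem in one pass.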

Table 5 Parameters

Parameter

Mandatory

Type

Description

name

No

String

Definition: Parameter name.

Constraints: N/A

Range: N/A

Default Value: N/A

value

No

String

Definition: Parameter value.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Parameter description.

Constraints: N/A

Range: N/A

Default Value: N/A

constraint

No

ParametersConstraint object

Definition: Parameter attribute.

Constraints: N/A

i18n_description

No

I18nDescription object

Definition: Internationalization description.

Constraints: N/A

Table 6 ParametersConstraint

Parameter

Mandatory

Type

Description

type

No

String

Definition: Parameter type.

Constraints: N/A

Range: N/A

Default Value: N/A

editable

No

Boolean

Definition: Whether the parameter can be edited.

Constraints: N/A

Range:

  • true: editable

  • false: not editable

Default Value: N/A

required

No

Boolean

Definition: Whether the parameter is mandatory.

Constraints: N/A

Range:

  • true: mandatory

  • false: optional

Default Value: N/A

sensitive

No

Boolean

Definition: Whether the parameter is sensitive. This function is currently unavailable.

Constraints: N/A

Range:

  • true: sensitive

  • false: insensitive

Default Value: N/A

valid_type

No

String

Definition: Valid type.

Constraints: N/A

Range: N/A

Default Value: N/A

valid_range

No

Array of strings

Definition: Valid range.

Constraints: N/A

Table 7 I18nDescription

Parameter

Mandatory

Type

Description

language

No

String

Definition: Internationalization language.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Description.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 8 JobPolicies

Parameter

Mandatory

Type

Description

auto_search

No

AutoSearch object

Definition: Hyperparameter search configuration.

Constraints: N/A

Table 9 AutoSearch

Parameter

Mandatory

Type

Description

skip_search_params

No

String

Definition: Hyperparameters to skip during the search.

Constraints: N/A

Range: N/A

Default Value: N/A

reward_attrs

No

Array of RewardAttrs objects

Definition: Search metrics.

Constraints: N/A

search_params

No

Array of SearchParams objects

Definition: Search parameters.

Constraints: N/A

algo_configs

No

Array of AlgoConfigs objects

Definition: Search algorithm configurations.

Constraints: N/A

Table 10 RewardAttrs

Parameter

Mandatory

Type

Description

name

No

String

Definition: Metric name.

Constraints: N/A

Range: N/A

Default Value: N/A

mode

No

String

Definition: Search mode.

Constraints: N/A

Range:

  • max: A larger metric value is preferred.

  • min: A smaller metric value is preferred.

Default Value: N/A

regex

No

String

Definition: Regular expression of a metric.

Constraints: N/A

Range: N/A

Default Value: N/A
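To illustrate the max and min search modes, the sketch below scans log lines with a metric regex and returns the preferred value. How ModelArts applies regex internally is not described here; best_metric is hypothetical and assumes the first capture group of the regex holds the numeric value.

```python
import re
from typing import Iterable, List, Optional


def best_metric(log_lines: Iterable[str], regex: str, mode: str) -> Optional[float]:
    """Return the best metric value found in `log_lines`, or None if none match.

    mode follows RewardAttrs.mode: "max" prefers larger values, "min" smaller.
    Assumption: group 1 of `regex` captures the metric value.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    pattern = re.compile(regex)
    values: List[float] = []
    for line in log_lines:
        match = pattern.search(line)
        if match:
            values.append(float(match.group(1)))
    if not values:
        return None
    return max(values) if mode == "max" else min(values)
```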

Table 11 SearchParams

Parameter

Mandatory

Type

Description

name

No

String

Definition: Hyperparameter name.

Constraints: N/A

Range: N/A

Default Value: N/A

param_type

No

String

Definition: Parameter type.

Constraints: N/A

Range:

  • continuous: The hyperparameter is of the continuous type. When an algorithm is used in a training job, continuous hyperparameters are displayed as text boxes on the console.

  • discrete: The hyperparameter is of the discrete type. When an algorithm is used in a training job, discrete hyperparameters are displayed as drop-down lists on the console.

Default Value: N/A

lower_bound

No

String

Definition: Lower bound of the hyperparameter.

Constraints: N/A

Range: N/A

Default Value: N/A

upper_bound

No

String

Definition: Upper bound of the hyperparameter.

Constraints: N/A

Range: N/A

Default Value: N/A

discrete_points_num

No

String

Definition: Number of discrete points of a hyperparameter with continuous values.

Constraints: N/A

Range: N/A

Default Value: N/A

discrete_values

No

Array of strings

Definition: Discrete hyperparameter values.

Constraints: N/A
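A JobPolicies object for hyperparameter search could be assembled as below. The helpers are hypothetical; bounds and discrete values are sent as strings because Table 11 types them as String, and algo_configs is omitted because the valid search algorithm names are not listed on this page.

```python
from typing import Dict, List, Optional, Sequence


def continuous_param(name: str, lower: float, upper: float,
                     points: Optional[int] = None) -> Dict:
    """SearchParams entry for a continuous hyperparameter."""
    if lower >= upper:
        raise ValueError("lower bound must be below upper bound")
    param = {"name": name, "param_type": "continuous",
             "lower_bound": str(lower), "upper_bound": str(upper)}
    if points is not None:
        param["discrete_points_num"] = str(points)
    return param


def discrete_param(name: str, values: Sequence) -> Dict:
    """SearchParams entry for a discrete hyperparameter."""
    return {"name": name, "param_type": "discrete",
            "discrete_values": [str(v) for v in values]}


def build_policies(metric: str, mode: str, regex: str,
                   search_params: List[Dict]) -> Dict:
    """JobPolicies object with a single search metric (no algo_configs)."""
    return {"auto_search": {
        "reward_attrs": [{"name": metric, "mode": mode, "regex": regex}],
        "search_params": search_params,
    }}
```

The hyperparameter names learning_rate and batch_size used in a call would be hypothetical; they must match whatever your training code actually reads.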

Table 12 AlgoConfigs

Parameter

Mandatory

Type

Description

name

No

String

Definition: Search algorithm name.

Constraints: N/A

Range: N/A

Default Value: N/A

params

No

Array of AutoSearchAlgoConfigParameter objects

Definition: Search algorithm parameters.

Constraints: N/A

Table 13 AutoSearchAlgoConfigParameter

Parameter

Mandatory

Type

Description

key

No

String

Definition: Parameter key.

Constraints: N/A

Range: N/A

Default Value: N/A

value

No

String

Definition: Parameter value.

Constraints: N/A

Range: N/A

Default Value: N/A

type

No

String

Definition: Parameter type.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 14 JobEngine

Parameter

Mandatory

Type

Description

engine_id

No

String

Definition: Engine ID selected for a training job.

Constraints: Specify the engine in one of three ways: engine_id, engine_name + engine_version, or image_url.

Range: N/A

Default Value: N/A

engine_name

No

String

Definition: Engine name selected for a training job.

Constraints: If engine_id is set, you do not need to set this parameter. If you create a training job using a preset framework with a custom image, you must set both this parameter and image_url.

Range: N/A

Default Value: N/A

engine_version

No

String

Definition: Engine version selected for a training job.

Constraints: If engine_id has been set, you do not need to set this parameter.

Range: N/A

Default Value: N/A

image_url

No

String

Definition: Custom image URL selected for a training job. The URL is obtained from SWR.

Constraints: The format is organization_name/image_name:tag.

Range: N/A

Default Value: N/A

install_sys_packages

No

Boolean

Definition: Specifies whether to install the MoXing version specified by the training platform.

Constraints: This parameter is available only when engine_name, engine_version, and image_url are set.

Range:

  • true: yes

  • false: no

Default Value: N/A
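Table 14 allows three ways to select an engine. The hypothetical build_engine helper below encodes them, plus the rule that install_sys_packages applies only when engine_name, engine_version, and image_url are all set. The image_url check is a loose reading of organization_name/image_name:tag, not the registry's exact grammar, and requiring engine_name and engine_version together is an assumption drawn from the "engine_name + engine_version" wording.

```python
import re
from typing import Dict, Optional

# Loose pattern for organization_name/image_name:tag.
_IMAGE_RE = re.compile(r"[^/:\s]+/[^/:\s]+:[^/:\s]+")


def build_engine(engine_id: Optional[str] = None, engine_name: Optional[str] = None,
                 engine_version: Optional[str] = None, image_url: Optional[str] = None,
                 install_sys_packages: Optional[bool] = None) -> Dict:
    """Build a JobEngine dict using one of the documented selection styles."""
    if engine_id:
        return {"engine_id": engine_id}  # name, version, and image not needed
    engine: Dict = {}
    if image_url:
        if not _IMAGE_RE.fullmatch(image_url):
            raise ValueError("image_url must look like organization_name/image_name:tag")
        engine["image_url"] = image_url
    if engine_name or engine_version:
        if not (engine_name and engine_version):
            raise ValueError("engine_name and engine_version must be set together")
        engine["engine_name"] = engine_name
        engine["engine_version"] = engine_version
    if not engine:
        raise ValueError("set engine_id, engine_name + engine_version, or image_url")
    if install_sys_packages is not None:
        # Only allowed for a preset framework combined with a custom image.
        if len(engine) != 3:
            raise ValueError("install_sys_packages needs engine_name, "
                             "engine_version, and image_url")
        engine["install_sys_packages"] = install_sys_packages
    return engine
```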

Table 15 Summary

Parameter

Mandatory

Type

Description

log_type

No

String

Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job.

Constraints: N/A

Range:

  • tensorboard: TensorBoard

  • mindstudio-insight: MindStudio Insight

Default Value: N/A

log_dir

No

LogDir object

Definition: Visualization log output of a training job.

Constraints: This parameter is mandatory when log_type is specified.

data_sources

No

Array of DataSource objects

Definition: Visualization log input for a visualization job, or for a training job in debugging mode.

Constraints: This parameter is mandatory when the advanced function "tensorboard/enable": "true" or "mindstudio-insight/enable": "true" is enabled for the training job.

Table 16 LogDir

Parameter

Mandatory

Type

Description

pfs

Yes

PFSSummary object

Definition: Output of an OBS parallel file system.

Constraints: N/A

Table 17 PFSSummary

Parameter

Mandatory

Type

Description

pfs_path

Yes

String

Definition: URL of the OBS parallel file system.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 18 DataSource

Parameter

Mandatory

Type

Description

job

Yes

JobSummary object

Definition: Job data source.

Constraints: N/A

Table 19 JobSummary

Parameter

Mandatory

Type

Description

job_id

Yes

String

Definition: ID of a training job.

Constraints: N/A

Range: N/A

Default Value: N/A
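The nested Summary, LogDir, PFSSummary, DataSource, and JobSummary objects can be built in one step. In the sketch below, build_summary is hypothetical and the PFS path in the usage is a placeholder, since the exact URL format of an OBS parallel file system is not shown here.

```python
from typing import Dict, Iterable, Optional

# Range of Summary.log_type.
LOG_TYPES = ("tensorboard", "mindstudio-insight")


def build_summary(log_type: Optional[str] = None, pfs_path: Optional[str] = None,
                  source_job_ids: Iterable[str] = ()) -> Dict:
    """Build a Summary object for a training or visualization job."""
    summary: Dict = {}
    if log_type is not None:
        if log_type not in LOG_TYPES:
            raise ValueError(f"log_type must be one of {LOG_TYPES}")
        if not pfs_path:
            raise ValueError("log_dir is mandatory when log_type is specified")
        summary["log_type"] = log_type
        summary["log_dir"] = {"pfs": {"pfs_path": pfs_path}}
    job_ids = list(source_job_ids)
    if job_ids:
        # Each data source references an existing training job by ID.
        summary["data_sources"] = [{"job": {"job_id": j}} for j in job_ids]
    return summary
```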

Table 20 Task

Parameter

Mandatory

Type

Description

role

No

String

Definition: Task role. This function is not supported currently.

Constraints: N/A

Range: N/A

Default Value: N/A

algorithm

No

algorithm object

Definition: Algorithm configurations for algorithm management.

Constraints: N/A

task_resource

No

task_resource object

Definition: Resource flavor of a training job.

Constraints: N/A

Table 21 algorithm

Parameter

Mandatory

Type

Description

job_config

No

job_config object

Definition: Algorithm configuration, such as the boot file.

Constraints: N/A

code_dir

No

String

Definition: Algorithm code directory, for example, /usr/app/.

Constraints: This parameter must be used with boot_file.

Range: N/A

Default Value: N/A

boot_file

No

String

Definition: Code boot file of the algorithm, which must be stored in the code directory, for example, /usr/app/boot.py.

Constraints: This parameter must be used with code_dir.

Range: N/A

Default Value: N/A

engine

No

engine object

Definition: Algorithm engine of a heterogeneous job.

Constraints: N/A

inputs

No

Array of inputs objects

Definition: Data input of an algorithm.

Constraints: N/A

outputs

No

Array of outputs objects

Definition: Data output of an algorithm.

Constraints: N/A

local_code_dir

No

String

Definition: Local directory of the training container to which the algorithm code directory is downloaded.

Constraints:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

Range: N/A

Default Value: N/A

working_dir

No

String

Definition: Work directory where an algorithm is executed.

Constraints: In v1 compatibility mode, the current field does not take effect.

Range: N/A

Default Value: N/A

Table 22 job_config

Parameter

Mandatory

Type

Description

parameters

No

Array of Parameter objects

Definition: Running parameters of an algorithm.

Constraints: N/A

inputs

No

Array of Input objects

Definition: Data input of an algorithm.

Constraints: N/A

outputs

No

Array of Output objects

Definition: Data output of an algorithm.

Constraints: N/A

engine

No

engine object

Definition: Algorithm engine.

Constraints: N/A

Table 23 Parameter

Parameter

Mandatory

Type

Description

name

No

String

Definition: Parameter name.

Constraints: N/A

Range: N/A

Default Value: N/A

value

No

String

Definition: Parameter value.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Parameter description.

Constraints: N/A

Range: N/A

Default Value: N/A

constraint

No

constraint object

Definition: Parameter attribute.

Constraints: N/A

i18n_description

No

i18n_description object

Definition: Internationalization description.

Constraints: N/A

Table 24 constraint

Parameter

Mandatory

Type

Description

type

No

String

Definition: Parameter type.

Constraints: N/A

Range: N/A

Default Value: N/A

editable

No

Boolean

Definition: Whether the parameter can be edited.

Constraints: N/A

Range:

  • true: editable

  • false: not editable

Default Value: N/A

required

No

Boolean

Definition: Whether the parameter is mandatory.

Constraints: N/A

Range:

  • true: mandatory

  • false: optional

Default Value: N/A

sensitive

No

Boolean

Definition: Whether the parameter is sensitive.

Constraints: This function is currently unavailable.

Range:

  • true: sensitive

  • false: insensitive

Default Value: N/A

valid_type

No

String

Definition: Valid type.

Constraints: N/A

Range: N/A

Default Value: N/A

valid_range

No

Array of strings

Definition: Valid range.

Constraints: N/A

Table 25 i18n_description

Parameter

Mandatory

Type

Description

language

No

String

Definition: Internationalization language.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Internationalization language description.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 26 Input

Parameter

Mandatory

Type

Description

name

Yes

String

Definition: Name of the data input channel.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Description of the data input channel.

Constraints: N/A

Range: N/A

Default Value: N/A

local_dir

No

String

Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0

Constraints: N/A

Range: N/A

Default Value: N/A

access_method

No

String

Definition: Access method of the input data channel path (local_dir).

Constraints: N/A

Range:

  • parameter: hyperparameters

  • env: environment variables

Default Value: parameter

remote

Yes

InputDataInfo object

Definition: Description of the actual data input.

Constraints: The options are as follows.

  • dataset: The data input is a dataset.

  • obs: The data input is an OBS path.

remote_constraint

No

Array of remote_constraint objects

Definition: Data input constraint.

Constraints: N/A

Table 27 InputDataInfo

Parameter

Mandatory

Type

Description

dataset

No

dataset object

Definition: The input is a dataset.

Constraints: N/A

obs

No

obs object

Definition: OBS in which data input and output are stored.

Constraints: N/A

Table 28 dataset

Parameter

Mandatory

Type

Description

id

Yes

String

Definition: Dataset ID of a training job.

Constraints: N/A

Range: N/A

Default Value: N/A

version_id

Yes

String

Definition: Dataset version ID of a training job.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 29 obs

Parameter

Mandatory

Type

Description

obs_url

Yes

String

Definition: OBS URL of the dataset for a training job, for example, /usr/data/.

Constraints: N/A

Range: N/A

Default Value: N/A
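An Input channel needs one data source under remote. The hypothetical build_input helper below assumes that obs and dataset are mutually exclusive, which the "options" wording of the remote constraint suggests but does not state outright; the OBS URL in the usage is a placeholder.

```python
from typing import Dict, Optional


def build_input(name: str, obs_url: Optional[str] = None,
                dataset_id: Optional[str] = None,
                dataset_version_id: Optional[str] = None,
                local_dir: Optional[str] = None,
                access_method: str = "parameter") -> Dict:
    """Build an Input channel backed by either an OBS path or a dataset."""
    if access_method not in ("parameter", "env"):
        raise ValueError("access_method must be 'parameter' or 'env'")
    if bool(obs_url) == bool(dataset_id):
        raise ValueError("give either obs_url or a dataset, not both or neither")
    if dataset_id and not dataset_version_id:
        raise ValueError("dataset version_id is mandatory")
    remote = ({"obs": {"obs_url": obs_url}} if obs_url
              else {"dataset": {"id": dataset_id, "version_id": dataset_version_id}})
    channel: Dict = {"name": name, "access_method": access_method, "remote": remote}
    if local_dir:
        channel["local_dir"] = local_dir
    return channel
```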

Table 30 remote_constraint

Parameter

Mandatory

Type

Description

data_type

No

String

Definition: Data input type, including the data storage location and dataset.

Constraints: N/A

Range: N/A

Default Value: N/A

attributes

No

String

Definition: Related attributes.

Constraints: N/A

Range:

If the input is a dataset:

  • data_format: data format

  • data_segmentation: data segmentation method

  • dataset_type: data labeling type

Default Value: N/A

Table 31 Output

Parameter

Mandatory

Type

Description

name

Yes

String

Definition: Name of the data output channel.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Description of the data output channel.

Constraints: N/A

Range: N/A

Default Value: N/A

local_dir

No

String

Definition: Local path of the container to which the data output channels are mapped.

Constraints: N/A

Range: N/A

Default Value: N/A

access_method

No

String

Definition: Access method of the output data channel path (local_dir).

Constraints: N/A

Range:

  • parameter: hyperparameters

  • env: environment variables

Default Value: parameter

remote

Yes

Remote object

Definition: Description of the actual data output.

Constraints: N/A

Table 32 Remote

Parameter

Mandatory

Type

Description

obs

Yes

RemoteObs object

Definition: Data actually output to OBS.

Constraints: N/A

Table 33 RemoteObs

Parameter

Mandatory

Type

Description

obs_url

Yes

String

Definition: Path of the data output to OBS.

Constraints: N/A

Range: N/A

Default Value: N/A
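Output channels are simpler because remote always points at OBS. A minimal sketch with a hypothetical build_output helper:

```python
from typing import Dict, Optional


def build_output(name: str, obs_url: str, local_dir: Optional[str] = None,
                 access_method: str = "parameter") -> Dict:
    """Build an Output channel whose data is written to `obs_url`."""
    if access_method not in ("parameter", "env"):
        raise ValueError("access_method must be 'parameter' or 'env'")
    channel: Dict = {"name": name, "access_method": access_method,
                     "remote": {"obs": {"obs_url": obs_url}}}
    if local_dir:
        channel["local_dir"] = local_dir
    return channel
```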

Table 34 engine

Parameter

Mandatory

Type

Description

engine_id

No

String

Definition: Engine ID selected for an algorithm.

Constraints: N/A

Range: N/A

Default Value: N/A

engine_name

No

String

Definition: Engine name selected for an algorithm.

Constraints: If engine_id is specified, leave this parameter blank.

Range: N/A

Default Value: N/A

engine_version

No

String

Definition: Engine version selected for an algorithm.

Constraints: If engine_id is specified, leave this parameter blank.

Range: N/A

Default Value: N/A

image_url

No

String

Definition: Custom image URL selected for an algorithm.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 35 engine

Parameter

Mandatory

Type

Description

engine_id

No

String

Definition: ID of the engine flavor of a heterogeneous job, for example, caffe-1.0.0-python2.7.

Constraints: N/A

Range: N/A

Default Value: N/A

engine_name

No

String

Definition: Name of the engine flavor of a heterogeneous job, for example, Caffe.

Constraints: N/A

Range: N/A

Default Value: N/A

engine_version

No

String

Definition: Version of the engine flavor of a heterogeneous job.

Constraints: N/A

Range: N/A

Default Value: N/A

image_url

No

String

Definition: Custom image URL selected for an algorithm.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 36 inputs

Parameter

Mandatory

Type

Description

name

Yes

String

Definition: Name of the data input channel.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Description of the data input channel.

Constraints: N/A

Range: N/A

Default Value: N/A

local_dir

No

String

Definition: Local path of the container to which the data input channels are mapped.

Constraints: N/A

Range: N/A

Default Value: N/A

remote

Yes

remote object

Definition: Description of the actual data input.

Constraints: The options are as follows:

  • dataset: The data input is a dataset.

  • obs: The data input is an OBS path.

Table 37 remote

Parameter

Mandatory

Type

Description

obs

No

obs object

Definition: OBS in which data input and output are stored.

Constraints: N/A

Table 38 obs

Parameter

Mandatory

Type

Description

obs_url

Yes

String

Definition: OBS URL of the dataset for a training job, for example, /usr/data/.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 39 outputs

Parameter

Mandatory

Type

Description

name

Yes

String

Definition: Name of the data output channel.

Constraints: N/A

Range: N/A

Default Value: N/A

description

No

String

Definition: Description of the data output channel.

Constraints: N/A

Range: N/A

Default Value: N/A

local_dir

No

String

Definition: Local path of the container to which the data output channels are mapped.

Constraints: N/A

Range: N/A

Default Value: N/A

remote

Yes

remote object

Definition: Description of the actual data output.

Constraints: N/A

Table 40 remote

Parameter

Mandatory

Type

Description

obs

Yes

obs object

Definition: Data actually output to OBS.

Constraints: N/A

Table 41 obs

Parameter

Mandatory

Type

Description

obs_url

Yes

String

Definition: Path of the data output to OBS.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 42 task_resource

Parameter

Mandatory

Type

Description

flavor_id

No

String

Definition: ID of the resource flavor selected for a training job.

Constraints: N/A

Range: N/A

Default Value: N/A

node_count

Yes

Integer

Definition: Number of resource replicas selected for a training job.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 43 Spec

Parameter

Mandatory

Type

Description

resource

No

SpecResource object

Definition: Resource flavor of a training job.

Constraints: Specify either flavor_id alone or both pool_id and flavor_id.

  • If you select a public resource pool, only flavor_id is needed. Select the number of PUs and memory your training job needs. If the public resource pool has enough idle resources, your job will be scheduled.

  • If you select a dedicated resource pool, both pool_id and flavor_id are needed. Select the smallest number of PUs that meet your training needs to save resources and boost efficiency.

volumes

No

Array of SpecVolumes objects

Definition: Mounting volume information of a training job.

Constraints: N/A

log_export_path

No

LogExportPath object

Definition: Log output of a training job.

Constraints: N/A

auto_stop

No

AutoStop object

Definition: Auto stop configuration of a training job.

Constraints: N/A

schedule_policy

No

SchedulePolicy object

Definition: Scheduling policy of a training job.

Constraints: N/A

notification

No

Notification object

Definition: Message notification of a training event.

Constraints: N/A

custom_metrics

No

Array of CustomMetrics objects

Definition: Metric collection configuration.

Table 44 SpecResource

Parameter

Mandatory

Type

Description

flavor_id

No

String

Definition: ID of the resource flavor of a training job.

Constraints: N/A

Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows:

  • modelarts.pool.visual.xlarge (1 PU)

  • modelarts.pool.visual.2xlarge (2 PUs)

  • modelarts.pool.visual.4xlarge (4 PUs)

  • modelarts.pool.visual.8xlarge (8 PUs)

  • modelarts.pool.visual.16xlarge (16 PUs, only for Snt9b23 supernode resource pools)

Default Value: N/A

node_count

No

Integer

Definition: Number of nodes used to create a training job in a resource pool.

Constraints: N/A

Range: N/A

Default Value: single node

pool_id

No

String

Definition: Dedicated resource pool ID.

Constraints: N/A

Range: N/A

Default Value: N/A
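Following the advice to select the smallest number of PUs that meets your training needs, the sketch below chooses a dedicated-pool flavor from the documented list and assembles a SpecResource dict. Both helpers are hypothetical; the sketch assumes "single node" means node_count 1, and that a pool_id without flavor_id corresponds to a CPU dedicated pool, where flavor_id cannot be specified.

```python
from typing import Dict, Optional

# flavor_id options documented for GPU/Ascend dedicated pools, with PU counts.
DEDICATED_FLAVOR_PUS = {
    "modelarts.pool.visual.xlarge": 1,
    "modelarts.pool.visual.2xlarge": 2,
    "modelarts.pool.visual.4xlarge": 4,
    "modelarts.pool.visual.8xlarge": 8,
    "modelarts.pool.visual.16xlarge": 16,  # Snt9b23 supernode pools only
}


def smallest_dedicated_flavor(pus_needed: int, supernode: bool = False) -> str:
    """Return the smallest documented flavor with at least `pus_needed` PUs."""
    for flavor, pus in sorted(DEDICATED_FLAVOR_PUS.items(), key=lambda kv: kv[1]):
        if pus == 16 and not supernode:
            continue  # 16xlarge requires a supernode resource pool
        if pus >= pus_needed:
            return flavor
    raise ValueError(f"no single-node flavor offers {pus_needed} PUs")


def build_resource(flavor_id: Optional[str] = None, pool_id: Optional[str] = None,
                   node_count: int = 1) -> Dict:
    """Build SpecResource: public pool needs flavor_id; dedicated adds pool_id."""
    if pool_id is None and not flavor_id:
        raise ValueError("a public resource pool needs flavor_id")
    resource: Dict = {"node_count": node_count}
    if flavor_id:
        resource["flavor_id"] = flavor_id
    if pool_id:
        resource["pool_id"] = pool_id
    return resource
```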

Table 45 SpecVolumes

Parameter

Mandatory

Type

Description

nfs

No

Nfs object

Definition: NFS mounting volume information of a training job.

Constraints: N/A

pfs

No

Pfs object

Definition: obsfs mounting volume information of a training job.

Constraints: N/A

obs

No

Obs object

Definition: OBS mounting volume information of a training job.

Constraints: N/A

Table 46 Nfs

Parameter

Mandatory

Type

Description

nfs_server_path

No

String

Definition: NFS server path, for example, 10.10.10.10:/example/path.

Constraints: N/A

Range: N/A

Default Value: N/A

local_path

No

String

Definition: Path for attaching volumes to the training container, for example, /example/path.

Constraints: N/A

Range: N/A

Default Value: N/A

read_only

No

Boolean

Definition: Specifies whether the disks attached to the container in NFS mode are read-only.

Constraints: N/A

Range:

  • true: read only

  • false: read/write

Default Value: N/A

Table 47 Pfs

Parameter

Mandatory

Type

Description

pfs_path

No

String

Definition: Path of the OBS parallel file system to be mounted through obsfs, for example, /test-bucket/path.

Constraints: N/A

Range: N/A

Default Value: N/A

local_path

No

String

Definition: Path for attaching volumes to the training container, for example, /example/path.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 48 Obs

Parameter

Mandatory

Type

Description

obs_path

No

String

Definition: OBS path to be mounted. For example, /test-bucket/path.

Constraints: N/A

Range: N/A

Default Value: N/A

local_path

No

String

Definition: Path for attaching volumes to the training container, for example, /example/path.

Constraints: N/A

Range: N/A

Default Value: N/A
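A volumes array is a list of SpecVolumes objects. The sketch below builds one entry per mount type. The helper names are hypothetical, and two checks are assumptions rather than documented constraints: that each entry carries a single mount type, and that local_path must be an absolute container path. The NFS format check follows the host:/path example above.

```python
import re

# <host>:/<path>, matching the example 10.10.10.10:/example/path.
_NFS_SERVER_PATH = re.compile(r"^[^:/\s]+:/\S*$")


def nfs_volume(server_path: str, local_path: str, read_only: bool = False) -> dict:
    """Return one SpecVolumes entry that mounts an NFS share."""
    if not _NFS_SERVER_PATH.match(server_path):
        raise ValueError(f"expected <host>:/<path>, got {server_path!r}")
    if not local_path.startswith("/"):
        raise ValueError("local_path must be an absolute container path")
    return {"nfs": {"nfs_server_path": server_path,
                    "local_path": local_path,
                    "read_only": read_only}}


def obs_volume(obs_path: str, local_path: str) -> dict:
    """Return one SpecVolumes entry that mounts an OBS path."""
    return {"obs": {"obs_path": obs_path, "local_path": local_path}}


def pfs_volume(pfs_path: str, local_path: str) -> dict:
    """Return one SpecVolumes entry that mounts a parallel file system via obsfs."""
    return {"pfs": {"pfs_path": pfs_path, "local_path": local_path}}


volumes = [
    nfs_volume("10.10.10.10:/example/path", "/example/path", read_only=True),
    obs_volume("/test-bucket/path", "/data"),
]
```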

Table 49 LogExportPath

Parameter

Mandatory

Type

Description

obs_url

No

String

Definition: OBS path for storing training job logs, for example, obs://example/path.

Constraints: N/A

Range: N/A

Default Value: N/A

host_path

No

String

Definition: Path of the host where training job logs are stored, for example, /example/path.

Constraints: N/A

Range: N/A

Default Value: N/A

Table 50 AutoStop

Parameter

Mandatory

Type

Description

time_unit

Yes

String

Definition: Time unit.

Constraints: N/A

Range:

  • HOURS: hour

Default Value: N/A

duration

Yes

Integer

Definition: Runtime after which the training job is automatically stopped, in the unit specified by time_unit.

Constraints: N/A

Range: The minimum value is 1.

Default Value: N/A
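A minimal sketch of an auto_stop object, using HOURS, the only documented time_unit. The helper names are hypothetical, and expected_stop_time assumes the countdown begins when the job starts running; the table does not say whether it counts from creation or from the start of the run.

```python
from datetime import datetime, timedelta, timezone


def build_auto_stop(hours: int) -> dict:
    """Build an auto_stop object; duration must be an integer of at least 1."""
    if isinstance(hours, bool) or not isinstance(hours, int) or hours < 1:
        raise ValueError("duration must be an integer of at least 1")
    return {"time_unit": "HOURS", "duration": hours}


def expected_stop_time(started_at: datetime, auto_stop: dict) -> datetime:
    """Estimate the stop time, assuming the countdown starts at started_at."""
    if auto_stop["time_unit"] != "HOURS":
        raise ValueError("only HOURS is documented")
    return started_at + timedelta(hours=auto_stop["duration"])
```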

Table 51 SchedulePolicy

Parameter

Mandatory

Type

Description

required_affinity

No

RequiredAffinity object

Definition: Affinity requirements of a training job.

Constraints: N/A

priority

No

Integer

Definition: Priority of a training job.

Constraints:

  • The priority can be set for a training job only when a dedicated resource pool is used.

  • The value ranges from 1 to 3. The default priority is 1, and the highest priority is 3.

By default, the priority can only be set to 1 or 2. Priority 3 becomes available only after the permission to set the highest job priority has been granted.

Range: 0 to 3

Default Value: N/A

preemptible

No

Boolean

Definition: Whether the resource can be preempted.

Constraints: N/A

Range:

  • true: The resource can be preempted.

  • false: The resource cannot be preempted.

Default Value: N/A
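The priority constraints above can be enforced client-side as sketched below. The helper and its uses_dedicated_pool and can_set_highest flags are hypothetical. The table's Range (0 to 3) and its Constraints (1 to 3) disagree; this sketch follows the Constraints.

```python
from typing import Optional


def build_schedule_policy(priority: Optional[int] = None,
                          preemptible: Optional[bool] = None,
                          uses_dedicated_pool: bool = True,
                          can_set_highest: bool = False) -> dict:
    """Build a schedule_policy object, checking the documented priority rules."""
    policy = {}
    if priority is not None:
        if not uses_dedicated_pool:
            raise ValueError("priority can only be set with a dedicated resource pool")
        # 1 and 2 are always allowed; 3 needs the highest-priority permission.
        allowed = (1, 2, 3) if can_set_highest else (1, 2)
        if priority not in allowed:
            raise ValueError(f"priority must be one of {allowed}")
        policy["priority"] = priority
    if preemptible is not None:
        policy["preemptible"] = preemptible
    return policy
```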

Table 52 RequiredAffinity

Parameter

Mandatory

Type

Description

affinity_type

No

String

Definition: Affinity scheduling policy.

Constraints: N/A

Range:

  • cabinet: strong cabinet scheduling

  • hyperinstance: supernode affinity scheduling

Default Value: N/A

affinity_group_size

No

Integer

Definition: Size of an affinity group.

Constraints: This parameter is mandatory when affinity_type is set to hyperinstance. In this case, the system schedules the number of tasks specified by affinity_group_size to one supernode, where they form an affinity group.

If a training job is submitted to a supernode resource pool without an affinity group size, the system uses the value 1.

Range: N/A

Default Value: 1
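A sketch of a required_affinity object that applies the rule that affinity_group_size is mandatory for hyperinstance. The helper name is hypothetical, and rejecting non-positive group sizes is an assumption, since the documented Range is N/A.

```python
from typing import Optional

AFFINITY_TYPES = {"cabinet", "hyperinstance"}


def build_required_affinity(affinity_type: str,
                            group_size: Optional[int] = None) -> dict:
    """Build a required_affinity object for schedule_policy."""
    if affinity_type not in AFFINITY_TYPES:
        raise ValueError(f"unknown affinity_type: {affinity_type}")
    affinity = {"affinity_type": affinity_type}
    if affinity_type == "hyperinstance" and group_size is None:
        raise ValueError("affinity_group_size is mandatory for hyperinstance")
    if group_size is not None:
        if group_size < 1:  # assumption: the table gives no explicit range
            raise ValueError("affinity_group_size must be positive")
        affinity["affinity_group_size"] = group_size
    return affinity
```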

Table 53 Notification

Parameter

Mandatory

Type

Description

topic_urn

No

String

Definition: URN of the selected topic in SMN.

Constraints: N/A

Range: N/A

Default Value: N/A

events

No

Array of strings

Definition: Training event that triggers a notification.

Constraints: The options are as follows:

  • JobStarted: The job is started.

  • JobCompleted: The job is completed.

  • JobFailed: The job failed.

  • JobTerminated: The job is terminated.

  • JobRestarted: The job is restarted.

  • JobHanged: The job is hung.

  • JobPreempted: The job is preempted.
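A minimal sketch of a notification object that accepts only the events listed above and drops duplicates. The helper is hypothetical, and the topic URN in the usage line is a placeholder, not a real SMN topic.

```python
EVENTS = frozenset({"JobStarted", "JobCompleted", "JobFailed", "JobTerminated",
                    "JobRestarted", "JobHanged", "JobPreempted"})


def build_notification(topic_urn: str, events: list[str]) -> dict:
    """Build a notification object, rejecting undocumented event names."""
    unknown = [e for e in events if e not in EVENTS]
    if unknown:
        raise ValueError(f"unsupported events: {unknown}")
    # dict.fromkeys removes duplicates while keeping the caller's order.
    return {"topic_urn": topic_urn, "events": list(dict.fromkeys(events))}


notification = build_notification("urn:smn:example-placeholder",
                                  ["JobFailed", "JobCompleted", "JobFailed"])
```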

Table 54 CustomMetrics

Parameter

Mandatory

Type

Description

exec

No

Exec object

Metrics are collected using commands.

http_get

No

HttpGet object

Metrics are collected using HTTP.

Table 55 Exec

Parameter

Mandatory

Type

Description

command

No

Array of strings

Command used to collect metrics.

Table 56 HttpGet

Parameter

Mandatory

Type

Description

path

No

String

URL for obtaining metrics over HTTP. This parameter and port must either both be set or both be left empty.

port

No

Integer

Port for obtaining metrics over HTTP. This parameter and the URL above must be set or left blank at the same time.
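A sketch of a custom_metrics entry that applies the rule that path and port are set together. The helper is hypothetical. Requiring at least one collection method, and allowing exec and http_get in the same entry, are assumptions, because the tables do not say whether they can be combined.

```python
from typing import Optional, Sequence


def build_custom_metric(command: Optional[Sequence[str]] = None,
                        path: Optional[str] = None,
                        port: Optional[int] = None) -> dict:
    """Build one CustomMetrics entry from a command and/or an HTTP endpoint."""
    if (path is None) != (port is None):
        raise ValueError("path and port must be set together or both left empty")
    metric = {}
    if command:
        metric["exec"] = {"command": list(command)}
    if path is not None:
        if not 0 < port < 65536:
            raise ValueError("port must be between 1 and 65535")
        metric["http_get"] = {"path": path, "port": port}
    if not metric:
        raise ValueError("configure exec, http_get, or both")
    return metric
```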

Table 57 JobEndpointsReq

Parameter

Mandatory

Type

Description

ssh

No

SSHReq object

Definition: SSH connection information.

Constraints: N/A

Table 58 SSHReq

Parameter

Mandatory

Type

Description

key_pair_names

No

Array of strings

Definition: Name of the SSH key pair, which can be created and viewed on the Key Pair page of the Elastic Cloud Server (ECS) console.

Constraints: N/A

Response Parameters

Status code: 201

Table 59 Response body parameters

Parameter

Type

Description

kind

String

Definition: Type of a training job.

Range:

  • job: common job

  • edge_job: edge job

  • hetero_job: heterogeneous job

  • mrs_job: MRS job

  • autosearch_job: auto search job

  • diag_job: diagnosis job

  • visualization_job: visualization job

metadata

JobMetadataResponse object

Definition: Training job metadata.

status

Status object

Definition: Training job status information.

algorithm

JobAlgorithmResponse object

Definition: Training job algorithm.

tasks

Array of TaskResponse objects

Definition: Heterogeneous training tasks.

spec

SpecResponce object

Definition: Training job specifications.

endpoints

JobEndpointsResp object

Definition: Configurations required for remotely accessing a training job.

Table 60 JobMetadataResponse

Parameter

Type

Description

id

String

Definition: Training job ID, which is generated and returned by ModelArts after a training job is created.

Range: N/A

name

String

Definition: Name of a training job.

Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).

workspace_id

String

Definition: Workspace where a specified job is located.

Range: N/A

description

String

Definition: Description of a training job.

Range: N/A

create_time

Long

Definition: Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created.

Range: N/A

user_name

String

Definition: Username for creating a training job. The username is generated and returned by ModelArts after a training job is created.

Range: N/A

annotations

Map<String,String>

Definition: Advanced functions of a training job.

Table 61 Status

Parameter

Type

Description

phase

String

Definition: Level-1 status of a training job.

Range:

  • Creating: The job is being created.

  • Pending: The job is pending.

  • Running: The job is running.

  • Failed: The job failed to run.

  • Completed: The job is complete.

  • Terminating: The job is being stopped.

  • Terminated: The job has been stopped.

  • Abnormal: The job is abnormal.

secondary_phase

String

Definition: Level-2 status of a training job. The values are internal detailed statuses and may be added, changed, or deleted. Dependency on the status is not recommended.

Range:

  • Creating: The job is being created.

  • Queuing: The job is queuing.

  • Running: The job is running.

  • Failed: The job failed to run.

  • Completed: The job is complete.

  • Terminating: The job is being stopped.

  • Terminated: The job has been stopped.

  • CreateFailed: The job fails to be created.

  • TerminatedFailed: The job fails to be stopped.

  • Unknown: The job is in an unknown state.

  • Lost: The job is abnormal.

duration

Long

Definition: Running duration of a training job, in ms.

Range: N/A

node_count_metrics

Array<Array<Integer>>

Definition: Node quantity change metric during a training job runtime.

tasks

Array of strings

Definition: Training job subtask name.

start_time

Long

Definition: Timestamp when a training job is started.

Range: N/A

task_statuses

Array of TaskStatuses objects

Definition: Training job subtask status.

running_records

Array of RunningRecord objects

Definition: Running and fault recovery records of a training job.
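Because secondary_phase values may change, a client that waits for a job to finish should decide based on phase only. The sketch below polls through a hypothetical get_phase callable that queries the job and returns status.phase. Treating Abnormal as non-terminal is an assumption, because the table does not say whether an abnormal job can recover.

```python
import time
from typing import Callable

# Level-1 phases treated as final (assumption: Abnormal is not included).
TERMINAL_PHASES = frozenset({"Completed", "Failed", "Terminated"})


def wait_for_job(get_phase: Callable[[], str], interval_s: float = 30.0,
                 timeout_s: float = 3600.0) -> str:
    """Poll until the job reaches a terminal phase or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in phase {phase!r}")
        time.sleep(interval_s)
```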

Table 62 TaskStatuses

Parameter

Type

Description

task

String

Definition: Training job subtask name.

Range: N/A

exit_code

Integer

Definition: Exit code of a training job subtask.

Range: N/A

message

String

Definition: Error message of a training job subtask.

Range: N/A

Table 63 RunningRecord

Parameter

Type

Description

start_at

Integer

Definition: Unix timestamp of the start time in the current running record, in seconds.

Range: N/A

end_at

Integer

Definition: Unix timestamp of the end time in the current running record, in seconds.

Range: N/A

start_type

String

Definition: Startup mode of the current run.

Range:

  • init_or_rescheduled: This run is the first run after scheduling, either the initial startup or the first run after recovery by rescheduling.

  • restarted: This run is not the first run after scheduling; it follows a process restart.

end_reason

String

Definition: Reason why the run ended.

Range: N/A

end_related_task

String

Definition: ID of the task worker (for example, worker-0) that caused the run to end.

Range: N/A

end_recover

String

Definition: Fault tolerance policy applied after the run ended.

Range:

  • npu_proc_restart: NPU in-place hot recovery

  • gpu_proc_restart: GPU in-place hot recovery

  • proc_restart: Process in-place recovery

  • pod_reschedule: Pod-level rescheduling

  • job_reschedule: Job-level rescheduling

  • job_reschedule_with_taint: Isolated job-level rescheduling

end_recover_before_downgrade

String

Definition: Fault tolerance policy that was selected after the run ended, before any downgrade of the policy.

Range: same as that of end_recover.
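The running_records array can be summarized to see how often a job was restarted and how long it actually ran. The sketch below is hypothetical client code. It relies on start_at and end_at being in seconds (unlike duration, which is in ms). Two points are assumptions: an in-progress record may lack end_at, and a difference between end_recover and end_recover_before_downgrade indicates that the policy was downgraded.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    runs: int
    process_restarts: int
    total_running_s: int


def summarize_running_records(records: list[dict]) -> RunSummary:
    """Count runs and process restarts and add up the finished runtime in seconds."""
    total = 0
    restarts = 0
    for rec in records:
        # Skip records without end_at (assumed to be the run still in progress).
        if rec.get("end_at") is not None:
            total += rec["end_at"] - rec["start_at"]
        if rec.get("start_type") == "restarted":
            restarts += 1
    return RunSummary(len(records), restarts, total)


def downgraded(rec: dict) -> bool:
    """True if the originally chosen fault tolerance policy differs from the applied one."""
    before = rec.get("end_recover_before_downgrade")
    return before is not None and before != rec.get("end_recover")
```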

Table 64 JobAlgorithmResponse

Parameter

Type

Description

id

String

Definition: Algorithm ID. The algorithm of a training job is specified in one of the following ways:

  • id: Only the algorithm ID is used.

  • subscription_id+item_version_id: The subscription ID and version ID of a subscribed algorithm are used.

  • code_dir+boot_file: The code directory and boot file of the training job are used.

name

String

Definition: Algorithm name.

Range: N/A

subscription_id

String

Definition: Subscription ID of a subscription algorithm, which must be used with item_version_id.

Range: N/A

item_version_id

String

Definition: Version of a subscription algorithm, which must be used with subscription_id.

Range: N/A

code_dir

String

Definition: Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id or subscription_id+item_version_id is specified.

Range: N/A

boot_file

String

Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id or subscription_id+item_version_id is specified.

Range: N/A

autosearch_config_path

String

Definition: YAML configuration path of an auto search job. An OBS URL is required. For example, obs://bucket/file.yaml.

Range: N/A

autosearch_framework_path

String

Definition: Framework code directory of an auto search job. An OBS URL is required. For example, obs://bucket/files/.

Range: N/A

command

String

Definition: Boot command for starting the container of a custom image for a training job. For example, python train.py.

Range: N/A

parameters

Array of ParameterResp objects

Definition: Running parameters of the training job.

policies

policies object

Definition: Policies supported by a job.

inputs

Array of InputResp objects

Definition: Data input of a training job.

outputs

Array of OutputResp objects

Definition: Output of the training job.

engine

JobEngineResp object

Definition: Engine of a training job. Leave this parameter blank if the job is created using the id of an algorithm from algorithm management or the subscription_id+item_version_id of a subscribed algorithm.

local_code_dir

String

Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

Range: N/A

working_dir

String

Definition: Working directory where an algorithm is executed. This parameter does not take effect in v1 compatibility mode.

Range: N/A

environments

Array of Map<String,String> objects

Definition: Environment variables of a training job. The format is key:value. Leave this parameter blank.

summary

SummaryResp object

Definition: Visualization log summary.
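The three ways of specifying an algorithm, and the pairing rules for subscription_id/item_version_id and code_dir/boot_file, can be checked on a request body before submission. This is a sketch of a hypothetical client-side check. Requiring exactly one source is an inference from the "leave blank" rules above. A created job's response may still return code_dir and boot_file together with id, so this check is not meant for response bodies.

```python
def algorithm_source(algorithm: dict) -> str:
    """Return which documented source a request's algorithm object uses."""
    def present(key: str) -> bool:
        return bool(algorithm.get(key))

    if present("subscription_id") != present("item_version_id"):
        raise ValueError("subscription_id and item_version_id must be used together")
    if present("code_dir") != present("boot_file"):
        raise ValueError("code_dir and boot_file must be used together")

    modes = [name for name, ok in (
        ("id", present("id")),
        ("subscription", present("subscription_id")),
        ("code_dir", present("code_dir")),
    ) if ok]
    if len(modes) != 1:
        raise ValueError(f"specify exactly one algorithm source, got {modes}")
    if modes[0] == "code_dir":
        # The boot file must be stored in the code directory.
        code_dir = algorithm["code_dir"].rstrip("/") + "/"
        if not algorithm["boot_file"].startswith(code_dir):
            raise ValueError("boot_file must be stored in code_dir")
    return modes[0]
```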

Table 65 ParameterResp

Parameter

Type

Description

name

String

Definition: Parameter name.

Range: N/A

value

String

Definition: Parameter value.

Range: N/A

description

String

Definition: Parameter description.

Range: N/A

constraint

constraint object

Definition: Parameter attribute.

i18n_description

i18n_description object

Definition: Internationalization description.

Table 66 constraint

Parameter

Type

Description

type

String

Definition: Parameter type.

Range: N/A

editable

Boolean

Definition: Whether the parameter can be edited.

Range:

  • true: editable

  • false: not editable

required

Boolean

Definition: Whether the parameter is mandatory.

Range:

  • true: mandatory

  • false: optional

sensitive

Boolean

Definition: Whether the parameter is sensitive. This function is currently unavailable.

Range:

  • true: sensitive

  • false: insensitive

valid_type

String

Definition: Valid type.

Range: N/A

valid_range

Array of strings

Definition: Valid range.

Table 67 i18n_description

Parameter

Type

Description

language

String

Definition: Internationalization language. The options are as follows:

  • zh-cn: Chinese

  • en-us: English

Range: N/A

description

String

Definition: Internationalization language description.

Range: N/A

Table 68 policies

Parameter

Type

Description

auto_search

auto_search object

Definition: Hyperparameter search configuration.

Table 70 reward_attrs

Parameter

Type

Description

name

String

Definition: Metric name.

Range: N/A

mode

String

Definition: Search mode.

Range:

  • max: A larger metric value is preferred.

  • min: A smaller metric value is preferred.

regex

String

Definition: Regular expression of a metric.

Range: N/A

Table 71 search_params

Parameter

Type

Description

name

String

Definition: Hyperparameter name.

Range: N/A

param_type

String

Definition: Parameter type.

Range:

  • continuous: The hyperparameter is of the continuous type. When an algorithm is used in a training job, continuous hyperparameters are displayed as text boxes on the console.

  • discrete: The hyperparameter is of the discrete type. When an algorithm is used in a training job, discrete hyperparameters are displayed as drop-down lists on the console.

lower_bound

String

Definition: Lower bound of the hyperparameter.

Range: N/A

upper_bound

String

Definition: Upper bound of the hyperparameter.

Range: N/A

discrete_points_num

String

Definition: Number of discrete points of a hyperparameter with continuous values.

Range: N/A

discrete_values

Array of strings

Definition: Discrete hyperparameter values.

Table 72 algo_configs

Parameter

Type

Description

name

String

Definition: Search algorithm name.

Range: N/A

params

Array of AutoSearchAlgoConfigParameterResp objects

Definition: Search algorithm parameters.

Table 73 AutoSearchAlgoConfigParameterResp

Parameter

Type

Description

key

String

Definition: Parameter key.

Range: N/A

value

String

Definition: Parameter value.

Range: N/A

type

String

Definition: Parameter type.

Range: N/A

Table 74 InputResp

Parameter

Type

Description

name

String

Definition: Name of the data input channel.

Range: N/A

description

String

Definition: Description of the data input channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0

Range: N/A

access_method

String

Definition: Access method of the input data channel path (local_dir).

Range:

  • parameter: The path is passed as a hyperparameter.

  • env: The path is passed as an environment variable.

remote

InputDataInfoResp object

Definition: Description of the actual data input.

remote_constraint

Array of remote_constraint objects

Definition: Data input constraint.

Table 75 InputDataInfoResp

Parameter

Type

Description

dataset

dataset object

Definition: The input is a dataset.

obs

obs object

Definition: The input is stored in OBS.

Table 76 dataset

Parameter

Type

Description

id

String

Definition: Dataset ID of a training job.

Range: N/A

version_id

String

Definition: Dataset version ID of a training job.

Range: N/A

obs_url

String

Definition: OBS URL of the dataset for a training job. It is automatically parsed by ModelArts based on the dataset ID and dataset version ID. For example, /usr/data/.

Range: N/A

Table 77 obs

Parameter

Type

Description

obs_url

String

Definition: OBS URL of the dataset for a training job, for example, /usr/data/.

Range: N/A

Table 78 remote_constraint

Parameter

Type

Description

data_type

String

Definition: Data input type, either a data storage location or a dataset.

Constraints: N/A

Range: N/A

Default Value: N/A

attributes

String

Definition: Related attributes.

Constraints: N/A

Range:

If the input is a dataset:

  • data_format: data format

  • data_segmentation: data segmentation method

  • dataset_type: data labeling type

Default Value: N/A

Table 79 OutputResp

Parameter

Type

Description

name

String

Definition: Name of the data output channel.

Range: N/A

description

String

Definition: Description of the data output channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data output channels are mapped.

Range: N/A

access_method

String

Definition: Access method of the output data channel path (local_dir).

Range:

  • parameter: The path is passed as a hyperparameter.

  • env: The path is passed as an environment variable.

remote

RemoteResp object

Definition: Description of the actual data output.

Table 80 JobEngineResp

Parameter

Type

Description

engine_id

String

Definition: Engine ID selected for a training job.

Range: N/A

engine_name

String

Definition: Engine name selected for a training job.

Range: N/A

engine_version

String

Definition: Engine version selected for a training job.

Range: N/A

image_url

String

Definition: Custom image URL selected for a training job. The URL is obtained from SWR.

Range: N/A

install_sys_packages

Boolean

Definition: Specifies whether to install the MoXing version specified by the training platform.

Range:

  • true: yes

  • false: no

Table 81 SummaryResp

Parameter

Type

Description

log_type

String

Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job.

Range:

  • tensorboard: TensorBoard

  • mindstudio-insight: MindStudio Insight

log_dir

LogDirResp object

Definition: Visualization log output of a training job.

data_sources

Array of DataSourceResp objects

Definition: Visualization log input of the visualization job or training job debugging mode.

Table 82 LogDirResp

Parameter

Type

Description

pfs

PFSSummaryResp object

Definition: OBS parallel file system to which visualization logs are output.

Table 83 PFSSummaryResp

Parameter

Type

Description

pfs_path

String

Definition: URL of the OBS parallel file system.

Range: N/A

Table 84 DataSourceResp

Parameter

Type

Description

job

JobSummaryResp object

Definition: Job data source.

Table 85 JobSummaryResp

Parameter

Type

Description

job_id

String

Definition: ID of a training job.

Range: N/A

Table 86 TaskResponse

Parameter

Type

Description

role

String

Definition: Task role. This function is not supported currently.

Range: N/A

algorithm

TaskResponseAlgorithm object

Definition: Algorithm configurations for algorithm management.

task_resource

FlavorResponse object

Definition: Specifications of a training job or algorithm.

Table 87 TaskResponseAlgorithm

Parameter

Type

Description

code_dir

String

Definition: Absolute path of the directory where the algorithm boot file is stored.

Range: N/A

boot_file

String

Definition: Absolute path of an algorithm boot file.

Range: N/A

inputs

AlgorithmInput object

Definition: Information about the algorithm input channel.

outputs

AlgorithmOutput object

Definition: Information about the algorithm output channel.

engine

AlgorithmEngine object

Definition: Engine that a heterogeneous job depends on.

local_code_dir

String

Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

Range: N/A

working_dir

String

Definition: Work directory where an algorithm is executed. Note that this parameter does not take effect in v1 compatibility mode.

Range: N/A

Table 88 AlgorithmInput

Parameter

Type

Description

name

String

Definition: Name of the data input channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data input channel is mapped.

Range: N/A

remote

AlgorithmRemote object

Definition: Actual data input, which can only be OBS for heterogeneous jobs.

Table 89 AlgorithmRemote

Parameter

Type

Description

obs

RemoteObsResp object

Definition: OBS location where the input data is stored.

Table 90 AlgorithmOutput

Parameter

Type

Description

name

String

Definition: Name of the data output channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data output channels are mapped.

Range: N/A

remote

RemoteResp object

Definition: Description of the actual data output.

mode

String

Definition: Data transmission mode. The default value is upload_periodically.

Range: N/A

period

String

Definition: Data transmission period. The default value is 30s.

Range: N/A

Table 91 RemoteResp

Parameter

Type

Description

obs

RemoteObsResp object

Definition: Data actually output to OBS.

Table 92 RemoteObsResp

Parameter

Type

Description

obs_url

String

Definition: Path of the data output to OBS.

Range: N/A

Table 93 AlgorithmEngine

Parameter

Type

Description

engine_id

String

Definition: Engine flavor ID, for example, caffe-1.0.0-python2.7.

Range: N/A

engine_name

String

Definition: Engine flavor name, for example, Caffe.

Range: N/A

engine_version

String

Definition: Engine flavor version. Engines with the same name have multiple versions, for example, Caffe-1.0.0-python2.7 of Python 2.7.

Range: N/A

v1_compatible

Boolean

Definition: Specifies whether the v1 compatibility mode is used.

Range:

  • true: The v1 compatibility mode is used.

  • false: The v1 compatibility mode is not used.

run_user

String

Definition: Default UID for the engine startup.

Range: N/A

image_url

String

Definition: Custom image URL selected for an algorithm.

Range: N/A

Table 94 FlavorResponse

Parameter

Type

Description

flavor_id

String

Definition: Resource flavor ID.

Range: N/A

flavor_name

String

Definition: Resource flavor name.

Range: N/A

max_num

Integer

Definition: Maximum number of nodes supported by a flavor.

Range: N/A

flavor_type

String

Definition: Resource flavor type.

Range:

  • CPU

  • GPU

  • Ascend

billing

BillingInfo object

Definition: Billing information of a resource flavor.

flavor_info

FlavorInfoResponse object

Definition: Resource flavor details.

attributes

Map<String,String>

Definition: Other flavor attributes.

Range: N/A

Table 95 FlavorInfoResponse

Parameter

Type

Description

max_num

Integer

Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported.

Range: N/A

cpu

Cpu object

Definition: CPU specifications.

gpu

Gpu object

Definition: GPU specifications.

npu

Npu object

Definition: Ascend specifications.

memory

Memory object

Definition: Memory information.

disk

DiskResponse object

Definition: Disk information.

Table 96 DiskResponse

Parameter

Type

Description

size

Integer

Definition: Disk size.

Range: N/A

unit

String

Definition: Unit of the disk size.

Range: N/A

Table 97 SpecResponce

Parameter

Type

Description

resource

Resource object

Definition: Resource flavor of a training job. Either flavor_id alone or pool_id together with flavor_id is used.

volumes

Array of JobVolumeResp objects

Definition: Mounting volume information of a training job.

log_export_path

LogExportPathResp object

Definition: Log output of a training job.

schedule_policy

SchedulePolicyResp object

Definition: Scheduling policy of a training job.

custom_metrics

Array of CustomMetrics objects

Definition: Metric collection configuration.

Table 98 Resource

Parameter

Type

Description

policy

String

Definition: Resource flavor mode of a training job.

Range:

  • regular: standard mode

flavor_id

String

Definition: ID of the resource flavor of a training job.

Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows:

  • modelarts.pool.visual.xlarge (1 PU)

  • modelarts.pool.visual.2xlarge (2 PUs)

  • modelarts.pool.visual.4xlarge (4 PUs)

  • modelarts.pool.visual.8xlarge (8 PUs)

flavor_name

String

Definition: Read-only flavor name returned by ModelArts when flavor_id is used.

Range: N/A

node_count

Integer

Definition: Number of resource replicas selected for a training job.

Range: N/A

pool_id

String

Definition: ID of the resource pool selected for a training job.

Range: N/A

flavor_detail

FlavorDetail object

Definition: Flavor details of a training job or algorithm. This parameter is available only for public resource pools.

main_container_allocated_resources

MainContainerAllocatedResources object

Resource specifications actually allocated to the training container of a training job.

Table 99 FlavorDetail

Parameter

Type

Description

flavor_type

String

Definition: Resource flavor type.

Range:

  • CPU

  • GPU

  • Ascend

billing

BillingInfo object

Definition: Billing information of a resource flavor.

flavor_info

FlavorInfo object

Definition: Resource flavor details.

Table 100 BillingInfo

Parameter

Type

Description

code

String

Definition: Billing code.

Range: N/A

unit_num

Integer

Definition: Billing unit.

Range: N/A

Table 101 FlavorInfo

Parameter

Type

Description

max_num

Integer

Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported.

Range: N/A

cpu

Cpu object

Definition: CPU specifications.

gpu

Gpu object

Definition: GPU specifications.

npu

Npu object

Definition: Ascend specifications.

memory

Memory object

Definition: Memory information.

disk

Disk object

Definition: Disk information.

Table 102 Cpu

Parameter

Type

Description

arch

String

Definition: CPU architecture.

Range: N/A

core_num

Integer

Definition: Number of cores.

Range: N/A

Table 103 Gpu

Parameter

Type

Description

unit_num

Integer

Definition: Number of GPUs.

Range: N/A

product_name

String

Definition: Product name.

Range: N/A

memory

String

Definition: GPU memory.

Range: N/A

Table 104 Npu

Parameter

Type

Description

unit_num

String

Definition: Number of NPUs.

Range: N/A

product_name

String

Definition: Product name.

Range: N/A

memory

String

Definition: NPU memory.

Range: N/A

Table 105 Memory

Parameter

Type

Description

size

Integer

Definition: Memory size.

Range: N/A

unit

String

Definition: Unit of the memory size.

Range: N/A

Table 106 Disk

Parameter

Type

Description

size

String

Definition: Disk size.

Range: N/A

unit

String

Definition: Unit of the disk size. Generally, the unit is GB.

Range: N/A

Table 107 MainContainerAllocatedResources

Parameter

Type

Description

cpu_arch

String

CPU architecture.

cpu_core_num

Float

Number of cores.

mem_size

Float

Memory size.

accelerator_num

Float

Number of accelerator cards.

accelerator_type

String

Accelerator card type.

Table 108 JobVolumeResp

Parameter

Type

Description

nfs

NfsResp object

Definition: Volumes attached in NFS mode.

Table 109 NfsResp

Parameter

Type

Description

nfs_server_path

String

Definition: NFS server path, for example, 10.10.10.10:/example/path.

Range: N/A

local_path

String

Definition: Path for attaching volumes to the training container, for example, /example/path.

Range: N/A

read_only

Boolean

Definition: Specifies whether the disks attached to the container in NFS mode are read-only.

Range:

  • true: read only

  • false: read/write

Table 110 LogExportPathResp

Parameter

Type

Description

obs_url

String

Definition: OBS path for storing training job logs, for example, obs://example/path.

Range: N/A

host_path

String

Definition: Path of the host where training job logs are stored, for example, /example/path.

Range: N/A

Table 111 SchedulePolicyResp

Parameter

Type

Description

required_affinity

RequiredAffinityResp object

Definition: Affinity requirements of a training job.

priority

Integer

Definition: Priority of a training job.

Range: 0 to 3

preemptible

Boolean

Definition: Whether the resource can be preempted.

Range:

  • true: The resource can be preempted.

  • false: The resource cannot be preempted.

Table 112 RequiredAffinityResp

Parameter

Type

Description

affinity_type

String

Definition: Affinity scheduling policy.

Range:

  • cabinet: strong cabinet scheduling

  • hyperinstance: supernode affinity scheduling

affinity_group_size

Integer

Definition: Size of an affinity group.

Range: N/A

Table 113 CustomMetrics

Parameter

Type

Description

exec

Exec object

Metrics are collected using commands.

http_get

HttpGet object

Metrics are collected using HTTP.

Table 114 Exec

Parameter

Type

Description

command

Array of strings

Command used to collect metrics.

Table 115 HttpGet

Parameter

Type

Description

path

String

URL for obtaining metrics over HTTP. This parameter and port must either both be set or both be left empty.

port

Integer

Port for obtaining metrics over HTTP. This parameter and the URL above must be set or left blank at the same time.

Table 116 JobEndpointsResp

Parameter

Type

Description

ssh

SSHResp object

Definition: SSH connection information.

jupyter_lab

JupyterLab object

Definition: JupyterLab connection information.

tensorboard

Tensorboard object

Definition: TensorBoard connection information.

mindstudio_insight

MindStudioInsight object

Definition: MindStudio Insight connection information.

Table 117 SSHResp

Parameter

Type

Description

key_pair_names

Array of strings

Definition: Name of the SSH key pair, which can be created and viewed on the Key Pair page of the Elastic Cloud Server (ECS) console.

Range: N/A

task_urls

Array of TaskUrls objects

Definition: SSH connection addresses of the tasks in the training job.

Table 118 TaskUrls

Parameter

Type

Description

task

String

Definition: Task ID of a training job.

Range: N/A

url

String

Definition: SSH connection address of the task.

Range: N/A
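Because task_urls is a list of task/url pairs, a client that needs the SSH address of a particular task can index it by task ID. The following is a minimal sketch, assuming the endpoints object has been decoded into a dict; ssh_urls_by_task is a hypothetical helper, and the key pair name and address strings below are placeholders, since the actual address format is not documented in this section.

```python
def ssh_urls_by_task(endpoints: dict) -> dict:
    """Map each task ID to its SSH connection address in a JobEndpointsResp dict."""
    ssh = endpoints.get("ssh") or {}
    return {item["task"]: item["url"] for item in ssh.get("task_urls", [])}


# Placeholder values; real addresses come from the API response.
endpoints = {
    "ssh": {
        "key_pair_names": ["KeyPair-demo"],
        "task_urls": [
            {"task": "worker-0", "url": "hypothetical-address-0"},
            {"task": "worker-1", "url": "hypothetical-address-1"},
        ],
    }
}
urls = ssh_urls_by_task(endpoints)
```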

Table 119 JupyterLab

Parameter

Type

Description

url

String

Definition: JupyterLab address of a training job.

Range: N/A

token

String

Definition: JupyterLab token of a training job.

Range: N/A

Table 120 Tensorboard

Parameter

Type

Description

url

String

Definition: TensorBoard address of a training job.

Range: N/A

token

String

Definition: TensorBoard token of a training job.

Range: N/A

Table 121 MindStudioInsight

Parameter

Type

Description

url

String

Definition: MindStudio Insight address of a training job.

Range: N/A

token

String

Definition: MindStudio Insight token of a training job.

Range: N/A
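JupyterLab, TensorBoard, and MindStudio Insight share the same url/token shape, so one helper can collect whichever of them a job exposes. The following is a minimal sketch; ToolEndpoint and tool_endpoints are hypothetical names, and how the token is presented when opening the URL is not described in this section, so the sketch only gathers the values.

```python
from typing import Dict, NamedTuple


class ToolEndpoint(NamedTuple):
    url: str
    token: str


# JobEndpointsResp keys whose objects carry a url and a token (Tables 119 to 121).
TOKEN_ENDPOINT_KEYS = ("jupyter_lab", "tensorboard", "mindstudio_insight")


def tool_endpoints(endpoints: dict) -> Dict[str, ToolEndpoint]:
    """Collect the url/token pairs present in a JobEndpointsResp dict, skipping absent tools."""
    result = {}
    for key in TOKEN_ENDPOINT_KEYS:
        obj = endpoints.get(key)
        if obj and obj.get("url"):
            result[key] = ToolEndpoint(obj["url"], obj.get("token", ""))
    return result


# Placeholder values for illustration only.
eps = tool_endpoints({
    "jupyter_lab": {"url": "hypothetical-jupyter-url", "token": "hypothetical-token"},
    "tensorboard": None,
})
```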

Status code: 400

Table 122 Response body parameters

Parameter

Type

Description

error_msg

String

Error message

error_code

String

Error code

error_solution

String

Solution

Example Requests

  • The following example creates a training job with free specifications. The job name is TestModelArtsJob, the description is "This is a ModelArts job", and the algorithm ID is 3f5d6706-7b67-408d-8ba0-ec08048c45ed. No inputs or outputs are defined for the algorithm.

    POST https://endpoint/v2/{project_id}/training-jobs
    
    {
      "kind" : "job",
      "metadata" : {
        "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
        "name" : "TestModelArtsJob",
        "description" : "This is a ModelArts job",
        "create_time" : 1637045545982,
        "workspace_id" : "0",
        "user_name" : ""
      },
      "algorithm" : {
        "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
        "name" : "ttt-obs-gpu",
        "code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
        "boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
        "parameters" : [ {
          "name" : "input_dir",
          "description" : "",
          "i18n_description" : null,
          "value" : "s://cn-north-4-rse/test/moxingtest-dir/",
          "constraint" : {
            "type" : "String",
            "editable" : true,
            "required" : true,
            "sensitive" : false,
            "valid_type" : "None",
            "valid_range" : [ ]
          }
        }, {
          "name" : "input_file",
          "description" : "",
          "i18n_description" : null,
          "value" : "obs://cn-north-4-rse/test/moxingtest/",
          "constraint" : {
            "type" : "String",
            "editable" : true,
            "required" : true,
            "sensitive" : false,
            "valid_type" : "None",
            "valid_range" : [ ]
          }
        }, {
          "name" : "large_file_method",
          "description" : "",
          "i18n_description" : null,
          "value" : "1",
          "constraint" : {
            "type" : "Integer",
            "editable" : true,
            "required" : true,
            "sensitive" : false,
            "valid_type" : "None",
            "valid_range" : [ ]
          }
        } ],
        "engine" : {
          "engine_id" : "horovod-cp36-tf-1.16.2",
          "engine_name" : "Horovod",
          "engine_version" : "0.16.2-TF-1.13.1-python3.6"
        },
        "policies" : { }
      },
      "spec" : {
        "resource" : {
          "flavor_id" : "modelarts.p3.large.public.free",
          "node_count" : 1
        },
        "log_export_path" : { },
        "custom_metrics" : [ {
          "http_get" : {
            "path" : "/raw_text",
            "port" : 10001
          }
        } ]
      }
    }
  • The following example uses a custom image to create a training job named TestModelArtsJob2 with the description "This is a ModelArts job2". The job runs in a dedicated resource pool and mounts an NFS volume.

    POST https://endpoint/v2/{project_id}/training-jobs
    
    {
      "kind" : "job",
      "metadata" : {
        "name" : "TestModelArtsJob2",
        "description" : "This is a ModelArts job2"
      },
      "algorithm" : {
        "engine" : {
          "image_url" : "xxxxxxxx/fastseq:1.2"
        },
        "command" : "cd /home/ma-user/ddp_demo && sh run_ddp.sh",
        "parameters" : [ ],
        "policies" : {
          "auto_search" : null
        },
        "environments" : {
          "NCCL_DEBUG" : "INFO",
          "NCCL_IB_DISABLE" : "0"
        }
      },
      "spec" : {
        "resource" : {
          "flavor_id" : "modelarts.pool.visual.xlarge",
          "node_count" : 1,
          "pool_id" : "poolfaf38d76"
        },
        "log_export_path" : {
          "obs_url" : "/cn-north-4-training-test/limou/ddp-demo-log/"
        },
        "volumes" : [ {
          "nfs" : {
            "nfs_server_path" : "192.168.0.82:/",
            "local_path" : "/home/ma-user/nfs/",
            "read_only" : false
          }
        } ]
      }
    }
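The examples above can be sent from Python with the standard library. The following is a minimal sketch that only prepares the request and does not send it. It assumes IAM token authentication through the X-Auth-Token header; the endpoint host, project ID, and token are hypothetical placeholders, and build_create_job_request is a hypothetical helper. The body is a trimmed copy of the second example (policies is omitted).

```python
import json
import urllib.request


def build_create_job_request(endpoint: str, project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (without sending) a POST /v2/{project_id}/training-jobs request."""
    url = f"https://{endpoint}/v2/{project_id}/training-jobs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: IAM token authentication via the X-Auth-Token header.
            "X-Auth-Token": token,
        },
        method="POST",
    )


# Trimmed copy of the second example request body.
body = {
    "kind": "job",
    "metadata": {"name": "TestModelArtsJob2", "description": "This is a ModelArts job2"},
    "algorithm": {
        "engine": {"image_url": "xxxxxxxx/fastseq:1.2"},
        "command": "cd /home/ma-user/ddp_demo && sh run_ddp.sh",
        "parameters": [],
        "environments": {"NCCL_DEBUG": "INFO", "NCCL_IB_DISABLE": "0"},
    },
    "spec": {
        "resource": {"flavor_id": "modelarts.pool.visual.xlarge", "node_count": 1, "pool_id": "poolfaf38d76"},
        "log_export_path": {"obs_url": "/cn-north-4-training-test/limou/ddp-demo-log/"},
        "volumes": [{"nfs": {"nfs_server_path": "192.168.0.82:/", "local_path": "/home/ma-user/nfs/", "read_only": False}}],
    },
}

# Hypothetical endpoint, project ID, and token; replace with real values.
req = build_create_job_request("example.endpoint", "my-project-id", "my-iam-token", body)
# urllib.request.urlopen(req) would send it; a 201 response carries the created job object.
```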

Example Responses

Status code: 201

OK

{
  "kind" : "job",
  "metadata" : {
    "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job",
    "create_time" : 1637045545982,
    "workspace_id" : "0",
    "user_name" : ""
  },
  "status" : {
    "phase" : "Creating",
    "secondary_phase" : "Creating",
    "duration" : 0,
    "start_time" : 0,
    "node_count_metrics" : null,
    "tasks" : [ "worker-0", "server-0" ]
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "name" : "ttt-obs-gpu",
    "code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
    "boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
    "parameters" : [ {
      "name" : "input_dir",
      "description" : "",
      "i18n_description" : null,
      "value" : "s://cn-north-4-rse/test/moxingtest-dir/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "input_file",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://cn-north-4-rse/test/moxingtest/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "large_file_method",
      "description" : "",
      "i18n_description" : null,
      "value" : "1",
      "constraint" : {
        "type" : "Integer",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    } ],
    "engine" : {
      "engine_id" : "horovod-cp36-tf-1.16.2",
      "engine_name" : "Horovod",
      "engine_version" : "0.16.2-TF-1.13.1-python3.6"
    },
    "policies" : { }
  },
  "spec" : {
    "resource" : {
      "policy" : "regular",
      "flavor_id" : "modelarts.p3.large.public.free",
      "flavor_name" : "Computing GPU(Vnt1) instance",
      "node_count" : 1,
      "flavor_detail" : {
        "flavor_type" : "GPU",
        "billing" : {
          "code" : "modelarts.vm.gpu.free",
          "unit_num" : 1
        },
        "flavor_info" : {
          "cpu" : {
            "arch" : "x86",
            "core_num" : 8
          },
          "gpu" : {
            "unit_num" : 1,
            "product_name" : "GP-Vnt1",
            "memory" : "32GB"
          },
          "memory" : {
            "size" : 64,
            "unit" : "GB"
          }
        }
      },
      "main_container_allocated_resources" : {
        "cpu_arch" : "x86",
        "cpu_core_num" : 5,
        "mem_size" : 44,
        "accelerator_num" : 1,
        "accelerator_type" : "nvidia-v100-pcie32"
      }
    },
    "log_export_path" : { },
    "custom_metrics" : [ {
      "exec" : {
        "command" : [ "cat", "/a/b/c.porm" ]
      }
    }, {
      "http_get" : {
        "path" : "/raw_text",
        "port" : 10001
      }
    } ]
  }
}
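A client typically reads the job ID and status phase from this response and then polls the job until it leaves the Creating phase. The following is a minimal sketch that extracts those fields, together with the resources actually allocated to the main container; summarize_job is a hypothetical helper, and the input below is a trimmed copy of the example response.

```python
from typing import Any, Dict


def summarize_job(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the status and allocated-resource fields out of a training job response."""
    status = resp.get("status") or {}
    resource = (resp.get("spec") or {}).get("resource") or {}
    allocated = resource.get("main_container_allocated_resources") or {}
    return {
        "id": resp["metadata"]["id"],
        "phase": status.get("phase"),
        "tasks": status.get("tasks", []),
        "flavor_id": resource.get("flavor_id"),
        "accelerator_num": allocated.get("accelerator_num"),
        "accelerator_type": allocated.get("accelerator_type"),
    }


# Trimmed copy of the example 201 response.
sample = {
    "metadata": {"id": "425b7087-83de-49ed-9e40-5bb642be956f", "name": "TestModelArtsJob"},
    "status": {"phase": "Creating", "tasks": ["worker-0", "server-0"]},
    "spec": {
        "resource": {
            "flavor_id": "modelarts.p3.large.public.free",
            "main_container_allocated_resources": {"accelerator_num": 1, "accelerator_type": "nvidia-v100-pcie32"},
        }
    },
}
summary = summarize_job(sample)
```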

Status code: 400

Common error response body. The following example shows the response returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee cannot be found.

{
  "error_msg" : "algorithm not found.",
  "error_code" : "ModelArts.2755",
  "error_solution" : "Check whether the training project information in the request is valid."
}
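Because every error uses the same error_code/error_msg/error_solution body, a client can turn non-2xx responses into one exception type. The following is a minimal sketch; ModelArtsError and raise_for_error are hypothetical names, and the sketch assumes the error body is UTF-8 JSON as in the example above.

```python
import json


class ModelArtsError(Exception):
    """Raised when the API returns the common error response body."""

    def __init__(self, status: int, error_code: str, error_msg: str, error_solution: str = ""):
        super().__init__(f"{status} {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg
        self.error_solution = error_solution


def raise_for_error(status: int, body: bytes) -> None:
    """Return silently for 2xx; otherwise raise ModelArtsError built from the JSON body."""
    if 200 <= status < 300:
        return
    payload = json.loads(body.decode("utf-8"))
    raise ModelArtsError(
        status,
        payload.get("error_code", ""),
        payload.get("error_msg", ""),
        payload.get("error_solution", ""),
    )


# The 400 example body shown above.
error_body = json.dumps({
    "error_msg": "algorithm not found.",
    "error_code": "ModelArts.2755",
    "error_solution": "Check whether the training project information in the request is valid.",
}).encode("utf-8")
```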

Status Codes

Status Code

Description

201

OK

400

Common error response body. The following example shows the response returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee cannot be found.

Error Codes

See Error Codes.