Creating a Training Job
Function
This API is used to create a training job on ModelArts.
Use this API when you need to run machine learning training on specific datasets and algorithm models: it creates and configures a training job. Before calling this API, ensure that you have uploaded the datasets and model code to ModelArts and that you have the permission to create training jobs. After a training job is created, the platform starts it based on the configured resource specifications, and you can monitor its progress and status by using the job ID. The API returns an error message if the dataset or model code does not exist, the resource specifications are incorrectly configured, or you do not have the required permission.
Debugging
You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.
URI
POST /v2/{project_id}/training-jobs
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Definition: Project ID. For details, see Obtaining a Project ID and Name. Constraints: The value can contain 1 to 64 characters. Letters, digits, and hyphens (-) are allowed. Range: N/A Default Value: N/A |
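The following is a minimal sketch of how the request URL could be assembled from the URI and the project_id constraints above. The endpoint host and the project ID value are hypothetical placeholders, and the function name training_jobs_url is introduced only for illustration; it is not part of any SDK.

```python
import re

# Constraint from the path parameter table: 1 to 64 characters;
# letters, digits, and hyphens are allowed.
_PROJECT_ID_RE = re.compile(r"[A-Za-z0-9-]{1,64}")


def training_jobs_url(endpoint: str, project_id: str) -> str:
    """Return the URL for POST /v2/{project_id}/training-jobs."""
    if not _PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"invalid project_id: {project_id!r}")
    return f"{endpoint.rstrip('/')}/v2/{project_id}/training-jobs"


# Hypothetical endpoint and project ID, for illustration only.
print(training_jobs_url("https://modelarts.example.com", "0a1b2c3d4e5f"))
```

Validating project_id on the client side only catches malformed values early; the service remains the authority on whether the project exists.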
Request Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
kind | Yes | String | Definition: Type of a training job. Constraints: N/A Range: Default Value: job |
metadata | Yes | JobMetadata object | Definition: Training job metadata. Constraints: N/A |
algorithm | No | JobAlgorithm object | Definition: Training job algorithm. Constraints: The options are as follows. |
tasks | No | Array of Task objects | Definition: Task list. This function is not implemented. Constraints: N/A |
spec | No | Spec object | Definition: Training job specifications. If this parameter is specified, leave the tasks parameter blank. Constraints: N/A |
endpoints | No | JobEndpointsReq object | Definition: Configurations required for remotely accessing a training job. Constraints: N/A |
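As an illustration of how the top-level request fields fit together, the sketch below assembles a small request body as a Python dictionary. The helper name build_create_job_body and all values (job name, paths, flavor ID) are hypothetical; a real request may need further fields, such as an engine, depending on how the job is defined.

```python
import json


def build_create_job_body(name, code_dir, boot_file, flavor_id, node_count=1):
    """Assemble a request body from the top-level fields described above."""
    return {
        "kind": "job",  # default value listed for kind
        "metadata": {"name": name},
        "algorithm": {"code_dir": code_dir, "boot_file": boot_file},
        # spec is used here, so tasks is left out.
        "spec": {"resource": {"flavor_id": flavor_id, "node_count": node_count}},
    }


# Placeholder values, for illustration only.
body = build_create_job_body(
    name="demo-job",
    code_dir="/usr/app/",
    boot_file="/usr/app/boot.py",
    flavor_id="example-flavor-id",
)
print(json.dumps(body, indent=2))
```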
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | Yes | String | Definition: Name of a training job. Constraints: N/A Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). Default Value: N/A |
workspace_id | No | String | Definition: Workspace where a specified job is located. Constraints: N/A Range: N/A Default Value: 0 |
description | No | String | Definition: Description of a training job. Constraints: The value must contain 0 to 256 characters. Range: N/A Default Value: NULL |
annotations | No | Map<String,String> | Definition: Advanced functions of a training job. Constraints: The options are as follows. |
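The name and description limits above can be checked before a request is sent. The sketch below is one possible client-side check; the function name validate_metadata is hypothetical and the service-side validation may differ in detail.

```python
import re

# name: 1 to 64 characters consisting of digits, letters, underscores, and hyphens.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in a JobMetadata-style dictionary."""
    errors = []
    name = metadata.get("name")
    if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
        errors.append("name must be 1 to 64 characters: digits, letters, _ and -")
    description = metadata.get("description")
    if description is not None and len(description) > 256:
        errors.append("description must contain 0 to 256 characters")
    return errors


print(validate_metadata({"name": "demo-job_1"}))
print(validate_metadata({"name": "bad name"}))
```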
Parameter | Mandatory | Type | Description |
---|---|---|---|
id | No | String | Definition: Algorithm ID in algorithm management. Constraints: N/A Range: N/A Default Value: N/A |
name | No | String | Definition: Algorithm name. Leave it blank. Constraints: N/A Range: N/A Default Value: N/A |
subscription_id | No | String | Definition: Subscription ID of a subscription algorithm. Constraints: This parameter must be used with item_version_id. Range: N/A Default Value: N/A |
item_version_id | No | String | Definition: Version of a subscription algorithm. Constraints: This parameter must be used with subscription_id. Range: N/A Default Value: N/A |
code_dir | No | String | Definition: Code directory of a training job, for example, /usr/app/. Constraints: This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A Default Value: N/A |
boot_file | No | String | Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. Constraints: This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A Default Value: N/A |
autosearch_config_path | No | String | Definition: YAML configuration path of an auto search job. An OBS URL is required. Constraints: N/A Range: N/A Default Value: N/A |
autosearch_framework_path | No | String | Definition: Framework code directory of an auto search job. An OBS URL is required. Constraints: N/A Range: N/A Default Value: N/A |
command | No | String | Definition: Command for starting the custom image container of a training job. Constraints: N/A Range: N/A Default Value: N/A |
parameters | No | Array of Parameters objects | Definition: Running parameters of the training job. Constraints: N/A |
policies | No | JobPolicies object | Definition: Policies supported by jobs, which are used for hyperparameter search. Constraints: N/A |
inputs | No | Array of Input objects | Definition: Data input of a training job. Constraints: N/A |
outputs | No | Array of Output objects | Definition: Output of the training job. Constraints: N/A |
engine | No | JobEngine object | Definition: Engine of a training job. Constraints: Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm. |
local_code_dir | No | String | Definition: Local directory of the training container to which the algorithm code directory is downloaded. Constraints: Range: N/A Default Value: N/A |
working_dir | No | String | Definition: Work directory where an algorithm is executed. Constraints: In v1 compatibility mode, the current field does not take effect. Range: N/A Default Value: N/A |
environments | No | Map<String,String> | Definition: Environment variables of a training job. Format: "key":"value" Constraints: The key can contain a maximum of 8,192 characters, and the value can contain a maximum of 4,096 characters. A maximum of 100 key-value pairs are allowed. The variable name can contain only letters, digits, and underscores (_), and must start with a letter or underscore (_). Note: Variables cannot contain $. |
summary | No | Summary object | Definition: Visualization log summary. Constraints: N/A |
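Several JobAlgorithm fields constrain one another: subscription_id pairs with item_version_id, code_dir pairs with boot_file, and code_dir, boot_file, and engine are left blank when the job references an existing algorithm. A minimal sketch of these checks follows; check_algorithm is a hypothetical helper and it covers only the pairing rules stated in the table.

```python
def check_algorithm(algo: dict) -> list[str]:
    """Return violations of the pairing rules stated for JobAlgorithm."""

    def has(key: str) -> bool:
        # Treat missing, None, and empty values as "left blank".
        return bool(algo.get(key))

    errors = []
    if has("subscription_id") != has("item_version_id"):
        errors.append("subscription_id and item_version_id must be set together")
    if has("code_dir") != has("boot_file"):
        errors.append("code_dir and boot_file must be set together")

    # The job references an algorithm from algorithm management or a subscription.
    managed = has("id") or (has("subscription_id") and has("item_version_id"))
    if managed:
        for key in ("code_dir", "boot_file", "engine"):
            if has(key):
                errors.append(f"{key} must be left blank for a referenced algorithm")
    return errors


print(check_algorithm({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))
print(check_algorithm({"code_dir": "/usr/app/"}))
```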
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | No | String | Definition: Parameter name. Constraints: N/A Range: N/A Default Value: N/A |
value | No | String | Definition: Parameter value. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Parameter description. Constraints: N/A Range: N/A Default Value: N/A |
constraint | No | ParametersConstraint object | Definition: Parameter attribute. Constraints: N/A |
i18n_description | No | I18nDescription object | Definition: Internationalization description. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
type | No | String | Definition: Parameter type. Constraints: N/A Range: N/A Default Value: N/A |
editable | No | Boolean | Definition: Whether the parameter can be edited. Constraints: N/A Range: Default Value: N/A |
required | No | Boolean | Definition: Whether the parameter is mandatory. Constraints: N/A Range: Default Value: N/A |
sensitive | No | Boolean | Definition: Whether the parameter is sensitive. This function is unavailable currently. Constraints: N/A Range: Default Value: N/A |
valid_type | No | String | Definition: Valid type. Constraints: N/A Range: N/A Default Value: N/A |
valid_range | No | Array of strings | Definition: Valid range. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
language | No | String | Definition: Internationalization language. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Description. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
auto_search | No | AutoSearch object | Definition: Hyperparameter search configuration. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
skip_search_params | No | String | Definition: Hyperparameters to be skipped during the search. Constraints: N/A Range: N/A Default Value: N/A |
reward_attrs | No | Array of RewardAttrs objects | Definition: Search metrics. Constraints: N/A |
search_params | No | Array of SearchParams objects | Definition: Search parameters. Constraints: N/A |
algo_configs | No | Array of AlgoConfigs objects | Definition: Search algorithm configurations. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | No | String | Definition: Metric name. Constraints: N/A Range: N/A Default Value: N/A |
mode | No | String | Definition: Search mode. Constraints: N/A Range: Default Value: N/A |
regex | No | String | Definition: Regular expression of a metric. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | No | String | Definition: Search algorithm name. Constraints: N/A Range: N/A Default Value: N/A |
params | No | Array of AutoSearchAlgoConfigParameter objects | Definition: Search algorithm parameters. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
key | No | String | Definition: Parameter key. Constraints: N/A Range: N/A Default Value: N/A |
value | No | String | Definition: Parameter value. Constraints: N/A Range: N/A Default Value: N/A |
type | No | String | Definition: Parameter type. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
engine_id | No | String | Definition: Engine ID selected for a training job. Constraints: The value can be engine_id, engine_name + engine_version, or image_url. Range: N/A Default Value: N/A |
engine_name | No | String | Definition: Engine name selected for a training job. Constraints: If engine_id has been set, you do not need to set this parameter. If you use a preset framework and custom image to create a training job, you must set both this parameter and image_url. Range: N/A Default Value: N/A |
engine_version | No | String | Definition: Engine version selected for a training job. Constraints: If engine_id has been set, you do not need to set this parameter. Range: N/A Default Value: N/A |
image_url | No | String | Definition: Custom image URL selected for a training job. The URL is obtained from SWR. Constraints: The format is organization_name/image_name:tag. Range: N/A Default Value: N/A |
install_sys_packages | No | Boolean | Definition: Specifies whether to install the MoXing version specified by the training platform. Constraints: This parameter is available only when engine_name, engine_version, and image_url are set. Range: Default Value: N/A |
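The JobEngine constraints describe three ways to identify an engine (engine_id, engine_name with engine_version, or image_url) and restrict install_sys_packages to the case where engine_name, engine_version, and image_url are all set. The sketch below encodes that reading; check_engine is a hypothetical helper and the sample values are placeholders.

```python
def check_engine(engine: dict) -> list[str]:
    """Return violations of the JobEngine constraints listed above."""
    errors = []
    by_id = bool(engine.get("engine_id"))
    by_name = bool(engine.get("engine_name")) and bool(engine.get("engine_version"))
    by_image = bool(engine.get("image_url"))

    if not (by_id or by_name or by_image):
        errors.append("set engine_id, engine_name + engine_version, or image_url")
    # install_sys_packages is available only with name, version, and image together.
    if "install_sys_packages" in engine and not (by_name and by_image):
        errors.append(
            "install_sys_packages requires engine_name, engine_version, and image_url"
        )
    return errors


# Placeholder values, for illustration only.
print(check_engine({"engine_name": "Caffe", "engine_version": "1.0.0"}))
print(check_engine({"install_sys_packages": True}))
```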
Parameter | Mandatory | Type | Description |
---|---|---|---|
log_type | No | String | Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job. Constraints: N/A Range: Default Value: N/A |
log_dir | No | LogDir object | Definition: Visualization log output of a training job. Constraints: This parameter is mandatory when log_type is not left empty. |
data_sources | No | Array of DataSource objects | Definition: Visualization log input of the visualization job or training job debugging mode. Constraints: This parameter is mandatory when the advanced function "tensorboard/enable": "true" or "mindstudio-insight/enable": "true" is enabled for the training job. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
pfs | Yes | PFSSummary object | Definition: Output of an OBS parallel file system. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
pfs_path | Yes | String | Definition: URL of the OBS parallel file system. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
job | Yes | JobSummary object | Definition: Job data source. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
job_id | Yes | String | Definition: ID of a training job. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
role | No | String | Definition: Task role. This function is not supported currently. Constraints: N/A Range: N/A Default Value: N/A |
algorithm | No | algorithm object | Definition: Algorithm configurations for algorithm management. Constraints: N/A |
task_resource | No | task_resource object | Definition: Resource flavor of a training job. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
job_config | No | job_config object | Definition: Algorithm configuration, such as the boot file. Constraints: N/A |
code_dir | No | String | Definition: Algorithm code directory, for example, /usr/app/. Constraints: This parameter must be used with boot_file. Range: N/A Default Value: N/A |
boot_file | No | String | Definition: Code boot file of the algorithm, which must be stored in the code directory, for example, /usr/app/boot.py. Constraints: This parameter must be used with code_dir. Range: N/A Default Value: N/A |
engine | No | engine object | Definition: Algorithm engine of a heterogeneous job. Constraints: N/A |
inputs | No | Array of inputs objects | Definition: Data input of an algorithm. Constraints: N/A |
outputs | No | Array of outputs objects | Definition: Data output of an algorithm. Constraints: N/A |
local_code_dir | No | String | Definition: Local directory of the training container to which the algorithm code directory is downloaded. Constraints: Range: N/A Default Value: N/A |
working_dir | No | String | Definition: Work directory where an algorithm is executed. Constraints: In v1 compatibility mode, the current field does not take effect. Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
parameters | No | Array of Parameter objects | Definition: Running parameters of an algorithm. Constraints: N/A |
inputs | No | Array of Input objects | Definition: Data input of an algorithm. Constraints: N/A |
outputs | No | Array of Output objects | Definition: Data output of an algorithm. Constraints: N/A |
engine | No | engine object | Definition: Algorithm engine. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | No | String | Definition: Parameter name. Constraints: N/A Range: N/A Default Value: N/A |
value | No | String | Definition: Parameter value. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Parameter description. Constraints: N/A Range: N/A Default Value: N/A |
constraint | No | constraint object | Definition: Parameter attribute. Constraints: N/A |
i18n_description | No | i18n_description object | Definition: Internationalization description. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
type | No | String | Definition: Parameter type. Constraints: N/A Range: N/A Default Value: N/A |
editable | No | Boolean | Definition: Whether the parameter can be edited. Constraints: N/A Range: Default Value: N/A |
required | No | Boolean | Definition: Whether the parameter is mandatory. Constraints: N/A Range: Default Value: N/A |
sensitive | No | Boolean | Definition: Whether the parameter is sensitive. Constraints: This function is unavailable currently. Range: Default Value: N/A |
valid_type | No | String | Definition: Valid type. Constraints: N/A Range: N/A Default Value: N/A |
valid_range | No | Array of strings | Definition: Valid range. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
language | No | String | Definition: Internationalization language. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Internationalization language description. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | Yes | String | Definition: Name of the data input channel. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Description of the data input channel. Constraints: N/A Range: N/A Default Value: N/A |
local_dir | No | String | Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0 Constraints: N/A Range: N/A Default Value: N/A |
access_method | No | String | Definition: Access method of the input data channel path (local_dir). Constraints: N/A Range: Default Value: parameter |
remote | Yes | InputDataInfo object | Definition: Description of the actual data input. Constraints: The options are as follows. |
remote_constraint | No | Array of remote_constraint objects | Definition: Data input constraint. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
dataset | No | dataset object | Definition: The input is a dataset. Constraints: N/A |
obs | No | obs object | Definition: OBS in which data input and output are stored. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
id | Yes | String | Definition: Dataset ID of a training job. Constraints: N/A Range: N/A Default Value: N/A |
version_id | Yes | String | Definition: Dataset version ID of a training job. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs_url | Yes | String | Definition: OBS URL of the dataset for a training job, for example, /usr/data/. Constraints: N/A Range: N/A Default Value: N/A |
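An input channel nests its data source under remote, either as a dataset (id and version_id) or as an OBS location (obs_url). The sketch below builds both shapes; the helper names obs_input and dataset_input and the sample values are hypothetical, and it assumes that a channel carries one source type at a time.

```python
def obs_input(name, obs_url, local_dir=None):
    """Build an input channel whose data is read from OBS."""
    channel = {"name": name, "remote": {"obs": {"obs_url": obs_url}}}
    if local_dir is not None:
        channel["local_dir"] = local_dir
    return channel


def dataset_input(name, dataset_id, version_id):
    """Build an input channel whose data is a dataset version."""
    return {
        "name": name,
        "remote": {"dataset": {"id": dataset_id, "version_id": version_id}},
    }


# Placeholder channel names and IDs, for illustration only.
inputs = [
    obs_input("data_url", "/usr/data/", "/home/ma-user/modelarts/inputs/data_url_0"),
    dataset_input("labels", "example-dataset-id", "example-version-id"),
]
print(inputs)
```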
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_type | No | String | Definition: Data input type, including the data storage location and dataset. Constraints: N/A Range: N/A Default Value: N/A |
attributes | No | String | Definition: Related attributes. Constraints: N/A Range: If the input is a dataset: Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | Yes | String | Definition: Name of the data output channel. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Description of the data output channel. Constraints: N/A Range: N/A Default Value: N/A |
local_dir | No | String | Definition: Local path of the container to which the data output channels are mapped. Constraints: N/A Range: N/A Default Value: N/A |
access_method | No | String | Definition: Access method of the output data channel path (local_dir). Constraints: N/A Range: Default Value: parameter |
remote | Yes | Remote object | Definition: Description of the actual data output. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs | Yes | RemoteObs object | Definition: Data actually output to OBS. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs_url | Yes | String | Definition: Path of the data output to OBS. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
engine_id | No | String | Definition: Engine ID selected for an algorithm. Constraints: N/A Range: N/A Default Value: N/A |
engine_name | No | String | Definition: Engine name selected for an algorithm. Constraints: If engine_id is specified, leave this parameter blank. Range: N/A Default Value: N/A |
engine_version | No | String | Definition: Engine version selected for an algorithm. Constraints: If engine_id is specified, leave this parameter blank. Range: N/A Default Value: N/A |
image_url | No | String | Definition: Custom image URL selected for an algorithm. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
engine_id | No | String | Definition: ID of the engine flavor of a heterogeneous job, for example, caffe-1.0.0-python2.7. Constraints: N/A Range: N/A Default Value: N/A |
engine_name | No | String | Definition: Name of the engine flavor of a heterogeneous job, for example, Caffe. Constraints: N/A Range: N/A Default Value: N/A |
engine_version | No | String | Definition: Version of the engine flavor of a heterogeneous job. Constraints: N/A Range: N/A Default Value: N/A |
image_url | No | String | Definition: Custom image URL selected for an algorithm. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | Yes | String | Definition: Name of the data input channel. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Description of the data input channel. Constraints: N/A Range: N/A Default Value: N/A |
local_dir | No | String | Definition: Local path of the container to which the data input channels are mapped. Constraints: N/A Range: N/A Default Value: N/A |
remote | Yes | remote object | Definition: Description of the actual data input. Constraints: The options are as follows: |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs | No | obs object | Definition: OBS in which data input and output are stored. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs_url | Yes | String | Definition: OBS URL of the dataset for a training job, for example, /usr/data/. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
name | Yes | String | Definition: Name of the data output channel. Constraints: N/A Range: N/A Default Value: N/A |
description | No | String | Definition: Description of the data output channel. Constraints: N/A Range: N/A Default Value: N/A |
local_dir | No | String | Definition: Local path of the container to which the data output channels are mapped. Constraints: N/A Range: N/A Default Value: N/A |
remote | Yes | remote object | Definition: Description of the actual data output. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs | Yes | obs object | Definition: Data actually output to OBS. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs_url | Yes | String | Definition: Path of the data output to OBS. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
flavor_id | No | String | Definition: ID of the resource flavor selected for a training job. Constraints: N/A Range: N/A Default Value: N/A |
node_count | Yes | Integer | Definition: Number of resource replicas selected for a training job. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
resource | No | SpecResource object | Definition: Resource flavor of a training job. Constraints: Specify either flavor_id alone or pool_id, optionally together with flavor_id. |
volumes | No | Array of SpecVolumes objects | Definition: Mounting volume information of a training job. Constraints: N/A |
log_export_path | No | LogExportPath object | Definition: Log output of a training job. Constraints: N/A |
auto_stop | No | AutoStop object | Definition: Auto stop configuration of a training job. Constraints: N/A |
schedule_policy | No | SchedulePolicy object | Definition: Scheduling policy of a training job. Constraints: N/A |
notification | No | Notification object | Definition: Message notification of a training event. Constraints: N/A |
custom_metrics | No | Array of CustomMetrics objects | Metric collection configuration. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
flavor_id | No | String | Definition: ID of the resource flavor of a training job. Constraints: N/A Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows: Default Value: N/A |
node_count | No | Integer | Definition: Number of nodes used to create a training job in a resource pool. Constraints: N/A Range: N/A Default Value: single node |
pool_id | No | String | Definition: Dedicated resource pool ID. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
nfs | No | Nfs object | Definition: NFS mounting volume information of a training job. Constraints: N/A |
pfs | No | Pfs object | Definition: obsfs mounting volume information of a training job. Constraints: N/A |
obs | No | Obs object | Definition: OBS mounting volume information of a training job. Constraints: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
nfs_server_path | No | String | Definition: NFS server path, for example, 10.10.10.10:/example/path. Constraints: N/A Range: N/A Default Value: N/A |
local_path | No | String | Definition: Path for attaching volumes to the training container, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A |
read_only | No | Boolean | Definition: Specifies whether the disks attached to the container in NFS mode are read-only. Constraints: N/A Range: Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
pfs_path | No | String | Definition: Address of obsfs, for example, /test-bucket/path. Constraints: N/A Range: N/A Default Value: N/A |
local_path | No | String | Definition: Path for attaching volumes to the training container, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs_path | No | String | Definition: OBS path to be mounted, for example, /test-bucket/path. Constraints: N/A Range: N/A Default Value: N/A |
local_path | No | String | Definition: Path for attaching volumes to the training container, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A |
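The three volume types (nfs, pfs, obs) share the same outline: a source path plus the local_path inside the training container. The sketch below builds entries for the volumes array, assuming that each entry carries a single volume type, which the tables above do not state explicitly. The helper names are hypothetical and the paths reuse the examples from the tables.

```python
def nfs_volume(server_path, local_path, read_only=False):
    """Build a volumes entry that mounts an NFS path."""
    return {
        "nfs": {
            "nfs_server_path": server_path,
            "local_path": local_path,
            "read_only": read_only,
        }
    }


def pfs_volume(pfs_path, local_path):
    """Build a volumes entry that mounts an obsfs path."""
    return {"pfs": {"pfs_path": pfs_path, "local_path": local_path}}


def obs_volume(obs_path, local_path):
    """Build a volumes entry that mounts an OBS path."""
    return {"obs": {"obs_path": obs_path, "local_path": local_path}}


volumes = [
    nfs_volume("10.10.10.10:/example/path", "/example/path", read_only=True),
    obs_volume("/test-bucket/path", "/example/obs"),
]
print(volumes)
```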
Parameter | Mandatory | Type | Description |
---|---|---|---|
obs_url | No | String | Definition: OBS path for storing training job logs, for example, obs://example/path. Constraints: N/A Range: N/A Default Value: N/A |
host_path | No | String | Definition: Path of the host where training job logs are stored, for example, /example/path. Constraints: N/A Range: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
time_unit | Yes | String | Definition: Time unit. Constraints: N/A Range: Default Value: N/A |
duration | Yes | Integer | Definition: Runtime. Constraints: N/A Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
required_affinity | No | RequiredAffinity object | Definition: Affinity requirements of a training job. Constraints: N/A |
priority | No | Integer | Definition: Priority of a training job. Constraints: By default, the job priority can be set to 1 or 2. After the permission to set the highest job priority is configured, the priority can be set to 1 to 3. Range: 0 to 3 Default Value: N/A |
preemptible | No | Boolean | Definition: Whether the resource can be preempted. Constraints: N/A Range: Default Value: N/A |
Parameter | Mandatory | Type | Description |
---|---|---|---|
affinity_type | No | String | Definition: Affinity scheduling policy. Constraints: N/A Range: Default Value: N/A |
affinity_group_size | No | Integer | Definition: Size of an affinity group. Constraints: This parameter is mandatory when affinity_type is set to hyperinstance. In this case, the system schedules tasks specified by affinity_group_size to a supernode to form an affinity group. When a user delivers a training job to the supernode resource pool, if the affinity group size is not set, the system sets the value to 1 by default. Range: N/A Default Value: 1 |
Parameter | Mandatory | Type | Description |
---|---|---|---|
topic_urn | No | String | Definition: URN of the selected topic in SMN. Constraints: N/A Range: N/A Default Value: N/A |
events | No | Array of strings | Definition: Training event that triggers a notification. Constraints: The options are as follows: |
Parameter | Mandatory | Type | Description |
---|---|---|---|
exec | No | Exec object | Metrics are collected using commands. |
http_get | No | HttpGet object | Metrics are collected using HTTP. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
command | No | Array of strings | Metrics are collected using commands. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
path | No | String | URL for obtaining metrics over HTTP. This parameter and port must either both be set or both be left blank. |
port | No | Integer | Port for obtaining metrics over HTTP. This parameter and path must either both be set or both be left blank. |
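The path and port fields of an HttpGet object are set together or left blank together. A minimal sketch of that rule follows; check_http_get is a hypothetical helper and the sample path and port are placeholders.

```python
def check_http_get(http_get: dict) -> bool:
    """Return True when path and port are both set or both left blank."""
    has_path = http_get.get("path") not in (None, "")
    has_port = http_get.get("port") is not None
    return has_path == has_port


# Placeholder values, for illustration only.
print(check_http_get({"path": "/metrics", "port": 9090}))
print(check_http_get({"path": "/metrics"}))
```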
Parameter | Mandatory | Type | Description |
---|---|---|---|
ssh | No | SSHReq object | Definition: SSH connection information. Constraints: N/A |
Response Parameters
Status code: 201
Parameter | Type | Description |
---|---|---|
kind | String | Definition: Type of a training job. Range: |
metadata | JobMetadataResponse object | Definition: Training job metadata. |
status | Status object | Definition: Training job status information. |
algorithm | JobAlgorithmResponse object | Definition: Training job algorithm. |
tasks | Array of TaskResponse objects | Definition: Heterogeneous training tasks. |
spec | SpecResponce object | Definition: Training job specifications. |
endpoints | JobEndpointsResp object | Definition: Configurations required for remotely accessing a training job. |
Parameter | Type | Description |
---|---|---|
id | String | Definition: Training job ID, which is generated and returned by ModelArts after a training job is created. Range: N/A |
name | String | Definition: Name of a training job. Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). |
workspace_id | String | Definition: Workspace where a specified job is located. Range: N/A |
description | String | Definition: Description of a training job. Range: N/A |
create_time | Long | Definition: Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created. Range: N/A |
user_name | String | Definition: Name of the user who created the training job. The value is generated and returned by ModelArts after a training job is created. Range: N/A |
annotations | Map<String,String> | Definition: Advanced functions of a training job. |
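The create_time field is reported in milliseconds. The sketch below converts it to a timezone-aware datetime, assuming that the value counts milliseconds since the Unix epoch; the table states the unit but not the reference point. The function name created_at and the sample metadata are hypothetical.

```python
from datetime import datetime, timezone


def created_at(metadata: dict) -> datetime:
    """Convert create_time (assumed epoch milliseconds) to a UTC datetime."""
    return datetime.fromtimestamp(metadata["create_time"] / 1000, tz=timezone.utc)


# Sample response metadata with placeholder values.
sample = {"id": "example-job-id", "name": "demo-job", "create_time": 1700000000000}
print(created_at(sample).isoformat())
```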
Parameter | Type | Description |
---|---|---|
phase | String | Definition: Level-1 status of a training job. Range: |
secondary_phase | String | Definition: Level-2 status of a training job. The values are internal detailed statuses and may be added, changed, or deleted. Relying on these values is not recommended. Range: |
duration | Long | Definition: Running duration of a training job, in ms. Range: N/A |
node_count_metrics | Array<Array<Integer>> | Definition: Node quantity change metric during a training job runtime. |
tasks | Array of strings | Definition: Training job subtask name. |
start_time | Long | Definition: Timestamp when a training job is started. Range: N/A |
task_statuses | Array of TaskStatuses objects | Definition: Training job subtask status. |
running_records | Array of RunningRecord objects | Definition: Running and fault recovery records of a training job. |
Parameter | Type | Description |
---|---|---|
task | String | Definition: Training job subtask name. Range: N/A |
exit_code | Integer | Definition: Exit code of a training job subtask. Range: N/A |
message | String | Definition: Error message of a training job subtask. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
start_at |
Integer |
Definition: Unix timestamp of the start time in the current running record, in seconds. Range: N/A |
end_at |
Integer |
Definition: Unix timestamp of the end time in the current running record, in seconds. Range: N/A |
start_type |
String |
Definition: Startup mode of the current run. Range: |
end_reason |
String |
Definition: Reason why the run ended. Range: N/A |
end_related_task |
String |
Definition: ID of the task worker (for example, worker-0) that caused the run to end. Range: N/A |
end_recover |
String |
Definition: Fault tolerance policy applied after the run ends. Range: |
end_recover_before_downgrade |
String |
Definition: Fault tolerance policy selected after the run ends, before that policy is downgraded. Range: same as that of end_recover. |
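Because start_at and end_at are Unix timestamps in seconds, the length of each run in running_records is a simple subtraction. The following is a minimal sketch, assuming that a record without end_at is still in progress (the table does not say so). The function name run_durations is hypothetical.

```python
def run_durations(running_records: list) -> list:
    """Return (start_type, seconds, end_reason) for each finished running record."""
    finished = []
    for record in running_records:
        start, end = record.get("start_at"), record.get("end_at")
        if start is None or end is None:
            # Assumption: a missing end_at means the run has not ended yet.
            continue
        finished.append((record.get("start_type"), end - start, record.get("end_reason")))
    return finished
```

A record whose end_recover value indicates a restart is typically followed by another record, so the list can be read in order as the recovery history of the job.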
Parameter |
Type |
Description |
---|---|---|
id |
String |
Definition: Training job algorithm. Range: |
name |
String |
Definition: Algorithm name. Range: N/A |
subscription_id |
String |
Definition: Subscription ID of a subscription algorithm, which must be used with item_version_id. Range: N/A |
item_version_id |
String |
Definition: Version of a subscription algorithm, which must be used with subscription_id. Range: N/A |
code_dir |
String |
Definition: Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A |
boot_file |
String |
Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. Range: N/A |
autosearch_config_path |
String |
Definition: YAML configuration path of an auto search job. An OBS URL is required. For example, obs://bucket/file.yaml. Range: N/A |
autosearch_framework_path |
String |
Definition: Framework code directory of an auto search job. An OBS URL is required. For example, obs://bucket/files/. Range: N/A |
command |
String |
Definition: Boot command for starting the container of a custom image for a training job. For example, python train.py. Range: N/A |
parameters |
Array of ParameterResp objects |
Definition: Running parameters of the training job. |
policies |
policies object |
Definition: Policy supported by a job. |
inputs |
Array of InputResp objects |
Definition: Data input of a training job. |
outputs |
Array of OutputResp objects |
Definition: Output of the training job. |
engine |
JobEngineResp object |
Definition: Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm. |
local_code_dir |
String |
Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:
Range: N/A |
working_dir |
String |
Definition: Work directory where an algorithm is executed. Rules: In v1 compatibility mode, this parameter does not take effect. Range: N/A |
environments |
Array of Map<String,String> objects |
Definition: Environment variables of a training job. The format is key:value. Leave this parameter blank. |
summary |
SummaryResp object |
Definition: Visualization log summary. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Parameter name. Range: N/A |
value |
String |
Definition: Parameter value. Range: N/A |
description |
String |
Definition: Parameter description. Range: N/A |
constraint |
constraint object |
Definition: Parameter attribute. |
i18n_description |
i18n_description object |
Definition: Internationalization description. |
Parameter |
Type |
Description |
---|---|---|
type |
String |
Definition: Parameter type. Range: N/A |
editable |
Boolean |
Definition: Whether the parameter can be edited. Range: |
required |
Boolean |
Definition: Whether the parameter is mandatory. Range: |
sensitive |
Boolean |
Definition: Whether the parameter is sensitive. This function is unavailable currently. Range: |
valid_type |
String |
Definition: Valid type. Range: N/A |
valid_range |
Array of strings |
Definition: Valid range. |
Parameter |
Type |
Description |
---|---|---|
language |
String |
Definition: Internationalization language. The options are as follows: Range: N/A |
description |
String |
Definition: Internationalization language description. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
auto_search |
auto_search object |
Definition: Hyperparameter search configuration. |
Parameter |
Type |
Description |
---|---|---|
skip_search_params |
String |
Definition: Hyperparameters to skip during the search. Range: N/A |
reward_attrs |
Array of reward_attrs objects |
Definition: Search metrics. |
search_params |
Array of search_params objects |
Definition: Search parameters. |
algo_configs |
Array of algo_configs objects |
Definition: Search algorithm configurations. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Metric name. Range: N/A |
mode |
String |
Definition: Search mode. Range: |
regex |
String |
Definition: Regular expression of a metric. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Search algorithm name. Range: N/A |
params |
Array of AutoSearchAlgoConfigParameterResp objects |
Definition: Search algorithm parameters. |
Parameter |
Type |
Description |
---|---|---|
key |
String |
Definition: Parameter key. Range: N/A |
value |
String |
Definition: Parameter value. Range: N/A |
type |
String |
Definition: Parameter type. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Name of the data input channel. Range: N/A |
description |
String |
Definition: Description of the data input channel. Range: N/A |
local_dir |
String |
Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0 Range: N/A |
access_method |
String |
Definition: Access method of the input data channel path (local_dir). Range: |
remote |
InputDataInfoResp object |
Definition: Description of the actual data input. |
remote_constraint |
Array of remote_constraint objects |
Definition: Data input constraint. |
Parameter |
Type |
Description |
---|---|---|
dataset |
dataset object |
Definition: The input is a dataset. |
obs |
obs object |
Definition: OBS in which data input and output are stored. |
Parameter |
Type |
Description |
---|---|---|
id |
String |
Definition: Dataset ID of a training job. Range: N/A |
version_id |
String |
Definition: Dataset version ID of a training job. Range: N/A |
obs_url |
String |
Definition: OBS URL of the dataset for a training job. It is automatically parsed by ModelArts based on the dataset ID and dataset version ID. For example, /usr/data/. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
Definition: OBS URL of the dataset for a training job, for example, /usr/data/. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
data_type |
String |
Definition: Data input type, including the data storage location and dataset. Constraints: N/A Range: N/A Default Value: N/A |
attributes |
String |
Definition: Related attributes. Constraints: N/A Range: If the input is a dataset: Default Value: N/A |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Name of the data output channel. Range: N/A |
description |
String |
Definition: Description of the data output channel. Range: N/A |
local_dir |
String |
Definition: Local path of the container to which the data output channels are mapped. Range: N/A |
access_method |
String |
Definition: Access method of the output data channel path (local_dir). Range: |
remote |
RemoteResp object |
Definition: Description of the actual data output. |
Parameter |
Type |
Description |
---|---|---|
engine_id |
String |
Definition: Engine ID selected for a training job. Range: N/A |
engine_name |
String |
Definition: Engine name selected for a training job. Range: N/A |
engine_version |
String |
Definition: Engine version selected for a training job. Range: N/A |
image_url |
String |
Definition: Custom image URL selected for a training job. The URL is obtained from SWR. Range: N/A |
install_sys_packages |
Boolean |
Definition: Specifies whether to install the MoXing version specified by the training platform. Range: |
Parameter |
Type |
Description |
---|---|---|
log_type |
String |
Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job. Range: |
log_dir |
LogDirResp object |
Definition: Visualization log output of a training job. |
data_sources |
Array of DataSourceResp objects |
Definition: Visualization log input of the visualization job or training job debugging mode. |
Parameter |
Type |
Description |
---|---|---|
pfs |
PFSSummaryResp object |
Definition: Output of an OBS parallel file system. |
Parameter |
Type |
Description |
---|---|---|
pfs_path |
String |
Definition: URL of the OBS parallel file system. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
job |
JobSummaryResp object |
Definition: Job data source. |
Parameter |
Type |
Description |
---|---|---|
job_id |
String |
Definition: ID of a training job. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
role |
String |
Definition: Task role. This function is not supported currently. Range: N/A |
algorithm |
TaskResponseAlgorithm object |
Definition: Algorithm configurations for algorithm management. |
task_resource |
FlavorResponse object |
Definition: Specifications of a training job or algorithm. |
Parameter |
Type |
Description |
---|---|---|
code_dir |
String |
Definition: Absolute path of the directory where the algorithm boot file is stored. Range: N/A |
boot_file |
String |
Definition: Absolute path of an algorithm boot file. Range: N/A |
inputs |
AlgorithmInput object |
Definition: Information about the algorithm input channel. |
outputs |
AlgorithmOutput object |
Definition: Information about the algorithm output channel. |
engine |
AlgorithmEngine object |
Definition: Engine that a heterogeneous job depends on. |
local_code_dir |
String |
Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:
Range: N/A |
working_dir |
String |
Definition: Work directory where an algorithm is executed. Note that this parameter does not take effect in v1 compatibility mode. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Name of the data input channel. Range: N/A |
local_dir |
String |
Definition: Local path of the container to which the data input and output channels are mapped. Range: N/A |
remote |
AlgorithmRemote object |
Definition: Actual data input, which can only be OBS for heterogeneous jobs. |
Parameter |
Type |
Description |
---|---|---|
obs |
RemoteObsResp object |
Definition: OBS in which data input and output are stored. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Definition: Name of the data output channel. Range: N/A |
local_dir |
String |
Definition: Local path of the container to which the data output channels are mapped. Range: N/A |
remote |
RemoteResp object |
Definition: Description of the actual data output. |
mode |
String |
Definition: Data transmission mode. The default value is upload_periodically. Range: N/A |
period |
String |
Definition: Data transmission period. The default value is 30s. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
obs |
RemoteObsResp object |
Definition: Data actually output to OBS. |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
Definition: Path of the data output to OBS. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
engine_id |
String |
Definition: Engine flavor ID, for example, caffe-1.0.0-python2.7. Range: N/A |
engine_name |
String |
Definition: Engine flavor name, for example, Caffe. Range: N/A |
engine_version |
String |
Definition: Engine flavor version. An engine with a given name can have multiple versions, for example, Caffe-1.0.0-python2.7 for Python 2.7. Range: N/A |
v1_compatible |
Boolean |
Definition: Specifies whether the v1 compatibility mode is used. Range: |
run_user |
String |
Definition: Default UID for the engine startup. Range: N/A |
image_url |
String |
Definition: Custom image URL selected for an algorithm. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
flavor_id |
String |
Definition: Resource flavor ID. Range: N/A |
flavor_name |
String |
Definition: Resource flavor name. Range: N/A |
max_num |
Integer |
Definition: Maximum number of nodes supported by a flavor. Range: N/A |
flavor_type |
String |
Definition: Resource flavor type. Range: |
billing |
BillingInfo object |
Definition: Billing information of a resource flavor. |
flavor_info |
FlavorInfoResponse object |
Definition: Resource flavor details. |
attributes |
Map<String,String> |
Definition: Other flavor attributes. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
max_num |
Integer |
Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. Range: N/A |
cpu |
Cpu object |
Definition: CPU specifications. |
gpu |
Gpu object |
Definition: GPU specifications. |
npu |
Npu object |
Definition: Ascend specifications. |
memory |
Memory object |
Definition: Memory information. |
disk |
DiskResponse object |
Definition: Disk information. |
Parameter |
Type |
Description |
---|---|---|
size |
Integer |
Definition: Disk size. Range: N/A |
unit |
String |
Definition: Unit of the disk size. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
resource |
Resource object |
Definition: Resource flavor of a training job. Specify either flavor_id alone, or pool_id together with flavor_id. |
volumes |
Array of JobVolumeResp objects |
Definition: Mounting volume information of a training job. |
log_export_path |
LogExportPathResp object |
Definition: Log output of a training job. |
schedule_policy |
SchedulePolicyResp object |
Definition: Scheduling policy of a training job. |
custom_metrics |
Array of CustomMetrics objects |
Metric collection configuration |
Parameter |
Type |
Description |
---|---|---|
policy |
String |
Definition: Resource flavor mode of a training job. Range: |
flavor_id |
String |
Definition: ID of the resource flavor of a training job. Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows: |
flavor_name |
String |
Definition: Read-only flavor name returned by ModelArts when flavor_id is used. Range: N/A |
node_count |
Integer |
Definition: Number of resource replicas selected for a training job. Range: N/A |
pool_id |
String |
Definition: ID of the resource pool selected for a training job. Range: N/A |
flavor_detail |
FlavorDetail object |
Definition: Flavor details of a training job or algorithm. This parameter is available only for public resource pools. |
main_container_allocated_resources |
Resource specifications actually obtained by the training container of a training job. |
Parameter |
Type |
Description |
---|---|---|
flavor_type |
String |
Definition: Resource flavor type. Range: |
billing |
BillingInfo object |
Definition: Billing information of a resource flavor. |
flavor_info |
FlavorInfo object |
Definition: Resource flavor details. |
Parameter |
Type |
Description |
---|---|---|
code |
String |
Definition: Billing code. Range: N/A |
unit_num |
Integer |
Definition: Number of billing units. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
max_num |
Integer |
Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. Range: N/A |
cpu |
Cpu object |
Definition: CPU specifications. |
gpu |
Gpu object |
Definition: GPU specifications. |
npu |
Npu object |
Definition: Ascend specifications. |
memory |
Memory object |
Definition: Memory information. |
disk |
Disk object |
Definition: Disk information. |
Parameter |
Type |
Description |
---|---|---|
arch |
String |
Definition: CPU architecture. Range: N/A |
core_num |
Integer |
Definition: Number of cores. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
unit_num |
Integer |
Definition: Number of GPUs. Range: N/A |
product_name |
String |
Definition: Product name. Range: N/A |
memory |
String |
Definition: Memory. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
unit_num |
String |
Definition: Number of NPUs. Range: N/A |
product_name |
String |
Definition: Product name. Range: N/A |
memory |
String |
Definition: Memory. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
size |
Integer |
Definition: Memory size. Range: N/A |
unit |
String |
Definition: Unit of the memory size. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
size |
String |
Definition: Disk size. Range: N/A |
unit |
String |
Definition: Unit of the disk size. Generally, the unit is GB. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
cpu_arch |
String |
CPU architecture. |
cpu_core_num |
Float |
Number of cores. |
mem_size |
Float |
Memory size. |
accelerator_num |
Float |
Number of accelerator cards. |
accelerator_type |
String |
Accelerator card type. |
Parameter |
Type |
Description |
---|---|---|
nfs |
NfsResp object |
Definition: Volumes attached in NFS mode. |
Parameter |
Type |
Description |
---|---|---|
nfs_server_path |
String |
Definition: NFS server path, for example, 10.10.10.10:/example/path. Range: N/A |
local_path |
String |
Definition: Path for attaching volumes to the training container, for example, /example/path. Range: N/A |
read_only |
Boolean |
Definition: Specifies whether the disks attached to the container in NFS mode are read-only. Range: |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
Definition: OBS path for storing training job logs, for example, obs://example/path. Range: N/A |
host_path |
String |
Definition: Path of the host where training job logs are stored, for example, /example/path. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
required_affinity |
RequiredAffinityResp object |
Definition: Affinity requirements of a training job. |
priority |
Integer |
Definition: Priority of a training job. Range: 0 to 3 |
preemptible |
Boolean |
Definition: Whether the resource can be preempted. Range: |
Parameter |
Type |
Description |
---|---|---|
affinity_type |
String |
Definition: Affinity scheduling policy. Range: |
affinity_group_size |
Integer |
Definition: Size of an affinity group. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
exec |
Exec object |
Metrics are collected using commands. |
http_get |
HttpGet object |
Metrics are collected using HTTP. |
Parameter |
Type |
Description |
---|---|---|
command |
Array of strings |
Metrics are collected using commands. |
Parameter |
Type |
Description |
---|---|---|
path |
String |
URL for obtaining metrics over HTTP. Both the URL and the port below must either be configured together or remain empty. |
port |
Integer |
Port for obtaining metrics over HTTP. This parameter and the URL above must be set or left blank at the same time. |
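The rule that path and port must be set together or left blank together can be checked on the client before a request is sent. The following is a minimal sketch of such a check for a custom_metrics list. The function name check_custom_metrics and the wording of the messages are hypothetical, and the additional check that each entry defines exec or http_get is an assumption.

```python
def check_custom_metrics(custom_metrics: list) -> list:
    """Return a list of problems found in a custom_metrics configuration."""
    problems = []
    for index, metric in enumerate(custom_metrics):
        exec_cfg = metric.get("exec")
        http_get = metric.get("http_get")
        if exec_cfg is None and http_get is None:
            # Assumption: an entry without exec or http_get collects nothing.
            problems.append(f"custom_metrics[{index}]: neither exec nor http_get is set")
        if http_get is not None:
            has_path = bool(http_get.get("path"))
            has_port = http_get.get("port") is not None
            if has_path != has_port:
                problems.append(
                    f"custom_metrics[{index}].http_get: path and port must be set together"
                )
    return problems
```

An empty result means that no problem was found by these two checks. It does not guarantee that the service accepts the configuration.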
Parameter |
Type |
Description |
---|---|---|
ssh |
SSHResp object |
Definition: SSH connection information. |
jupyter_lab |
JupyterLab object |
Definition: JupyterLab connection information. |
tensorboard |
Tensorboard object |
Definition: TensorBoard connection information. |
mindstudio_insight |
MindStudioInsight object |
Definition: MindStudio Insight connection information. |
Parameter |
Type |
Description |
---|---|---|
key_pair_names |
Array of strings |
Definition: Names of the SSH key pairs, which can be created and viewed on the Key Pair page of the Elastic Cloud Server (ECS) console. Range: N/A |
task_urls |
Array of TaskUrls objects |
Definition: SSH connection address. |
Parameter |
Type |
Description |
---|---|---|
task |
String |
Definition: Task ID of a training job. Range: N/A |
url |
String |
Definition: SSH connection address of a training job. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
url |
String |
Definition: JupyterLab address of a training job. Range: N/A |
token |
String |
Definition: JupyterLab token of a training job. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
url |
String |
Definition: TensorBoard address of a training job. Range: N/A |
token |
String |
Definition: TensorBoard token of a training job. Range: N/A |
Parameter |
Type |
Description |
---|---|---|
url |
String |
Definition: MindStudio Insight address of a training job. Range: N/A |
token |
String |
Definition: MindStudio Insight token of a training job. Range: N/A |
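The endpoints object nests one sub-object per access method, and only SSH carries a per-task list. The following is a minimal sketch that flattens the object into a list of addresses, assuming that access methods that are not enabled are absent or empty in the response. The function name list_endpoints is hypothetical, and tokens are intentionally left out of the result so that it can be logged safely.

```python
def list_endpoints(endpoints: dict) -> list:
    """Flatten an endpoints object into (kind, task, url) tuples without tokens."""
    found = []
    ssh = endpoints.get("ssh") or {}
    for item in ssh.get("task_urls") or []:
        found.append(("ssh", item.get("task"), item.get("url")))
    for kind in ("jupyter_lab", "tensorboard", "mindstudio_insight"):
        info = endpoints.get(kind) or {}
        if info.get("url"):
            # These access methods expose a single url and token per job.
            found.append((kind, None, info["url"]))
    return found
```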
Status code: 400
Parameter |
Type |
Description |
---|---|---|
error_msg |
String |
Error message |
error_code |
String |
Error code |
error_solution |
String |
Solution |
Example Requests
- The following example creates a training job that uses a free flavor. The job is named TestModelArtsJob and its description is This is a ModelArts job. The ID of the required algorithm is 3f5d6706-7b67-408d-8ba0-ec08048c45ed. No inputs or outputs are defined for the algorithm.
POST https://endpoint/v2/{project_id}/training-jobs

{
  "kind" : "job",
  "metadata" : {
    "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job",
    "create_time" : 1637045545982,
    "workspace_id" : "0",
    "user_name" : ""
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "name" : "ttt-obs-gpu",
    "code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
    "boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
    "parameters" : [ {
      "name" : "input_dir",
      "description" : "",
      "i18n_description" : null,
      "value" : "s://cn-north-4-rse/test/moxingtest-dir/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "input_file",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://cn-north-4-rse/test/moxingtest/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "large_file_method",
      "description" : "",
      "i18n_description" : null,
      "value" : "1",
      "constraint" : {
        "type" : "Integer",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    } ],
    "engine" : {
      "engine_id" : "horovod-cp36-tf-1.16.2",
      "engine_name" : "Horovod",
      "engine_version" : "0.16.2-TF-1.13.1-python3.6"
    },
    "policies" : { }
  },
  "spec" : {
    "resource" : {
      "flavor_id" : "modelarts.p3.large.public.free",
      "node_count" : 1
    },
    "log_export_path" : { },
    "custom_metrics" : [ {
      "http_get" : {
        "path" : "/raw_text",
        "port" : 10001
      }
    } ]
  }
}
- The following example uses a custom image to create a training job named TestModelArtsJob2 with the description This is a ModelArts job2. The job uses a dedicated resource pool and an NFS mount.
POST https://endpoint/v2/{project_id}/training-jobs

{
  "kind" : "job",
  "metadata" : {
    "name" : "TestModelArtsJob2",
    "description" : "This is a ModelArts job2"
  },
  "algorithm" : {
    "engine" : {
      "image_url" : "xxxxxxxx/fastseq:1.2"
    },
    "command" : "cd /home/ma-user/ddp_demo && sh run_ddp.sh",
    "parameters" : [ ],
    "policies" : {
      "auto_search" : null
    },
    "environments" : {
      "NCCL_DEBUG" : "INFO",
      "NCCL_IB_DISABLE" : "0"
    }
  },
  "spec" : {
    "resource" : {
      "flavor_id" : "modelarts.pool.visual.xlarge",
      "node_count" : 1,
      "pool_id" : "poolfaf38d76"
    },
    "log_export_path" : {
      "obs_url" : "/cn-north-4-training-test/limou/ddp-demo-log/"
    },
    "volumes" : [ {
      "nfs" : {
        "nfs_server_path" : "192.168.0.82:/",
        "local_path" : "/home/ma-user/nfs/",
        "read_only" : false
      }
    } ]
  }
}
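A request like the ones above can also be assembled in code. The following is a minimal sketch that uses only the Python standard library to build the POST request for an algorithm that is referenced by ID. The function name build_training_job_request is hypothetical, the X-Auth-Token header is an assumption about the authentication method in use, and the reduced body (without the name, code_dir, parameters, and engine fields shown in the first example) has not been verified against the service.

```python
import json
import urllib.request


def build_training_job_request(endpoint, project_id, token, name, description,
                               algorithm_id, flavor_id, node_count=1):
    """Build (but do not send) a request that creates a training job."""
    body = {
        "kind": "job",
        "metadata": {"name": name, "description": description},
        # Assumption: referencing the algorithm by ID alone is sufficient.
        "algorithm": {"id": algorithm_id},
        "spec": {"resource": {"flavor_id": flavor_id, "node_count": node_count}},
    }
    return urllib.request.Request(
        url=f"https://{endpoint}/v2/{project_id}/training-jobs",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: token-based authentication through the X-Auth-Token header.
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request. The sketch stops before that step so that it can be inspected or tested without network access.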
Example Responses
Status code: 201
ok
{
  "kind" : "job",
  "metadata" : {
    "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job",
    "create_time" : 1637045545982,
    "workspace_id" : "0",
    "user_name" : ""
  },
  "status" : {
    "phase" : "Creating",
    "secondary_phase" : "Creating",
    "duration" : 0,
    "start_time" : 0,
    "node_count_metrics" : null,
    "tasks" : [ "worker-0", "server-0" ]
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "name" : "ttt-obs-gpu",
    "code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
    "boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
    "parameters" : [ {
      "name" : "input_dir",
      "description" : "",
      "i18n_description" : null,
      "value" : "s://cn-north-4-rse/test/moxingtest-dir/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "input_file",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://cn-north-4-rse/test/moxingtest/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "large_file_method",
      "description" : "",
      "i18n_description" : null,
      "value" : "1",
      "constraint" : {
        "type" : "Integer",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    } ],
    "engine" : {
      "engine_id" : "horovod-cp36-tf-1.16.2",
      "engine_name" : "Horovod",
      "engine_version" : "0.16.2-TF-1.13.1-python3.6"
    },
    "policies" : { }
  },
  "spec" : {
    "resource" : {
      "policy" : "regular",
      "flavor_id" : "modelarts.p3.large.public.free",
      "flavor_name" : "Computing GPU(Vnt1) instance",
      "node_count" : 1,
      "flavor_detail" : {
        "flavor_type" : "GPU",
        "billing" : {
          "code" : "modelarts.vm.gpu.free",
          "unit_num" : 1
        },
        "flavor_info" : {
          "cpu" : {
            "arch" : "x86",
            "core_num" : 8
          },
          "gpu" : {
            "unit_num" : 1,
            "product_name" : "GP-Vnt1",
            "memory" : "32GB"
          },
          "memory" : {
            "size" : 64,
            "unit" : "GB"
          }
        }
      },
      "main_container_allocated_resources" : {
        "cpu_arch" : "x86",
        "cpu_core_num" : 5,
        "mem_size" : 44,
        "accelerator_num" : 1,
        "accelerator_type" : "nvidia-v100-pcie32"
      }
    },
    "log_export_path" : { },
    "custom_metrics" : [ {
      "exec" : {
        "command" : [ "cat", "/a/b/c.porm" ]
      }
    }, {
      "http_get" : {
        "path" : "/raw_text",
        "port" : 10001
      }
    } ]
  }
}
Status code: 400
Common error response body. The following example is returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee is not found.
{
  "error_msg" : "algorithm not found.",
  "error_code" : "ModelArts.2755",
  "error_solution" : "Check whether the training project information in the request is valid."
}
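A caller normally branches on the status code: 201 carries the created job, and 400 carries the error fields. The following is a minimal sketch of that branching, assuming that the response body is available as a JSON string. The function name summarize_create_response and the shape of the returned dictionary are hypothetical.

```python
import json


def summarize_create_response(status_code: int, body_text: str) -> dict:
    """Reduce a create-training-job response to the fields a caller usually needs."""
    body = json.loads(body_text)
    if status_code == 201:
        # The job ID in metadata.id is what later monitoring calls refer to.
        return {
            "ok": True,
            "job_id": body["metadata"]["id"],
            "phase": body["status"]["phase"],
        }
    return {
        "ok": False,
        "error_code": body.get("error_code"),
        "error_msg": body.get("error_msg"),
        "error_solution": body.get("error_solution"),
    }
```

Because secondary_phase values may be added, changed, or deleted, the sketch reads only phase.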
Status Codes
Status Code |
Description |
---|---|
201 |
ok |
400 |
Common error response body. The example above is returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee is not found. |
Error Codes
See Error Codes.