Creating a Training Job
Function
This API is used to create a training job.
URI
POST /v2/{project_id}/training-jobs
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details, see Obtaining a Project ID and Name. |
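The following is a minimal sketch of how a client could build this call with the Python standard library. The endpoint host, token, and project ID are hypothetical placeholders, and authenticating through an X-Auth-Token header is an assumption. The request object is built but not sent.

```python
import json
import urllib.request


def build_create_job_request(endpoint: str, project_id: str, token: str,
                             body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v2/{project_id}/training-jobs.

    Assumption: the service accepts an IAM token in the X-Auth-Token header.
    """
    if not project_id:
        raise ValueError("project_id is mandatory")
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/training-jobs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


# Hypothetical values for illustration only.
req = build_create_job_request(
    "https://modelarts.example.com", "my-project-id", "my-token",
    {"kind": "job", "metadata": {"name": "demo-job"}},
)
print(req.get_method(), req.full_url)
# POST https://modelarts.example.com/v2/my-project-id/training-jobs
```

Passing the object to urllib.request.urlopen would send it.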
Request Parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| kind | Yes | String | Training job type, which is job by default. Options: |
| metadata | Yes | JobMetadata object | Metadata of a training job. |
| algorithm | No | JobAlgorithm object | Algorithm used by a training job. Options: |
| tasks | No | Array of Task objects | Task list. This function is not currently implemented. |
| spec | No | spec object | Specifications of a training job. If this parameter is specified, leave the tasks parameter blank. |
| endpoints | No | JobEndpointsReq object | Configuration required for remotely accessing a training job. |
Table: JobMetadata
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | Yes | String | Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). |
| workspace_id | No | String | Workspace where a job is located. The default value is 0. |
| description | No | String | Training job description. The value must contain 0 to 256 characters. The default value is NULL. |
| annotations | No | Map<String,String> | Advanced configuration of a training job. Options: |
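The name and description limits in the JobMetadata table can be checked on the client before the request is sent. This sketch assumes that "letters" means ASCII letters; the service itself remains the authority on what it accepts, and the function name is hypothetical.

```python
import re

# From the JobMetadata table: 1 to 64 characters; digits, letters, "_" and "-" only.
# Assumption: "letters" means ASCII letters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_metadata(metadata: dict) -> list:
    """Return a list of problems found in a JobMetadata dict (empty if none)."""
    problems = []
    name = metadata.get("name")
    if not isinstance(name, str) or not _NAME_PATTERN.fullmatch(name):
        problems.append("name must be 1-64 characters of letters, digits, '_' or '-'")
    description = metadata.get("description", "")
    if description is not None and len(description) > 256:
        problems.append("description must be at most 256 characters")
    return problems


print(validate_metadata({"name": "resnet50_train-01"}))  # []
```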
Table: JobAlgorithm
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| id | No | String | Algorithm ID. |
| name | No | String | Algorithm name. Leave it blank. |
| subscription_id | No | String | Subscription ID of a subscribed algorithm, which must be used with item_version_id. |
| item_version_id | No | String | Version ID of the subscribed algorithm, which must be used with subscription_id. |
| code_dir | No | String | Code directory of a training job, for example, /usr/app/. This parameter must be used together with boot_file. If id or subscription_id+item_version_id is set, leave it blank. |
| boot_file | No | String | Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. |
| autosearch_config_path | No | String | YAML configuration path of auto search jobs. An OBS URL is required. |
| autosearch_framework_path | No | String | Framework code directory of auto search jobs. An OBS URL is required. |
| command | No | String | Boot command used to start the container of a training job in the custom image scenario, for example, python train.py. |
| parameters | No | Array of parameters objects | Running parameters of a training job. |
| policies | No | policies object | Policies supported by jobs, which are used for hyperparameter search. |
| inputs | No | Array of Input objects | Input of a training job. |
| outputs | No | Array of Output objects | Output of a training job. |
| engine | No | engine object | Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm. |
| local_code_dir | No | String | Local directory in the training container to which the algorithm code directory is downloaded. Rules: |
| working_dir | No | String | Working directory in which an algorithm is executed. This parameter does not take effect in v1 compatibility mode. |
| environments | No | Array of Map<String,String> objects | Environment variables of a training job. The format is key: value. Leave this parameter blank. |
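The pairing rules spread across the JobAlgorithm rows can be collected into one client-side check. This sketch encodes only the rules stated in the table above; it does not judge whether id may be combined with a subscription, because the Options list for algorithm is not reproduced here. The function name and messages are hypothetical.

```python
import posixpath


def check_algorithm_source(algorithm: dict) -> list:
    """Return the pairing-rule violations found in a JobAlgorithm dict."""
    has = {key: bool(algorithm.get(key)) for key in
           ("id", "subscription_id", "item_version_id", "code_dir", "boot_file", "engine")}
    problems = []

    # subscription_id and item_version_id must be used together.
    if has["subscription_id"] != has["item_version_id"]:
        problems.append("subscription_id and item_version_id must be used together")
    # code_dir and boot_file must be used together.
    if has["code_dir"] != has["boot_file"]:
        problems.append("code_dir and boot_file must be used together")

    # With an algorithm ID or a subscribed algorithm, code_dir, boot_file, and engine stay blank.
    managed = has["id"] or (has["subscription_id"] and has["item_version_id"])
    if managed and (has["code_dir"] or has["boot_file"]):
        problems.append("leave code_dir and boot_file blank when id or a subscription is used")
    if managed and has["engine"]:
        problems.append("leave engine blank when id or a subscription is used")

    # boot_file must be stored in code_dir.
    if has["code_dir"] and has["boot_file"]:
        prefix = posixpath.join(algorithm["code_dir"], "")  # guarantees a trailing "/"
        if not algorithm["boot_file"].startswith(prefix):
            problems.append("boot_file must be stored in code_dir")
    return problems


print(check_algorithm_source({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))  # []
```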
Table: parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | Parameter name. |
| value | No | String | Parameter value. |
| description | No | String | Parameter description. |
| constraint | No | constraint object | Parameter constraint. |
| i18n_description | No | i18n_description object | Internationalization description. |
Table: constraint
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| type | No | String | Parameter type. |
| editable | No | Boolean | Whether the parameter is editable. |
| required | No | Boolean | Whether the parameter is mandatory. |
| sensitive | No | Boolean | Whether the parameter is sensitive. This function is not currently implemented. |
| valid_type | No | String | Valid type. |
| valid_range | No | Array of strings | Valid range. |
Table: i18n_description
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| language | No | String | Internationalization language. |
| description | No | String | Description. |
Table: policies
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| auto_search | No | auto_search object | Hyperparameter search configuration. |
Table: auto_search
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| skip_search_params | No | String | Hyperparameters to be skipped during the search. |
| reward_attrs | No | Array of reward_attrs objects | List of search metrics. |
| search_params | No | Array of search_params objects | Search parameters. |
| algo_configs | No | Array of algo_configs objects | Search algorithm configurations. |
Table: reward_attrs
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | Metric name. |
| mode | No | String | Search direction. |
| regex | No | String | Regular expression of a metric. |
Table: algo_configs
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | Name of the search algorithm. |
| params | No | Array of AutoSearchAlgoConfigParameter objects | Search algorithm parameters. |
Table: AutoSearchAlgoConfigParameter
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| key | No | String | Parameter key. |
| value | No | String | Parameter value. |
| type | No | String | Parameter type. |
Table: engine (JobAlgorithm)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| engine_id | No | String | Engine ID selected for a training job. Specify the engine using engine_id, engine_name + engine_version, or image_url. |
| engine_name | No | String | Name of the engine selected for a training job. If engine_id is set, leave this parameter blank. |
| engine_version | No | String | Version of the engine selected for a training job. If engine_id is set, leave this parameter blank. |
| image_url | No | String | Custom image URL selected for a training job. |
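The engine table allows three ways of choosing an engine. This sketch reads the description as "exactly one of" engine_id, engine_name + engine_version, or image_url, which is an interpretation rather than a documented rule; the function name and example values are hypothetical.

```python
def engine_form(engine: dict) -> str:
    """Return which documented way of choosing an engine a dict uses.

    Assumption: exactly one of engine_id, engine_name + engine_version,
    or image_url should be given.
    """
    name, version = engine.get("engine_name"), engine.get("engine_version")
    if bool(name) != bool(version):
        raise ValueError("engine_name and engine_version must be set together")

    used = []
    if engine.get("engine_id"):
        used.append("engine_id")
    if name and version:
        used.append("engine_name+engine_version")
    if engine.get("image_url"):
        used.append("image_url")

    if len(used) != 1:
        raise ValueError(f"expected exactly one engine form, got {used or 'none'}")
    return used[0]


print(engine_form({"engine_name": "Caffe", "engine_version": "1.0.0"}))  # engine_name+engine_version
```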
Table: Task
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| role | No | String | Task role. This function is not currently supported. |
| algorithm | No | algorithm object | Algorithm management and configuration. |
| task_resource | No | task_resource object | Resource flavors of a training job. |
Table: algorithm (Task)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| job_config | No | job_config object | Algorithm configuration, such as the boot file. |
| code_dir | No | String | Algorithm code directory, for example, /usr/app/. This parameter must be used together with boot_file. |
| boot_file | No | String | Code boot file of the algorithm, which needs to be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used together with code_dir. |
| engine | No | engine object | Engine of a heterogeneous job algorithm. |
| inputs | No | Array of inputs objects | Data input of an algorithm. |
| outputs | No | Array of outputs objects | Data output of an algorithm. |
| local_code_dir | No | String | Local directory in the training container to which the algorithm code directory is downloaded. Ensure that the following rules are complied with: |
| working_dir | No | String | Working directory in which an algorithm is executed. This parameter does not take effect in v1 compatibility mode. |
Table: job_config
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| parameters | No | Array of Parameter objects | Running parameters of an algorithm. |
| inputs | No | Array of Input objects | Data input of an algorithm. |
| outputs | No | Array of Output objects | Data output of an algorithm. |
| engine | No | engine object | Algorithm engine. |
Table: Parameter (job_config)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | Parameter name. |
| value | No | String | Parameter value. |
| description | No | String | Parameter description. |
| constraint | No | constraint object | Parameter constraint. |
| i18n_description | No | i18n_description object | Internationalization description. |
Table: constraint (job_config)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| type | No | String | Parameter type. |
| editable | No | Boolean | Whether the parameter is editable. |
| required | No | Boolean | Whether the parameter is mandatory. |
| sensitive | No | Boolean | Whether the parameter is sensitive. This function is not currently implemented. |
| valid_type | No | String | Valid type. |
| valid_range | No | Array of strings | Valid range. |
Table: i18n_description (job_config)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| language | No | String | Language. Options: |
| description | No | String | Description. |
Table: Input
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | Yes | String | Name of the data input channel. |
| description | No | String | Description of the data input channel. |
| local_dir | No | String | Local directory of the container to which the data input channel is mapped. |
| remote | Yes | InputDataInfo object | Data input. Options: |
| remote_constraint | No | Array of remote_constraint objects | Data input constraint. |
Table: InputDataInfo
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dataset | No | dataset object | Dataset as the data input. |
| obs | No | obs object | OBS path in which the input and output data are stored. |
Table: dataset (InputDataInfo)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| id | Yes | String | Dataset ID of a training job. |
| version_id | Yes | String | Dataset version ID of a training job. |
Table: obs (InputDataInfo)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs_url | Yes | String | OBS URL of the dataset required by a training job. For example, /usr/data/. |
Table: remote_constraint (Input)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_type | No | String | Data input type, including the data storage location and dataset. |
| attributes | No | String | Attributes if a dataset is used as the data input. Options: |
Table: Output
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | Yes | String | Name of the data output channel. |
| description | No | String | Description of the data output channel. |
| local_dir | No | String | Local directory of the container to which the data output channel is mapped. |
| remote | Yes | remote object | Description of the actual data output. |
Table: remote (Output)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs | Yes | obs object | OBS to which data is actually exported. |
Table: obs (Output remote)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs_url | Yes | String | OBS URL to which data is actually exported. |
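Input and Output channels share one shape: a mandatory name, a mandatory remote, and an optional local_dir. A sketch of small builders for them follows; the helper names are hypothetical, as are any channel names and OBS paths passed to them.

```python
from typing import Optional


def _channel(name: str, remote: dict, local_dir: Optional[str]) -> dict:
    """Common shape of an Input or Output channel: name + remote (+ optional local_dir)."""
    if not name:
        raise ValueError("name is mandatory")
    channel = {"name": name, "remote": remote}
    if local_dir:
        channel["local_dir"] = local_dir
    return channel


def obs_input(name: str, obs_url: str, local_dir: Optional[str] = None) -> dict:
    """Input channel that reads data from an OBS path."""
    return _channel(name, {"obs": {"obs_url": obs_url}}, local_dir)


def dataset_input(name: str, dataset_id: str, version_id: str,
                  local_dir: Optional[str] = None) -> dict:
    """Input channel that reads a dataset version (id and version_id are both mandatory)."""
    return _channel(name, {"dataset": {"id": dataset_id, "version_id": version_id}}, local_dir)


def obs_output(name: str, obs_url: str, local_dir: Optional[str] = None) -> dict:
    """Output channel that exports data to an OBS path."""
    return _channel(name, {"obs": {"obs_url": obs_url}}, local_dir)


print(obs_input("training_data", "/my-bucket/data/"))
```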
Table: engine (job_config)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| engine_id | No | String | Engine ID selected for an algorithm. |
| engine_name | No | String | Name of the engine selected for an algorithm. If engine_id is specified, leave this parameter blank. |
| engine_version | No | String | Version of the engine selected for an algorithm. If engine_id is specified, leave this parameter blank. |
| image_url | No | String | Custom image URL selected by an algorithm. |
Table: engine (algorithm of Task)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| engine_id | No | String | Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7. |
| engine_name | No | String | Engine name of a heterogeneous job, for example, Caffe. |
| engine_version | No | String | Engine version of a heterogeneous job. |
| image_url | No | String | Custom image URL selected by an algorithm. |
Table: inputs (algorithm of Task)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | Yes | String | Name of the data input channel. |
| description | No | String | Description of the data input channel. |
| local_dir | No | String | Local directory of the container to which the data input channel is mapped. |
| remote | Yes | remote object | Data input. Options: |
Table: remote (inputs)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs | No | obs object | OBS path in which the input and output data are stored. |
Table: obs (inputs remote)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs_url | Yes | String | OBS URL of the dataset required by a training job. For example, /usr/data/. |
Table: outputs (algorithm of Task)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | Yes | String | Name of the data output channel. |
| description | No | String | Description of the data output channel. |
| local_dir | No | String | Local directory of the container to which the data output channel is mapped. |
| remote | Yes | remote object | Description of the actual data output. |
Table: remote (outputs)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs | Yes | obs object | OBS to which data is actually exported. |
Table: obs (outputs remote)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs_url | Yes | String | OBS URL to which data is actually exported. |
Table: task_resource
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| flavor_id | No | String | Resource flavor ID of a training job. |
| node_count | Yes | Integer | Number of resource replicas selected for a training job. |
Table: spec
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| resource | No | resource object | Resource flavors of a training job. Select either flavor_id or pool_id+[flavor_id]. |
| volumes | No | Array of volumes objects | Volumes attached to a training job. |
| log_export_path | No | log_export_path object | Export path of training job logs. |
| auto_stop | No | auto_stop object | Auto stop configuration of a training job. |
| schedule_policy | No | schedule_policy object | Scheduling policy of a training job. |
Table: resource
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| flavor_id | No | String | ID of the resource flavor selected for a training job. flavor_id cannot be specified for dedicated resource pools with CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows: |
| node_count | No | Integer | Number of nodes used for creating a training job in a pool. By default, a single node is used. |
| pool_id | No | String | Dedicated resource pool ID. |
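The resource object takes either flavor_id on its own or pool_id with an optional flavor_id, and node_count defaults to a single node. A minimal builder that enforces that choice might look like the following. Requiring node_count to be at least 1 is an assumption, and the builder cannot tell whether a dedicated pool has CPU specifications (where flavor_id must be omitted), so that check is left to the caller. The function name is hypothetical.

```python
from typing import Optional


def build_resource(flavor_id: Optional[str] = None, pool_id: Optional[str] = None,
                   node_count: int = 1) -> dict:
    """Build spec.resource: flavor_id alone, or pool_id plus an optional flavor_id."""
    if not flavor_id and not pool_id:
        raise ValueError("set flavor_id, or pool_id (optionally with flavor_id)")
    if node_count < 1:  # assumption: at least one node
        raise ValueError("node_count must be at least 1")
    resource = {"node_count": node_count}
    if pool_id:
        resource["pool_id"] = pool_id
    if flavor_id:
        # The caller must omit flavor_id for dedicated pools with CPU specifications.
        resource["flavor_id"] = flavor_id
    return resource


print(build_resource(pool_id="my-pool-id", node_count=2))
```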
Table: volumes
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| nfs | No | nfs object | Volumes attached in NFS mode. |
Table: nfs
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| nfs_server_path | No | String | NFS server path. |
| local_path | No | String | Path for attaching volumes to the training container. |
| read_only | No | Boolean | Whether the volumes attached to the container in NFS mode are read-only. |
Table: log_export_path
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| obs_url | No | String | OBS URL for storing training job logs. |
| host_path | No | String | Path of the host where training job logs are stored. |
Table: auto_stop
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| time_unit | Yes | String | Time unit. Options: |
| duration | Yes | Integer | Running time. The minimum value is 1. |
Table: schedule_policy
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| required_affinity | No | required_affinity object | Affinity requirements of a training job. |
Table: required_affinity
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| affinity_type | No | String | Affinity scheduling policy. The options are as follows: |
Table: JobEndpointsReq
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| ssh | No | SSHReq object | SSH connection information. |
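Putting the request tables together, a minimal request body for a job that runs custom code from OBS on a resource flavor (without a dedicated pool) could be assembled as follows. This is an illustrative sketch: the engine ID, flavor ID, OBS paths, and channel names are hypothetical placeholders, and optional objects such as volumes, auto_stop, and endpoints are left out.

```python
import json


def minimal_job_body(name: str, code_dir: str, boot_file: str, engine_id: str,
                     flavor_id: str, data_url: str, output_url: str) -> dict:
    """Assemble kind, metadata, algorithm, and spec for a code_dir + boot_file job."""
    return {
        "kind": "job",
        "metadata": {"name": name},
        "algorithm": {
            "code_dir": code_dir,
            "boot_file": boot_file,  # must be stored in code_dir
            "engine": {"engine_id": engine_id},
            "inputs": [{"name": "training_data", "remote": {"obs": {"obs_url": data_url}}}],
            "outputs": [{"name": "model_output", "remote": {"obs": {"obs_url": output_url}}}],
        },
        # spec is used instead of tasks, which is not implemented.
        "spec": {"resource": {"flavor_id": flavor_id, "node_count": 1}},
    }


body = minimal_job_body(
    name="demo-job",
    code_dir="/my-bucket/code/",
    boot_file="/my-bucket/code/train.py",
    engine_id="<engine-id>",
    flavor_id="<flavor-id>",
    data_url="/my-bucket/data/",
    output_url="/my-bucket/output/",
)
print(json.dumps(body, indent=2))
```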
Response Parameters
Status code: 201
| Parameter | Type | Description |
|---|---|---|
| kind | String | Training job type, which is job by default. Options: |
| metadata | JobMetadata object | Metadata of a training job. |
| status | Status object | Status of a training job. You do not need to set this parameter when creating a job. |
| algorithm | JobAlgorithmResponse object | Algorithm used by a training job. Options: |
| tasks | Array of TaskResponse objects | List of tasks in heterogeneous training jobs. |
| spec | spec object | Specifications of a training job. |
| endpoints | JobEndpointsResp object | Configuration required for remotely accessing a training job. |
Table: JobMetadata
| Parameter | Type | Description |
|---|---|---|
| id | String | Training job ID, which is generated and returned by ModelArts after the training job is created. |
| name | String | Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). |
| workspace_id | String | Workspace where a job is located. The default value is 0. |
| description | String | Training job description. The value must contain 0 to 256 characters. The default value is NULL. |
| create_time | Long | Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created. |
| user_name | String | Username for creating a training job. The username is generated and returned by ModelArts after a training job is created. |
| annotations | Map<String,String> | Advanced configuration of a training job. Options: |
Table: Status
| Parameter | Type | Description |
|---|---|---|
| phase | String | Level-1 status of a training job. Options: Creating, Pending, Running, Failed, Completed, Terminating, Terminated, and Abnormal. |
| secondary_phase | String | Level-2 status of a training job. This is an internal, detailed status whose values may be added, modified, or deleted, so do not rely on it. Options: Creating, Queuing, Running, Failed, Completed, Terminating, Terminated, CreateFailed, TerminatedFailed, Unknown, and Lost. |
| duration | Long | Running duration of a training job, in milliseconds. |
| node_count_metrics | Array<Array<Integer>> | Node count changes during the training job running period. |
| tasks | Array of strings | Tasks of a training job. |
| start_time | Long | Start time of a training job. The value is in timestamp format. |
| task_statuses | Array of task_statuses objects | Status of a training job task. |
| running_records | Array of running_records objects | Running and fault recovery records of a training job. |
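A client that polls a job usually reads only a few Status fields. The sketch below extracts the level-1 phase (ignoring secondary_phase, which the table says should not be relied on), converts duration from milliseconds to seconds, and lists tasks with a non-zero exit code. Treating a non-zero exit_code as a failure is an assumption, and the sample values are invented for illustration.

```python
def summarize_status(job: dict) -> dict:
    """Extract the job ID, level-1 phase, duration, and tasks with non-zero exit codes."""
    status = job.get("status", {})
    return {
        "id": job.get("metadata", {}).get("id"),
        # Level-1 phase only; secondary_phase is internal and may change.
        "phase": status.get("phase"),
        # duration is reported in milliseconds.
        "duration_s": status.get("duration", 0) / 1000,
        # Assumption: a non-zero exit code marks a failed task.
        "failed_tasks": [t["task"] for t in status.get("task_statuses", [])
                         if t.get("exit_code", 0) != 0],
    }


sample = {  # hypothetical response fragment
    "metadata": {"id": "job-123"},
    "status": {
        "phase": "Running",
        "duration": 90500,
        "task_statuses": [
            {"task": "worker-0", "exit_code": 0},
            {"task": "worker-1", "exit_code": 1, "message": "example error"},
        ],
    },
}
print(summarize_status(sample))
```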
Table: task_statuses
| Parameter | Type | Description |
|---|---|---|
| task | String | Name of a training job task. |
| exit_code | Integer | Exit code of a training job task. |
| message | String | Error message of a training job task. |
Table: running_records
| Parameter | Type | Description |
|---|---|---|
| start_at | Integer | Unix timestamp of the start time of the current running record, in seconds. |
| end_at | Integer | Unix timestamp of the end time of the current running record, in seconds. |
| start_type | String | Startup mode of the current running record. Options: init_or_rescheduled: the first run after scheduling, including the initial startup and the run after scheduling recovery. restarted: a run after a process restart rather than the first run after scheduling. |
| end_reason | String | Reason why the current running record ends. |
| end_related_task | String | ID of the task worker that causes the current running record to end, for example, worker-0. |
| end_recover | String | Fault tolerance policy used after the current running record ends. Options: npu_proc_restart (NPU in-place hot recovery), gpu_proc_restart (GPU in-place hot recovery), proc_restart (in-place process recovery), pod_reschedule (pod-level rescheduling), job_reschedule (job-level rescheduling), and job_reschedule_with_taint (isolated job-level rescheduling). |
| end_recover_before_downgrade | String | Fault tolerance policy used after the current running record ends, before the fault tolerance policy is downgraded. The options are the same as those of end_recover. |
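Because start_at and end_at are Unix timestamps in seconds, the length of each running record is a simple difference. The sketch below also reports how each record started and which fault tolerance policy followed it. It assumes that a record still in progress omits end_at, and the sample records are invented.

```python
from typing import List, Optional, Tuple


def record_durations(records: List[dict]) -> List[Tuple[Optional[str], Optional[int], Optional[str]]]:
    """Return (start_type, duration in seconds, end_recover) for each running record."""
    result = []
    for record in records:
        end_at = record.get("end_at")
        # Assumption: an open record has no end_at, so its duration is unknown.
        seconds = end_at - record["start_at"] if end_at is not None else None
        result.append((record.get("start_type"), seconds, record.get("end_recover")))
    return result


records = [  # hypothetical records
    {"start_at": 1700000000, "end_at": 1700003600,
     "start_type": "init_or_rescheduled", "end_recover": "proc_restart"},
    {"start_at": 1700003700, "start_type": "restarted"},
]
print(record_durations(records))
# [('init_or_rescheduled', 3600, 'proc_restart'), ('restarted', None, None)]
```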
Table: JobAlgorithmResponse
| Parameter | Type | Description |
|---|---|---|
| id | String | Algorithm ID. |
| name | String | Algorithm name. |
| subscription_id | String | Subscription ID of a subscribed algorithm, which must be used with item_version_id. |
| item_version_id | String | Version ID of the subscribed algorithm, which must be used with subscription_id. |
| code_dir | String | Code directory of a training job, for example, /usr/app/. This parameter must be used together with boot_file. If id or subscription_id+item_version_id is set, leave it blank. |
| boot_file | String | Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. |
| autosearch_config_path | String | YAML configuration path of auto search jobs. An OBS URL is required. |
| autosearch_framework_path | String | Framework code directory of auto search jobs. An OBS URL is required. |
| command | String | Boot command used to start the container of a custom image of a training job. For example, python train.py. |
| parameters | Array of Parameter objects | Running parameters of a training job. |
| policies | policies object | Policies supported by jobs. |
| inputs | Array of Input objects | Input of a training job. |
| outputs | Array of Output objects | Output of a training job. |
| engine | engine object | Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm. |
| local_code_dir | String | Local directory in the training container to which the algorithm code directory is downloaded. Ensure that the following rules are complied with: |
| working_dir | String | Working directory in which an algorithm is executed. This parameter does not take effect in v1 compatibility mode. |
| environments | Array of Map<String,String> objects | Environment variables of a training job. The format is key: value. Leave this parameter blank. |
Table: Parameter
| Parameter | Type | Description |
|---|---|---|
| name | String | Parameter name. |
| value | String | Parameter value. |
| description | String | Parameter description. |
| constraint | constraint object | Parameter constraint. |
| i18n_description | i18n_description object | Internationalization description. |
Table: constraint
| Parameter | Type | Description |
|---|---|---|
| type | String | Parameter type. |
| editable | Boolean | Whether the parameter is editable. |
| required | Boolean | Whether the parameter is mandatory. |
| sensitive | Boolean | Whether the parameter is sensitive. This function is not currently implemented. |
| valid_type | String | Valid type. |
| valid_range | Array of strings | Valid range. |
Table: i18n_description
| Parameter | Type | Description |
|---|---|---|
| language | String | Language. Options: |
| description | String | Description. |
Table: policies
| Parameter | Type | Description |
|---|---|---|
| auto_search | auto_search object | Hyperparameter search configuration. |
Table: auto_search
| Parameter | Type | Description |
|---|---|---|
| skip_search_params | String | Hyperparameters to be skipped during the search. |
| reward_attrs | Array of reward_attrs objects | List of search metrics. |
| search_params | Array of search_params objects | Search parameters. |
| algo_configs | Array of algo_configs objects | Search algorithm configurations. |
Table: reward_attrs
| Parameter | Type | Description |
|---|---|---|
| name | String | Metric name. |
| mode | String | Search direction. |
| regex | String | Regular expression of a metric. |
Table: algo_configs
| Parameter | Type | Description |
|---|---|---|
| name | String | Name of the search algorithm. |
| params | Array of AutoSearchAlgoConfigParameter objects | Search algorithm parameters. |
Table: AutoSearchAlgoConfigParameter
| Parameter | Type | Description |
|---|---|---|
| key | String | Parameter key. |
| value | String | Parameter value. |
| type | String | Parameter type. |
Table: Input
| Parameter | Type | Description |
|---|---|---|
| name | String | Name of the data input channel. |
| description | String | Description of the data input channel. |
| local_dir | String | Local directory of the container to which the data input channel is mapped. |
| remote | InputDataInfo object | Data input. Options: |
| remote_constraint | Array of remote_constraint objects | Data input constraint. |
Table: InputDataInfo
| Parameter | Type | Description |
|---|---|---|
| dataset | dataset object | Dataset as the data input. |
| obs | obs object | OBS path in which the input and output data are stored. |
Table: dataset (InputDataInfo)
| Parameter | Type | Description |
|---|---|---|
| id | String | Dataset ID of a training job. |
| version_id | String | Dataset version ID of a training job. |
| obs_url | String | OBS URL of the dataset required by a training job. ModelArts automatically parses and generates the URL based on the dataset and dataset version IDs. For example, /usr/data/. |
Table: obs (InputDataInfo)
| Parameter | Type | Description |
|---|---|---|
| obs_url | String | OBS URL of the dataset required by a training job. For example, /usr/data/. |
Table: remote_constraint (Input)
| Parameter | Type | Description |
|---|---|---|
| data_type | String | Data input type, including the data storage location and dataset. |
| attributes | String | Attributes if a dataset is used as the data input. Options: |
Table: Output
| Parameter | Type | Description |
|---|---|---|
| name | String | Name of the data output channel. |
| description | String | Description of the data output channel. |
| local_dir | String | Local directory of the container to which the data output channel is mapped. |
| remote | remote object | Description of the actual data output. |
Table: obs (Output remote)
| Parameter | Type | Description |
|---|---|---|
| obs_url | String | OBS URL to which data is actually exported. |
Table: engine (JobAlgorithmResponse)
| Parameter | Type | Description |
|---|---|---|
| engine_id | String | Engine ID selected for a training job. Specify the engine using engine_id, engine_name + engine_version, or image_url. |
| engine_name | String | Name of the engine selected for a training job. If engine_id is set, leave this parameter blank. |
| engine_version | String | Version of the engine selected for a training job. If engine_id is set, leave this parameter blank. |
| image_url | String | Custom image URL selected for a training job. |
Table: TaskResponse
| Parameter | Type | Description |
|---|---|---|
| role | String | Task role. This function is not currently supported. |
| algorithm | algorithm object | Algorithm management and configuration. |
| task_resource | FlavorResponse object | Flavors of a training job or an algorithm. |
Table: algorithm (TaskResponse)
| Parameter | Type | Description |
|---|---|---|
| code_dir | String | Absolute path of the directory where the algorithm boot file is stored. |
| boot_file | String | Absolute path of the algorithm boot file. |
| inputs | inputs object | Algorithm input channel. |
| outputs | outputs object | Algorithm output channel. |
| engine | engine object | Engine on which a heterogeneous job depends. |
| local_code_dir | String | Local directory in the training container to which the algorithm code directory is downloaded. Ensure that the following rules are complied with: |
| working_dir | String | Working directory in which an algorithm is executed. This parameter does not take effect in v1 compatibility mode. |
Table: inputs (algorithm of TaskResponse)
| Parameter | Type | Description |
|---|---|---|
| name | String | Name of the data input channel. |
| local_dir | String | Local path of the container to which the data input and output channels are mapped. |
| remote | remote object | Actual data input. Heterogeneous jobs support only OBS. |
Table: remote (inputs)
| Parameter | Type | Description |
|---|---|---|
| obs | obs object | OBS path in which the input and output data are stored. |
Table: obs (inputs remote)
| Parameter | Type | Description |
|---|---|---|
| obs_url | String | OBS URL of the dataset required by a training job. For example, /usr/data/. |
Table: outputs (algorithm of TaskResponse)
| Parameter | Type | Description |
|---|---|---|
| name | String | Name of the data output channel. |
| local_dir | String | Local directory of the container to which the data output channel is mapped. |
| remote | remote object | Description of the actual data output. |
| mode | String | Data transmission mode. The default value is upload_periodically. |
| period | String | Data transmission period. The default value is 30s. |
Table: remote (outputs)
| Parameter | Type | Description |
|---|---|---|
| obs | obs object | OBS to which data is actually exported. |
Table: obs (outputs remote)
| Parameter | Type | Description |
|---|---|---|
| obs_url | String | OBS URL to which data is actually exported. |
Table: engine (algorithm of TaskResponse)
| Parameter | Type | Description |
|---|---|---|
| engine_id | String | Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7. |
| engine_name | String | Engine name of a heterogeneous job, for example, Caffe. |
| engine_version | String | Engine version of a heterogeneous job. |
| v1_compatible | Boolean | Whether the v1 compatibility mode is used. |
| run_user | String | Default user UID with which the engine is started. |
| image_url | String | Custom image URL selected by an algorithm. |
Table: FlavorResponse
| Parameter | Type | Description |
|---|---|---|
| flavor_id | String | ID of the resource flavor. |
| flavor_name | String | Name of the resource flavor. |
| max_num | Integer | Maximum number of nodes in a resource flavor. |
| flavor_type | String | Resource flavor type. Options: |
| billing | billing object | Billing information of a resource flavor. |
| flavor_info | flavor_info object | Resource flavor details. |
| attributes | Map<String,String> | Other specification attributes. |
Table: billing (FlavorResponse)
| Parameter | Type | Description |
|---|---|---|
| code | String | Billing code. |
| unit_num | Integer | Number of billing units. |
Table: flavor_info (FlavorResponse)
| Parameter | Type | Description |
|---|---|---|
| max_num | Integer | Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. |
| cpu | cpu object | CPU specifications. |
| gpu | gpu object | GPU specifications. |
| npu | npu object | Ascend specifications. |
| memory | memory object | Memory information. |
| disk | disk object | Disk information. |
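The max_num field can be used to check a requested node_count before a job is submitted, since a value of 1 means the flavor does not support distributed mode. A sketch with hypothetical function names; treating a missing max_num as 1 is a conservative assumption.

```python
def supports_distributed(flavor_info: dict) -> bool:
    """max_num == 1 means the flavor does not support distributed mode."""
    # Assumption: a missing max_num is treated as 1.
    return flavor_info.get("max_num", 1) > 1


def node_count_allowed(flavor_info: dict, node_count: int) -> bool:
    """Check that node_count lies between 1 and the flavor's max_num."""
    return 1 <= node_count <= flavor_info.get("max_num", 1)


print(supports_distributed({"max_num": 8}), node_count_allowed({"max_num": 8}, 16))  # True False
```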
Table: cpu (FlavorResponse)
| Parameter | Type | Description |
|---|---|---|
| arch | String | CPU architecture. |
| core_num | Integer | Number of cores. |
Table: gpu (FlavorResponse)
| Parameter | Type | Description |
|---|---|---|
| unit_num | Integer | Number of GPUs. |
| product_name | String | Product name. |
| memory | String | Memory. |
Table: npu (FlavorResponse)
| Parameter | Type | Description |
|---|---|---|
| unit_num | String | Number of NPUs. |
| product_name | String | Product name. |
| memory | String | Memory. |
Table: spec
| Parameter | Type | Description |
|---|---|---|
| resource | Resource object | Resource flavors of a training job. Select either flavor_id or pool_id+[flavor_id]. |
| volumes | Array of volumes objects | Volumes attached to a training job. |
| log_export_path | log_export_path object | Export path of training job logs. |
Table: Resource
| Parameter | Type | Description |
|---|---|---|
| policy | String | Resource flavor of a training job. Options: regular |
| flavor_id | String | ID of the resource flavor selected for a training job. flavor_id cannot be specified for dedicated resource pools with CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows: |
| flavor_name | String | Read-only flavor name returned by ModelArts when flavor_id is used. |
| node_count | Integer | Number of resource replicas selected for a training job. |
| pool_id | String | Resource pool ID selected for a training job. |
| flavor_detail | flavor_detail object | Flavors of a training job or an algorithm. |
Table: flavor_detail
| Parameter | Type | Description |
|---|---|---|
| flavor_type | String | Resource flavor type. Options: |
| billing | billing object | Billing information of a resource flavor. |
| flavor_info | flavor_info object | Resource flavor details. |
Table: billing (flavor_detail)
| Parameter | Type | Description |
|---|---|---|
| code | String | Billing code. |
| unit_num | Integer | Number of billing units. |
Table: flavor_info (flavor_detail)
| Parameter | Type | Description |
|---|---|---|
| max_num | Integer | Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. |
| cpu | cpu object | CPU specifications. |
| gpu | gpu object | GPU specifications. |
| npu | npu object | Ascend specifications. |
| memory | memory object | Memory information. |
| disk | disk object | Disk information. |
| Parameter | Type | Description |
|---|---|---|
| arch | String | CPU architecture. |
| core_num | Integer | Number of cores. |
| Parameter | Type | Description |
|---|---|---|
| unit_num | Integer | Number of GPUs. |
| product_name | String | Product name. |
| memory | String | Memory. |
| Parameter | Type | Description |
|---|---|---|
| unit_num | String | Number of NPUs. |
| product_name | String | Product name. |
| memory | String | Memory. |
| Parameter | Type | Description |
|---|---|---|
| size | Integer | Memory size. |
| unit | String | Unit of the memory size, for example, GB. |
| Parameter | Type | Description |
|---|---|---|
| size | String | Disk size. |
| unit | String | Unit of the disk size. Generally, the value is GB. |
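The flavor_detail, flavor_info, cpu, gpu, npu, memory, and disk tables above nest several levels deep. The Python sketch below flattens the resource object of a response into a short summary, for example for logging. The function name and the summary keys are hypothetical. The sketch treats flavor_info as describing a single node and treats a missing node_count as one node; both are assumptions. Because gpu.unit_num is documented as an Integer but npu.unit_num as a String, the sketch normalizes both with int().

```python
def summarize_flavor(resource: dict) -> dict:
    """Condense spec.resource.flavor_detail from a response into a flat summary."""
    detail = resource.get("flavor_detail") or {}
    info = detail.get("flavor_info") or {}
    cpu = info.get("cpu") or {}
    gpu = info.get("gpu") or {}
    npu = info.get("npu") or {}
    memory = info.get("memory") or {}
    # Assumption: a missing node_count means a single node.
    nodes = resource.get("node_count", 1)
    # gpu.unit_num is an Integer but npu.unit_num is a String, so normalize with int().
    per_node = int(gpu.get("unit_num") or npu.get("unit_num") or 0)
    size = memory.get("size")
    return {
        "flavor_type": detail.get("flavor_type"),
        "nodes": nodes,
        "cpu_cores_per_node": cpu.get("core_num"),
        "accelerators_per_node": per_node,
        "total_accelerators": nodes * per_node,
        # Memory is reported with its documented unit, for example "64GB".
        "memory": f"{size}{memory.get('unit', '')}" if size is not None else None,
    }
```

For the resource object in the 201 example response, this yields flavor_type GPU, 8 CPU cores, 1 accelerator in total, and memory 64GB.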
| Parameter | Type | Description |
|---|---|---|
| nfs_server_path | String | NFS server path. |
| local_path | String | Path for attaching volumes to the training container. |
| read_only | Boolean | Whether the volumes attached to the container in NFS mode are read-only. |
| Parameter | Type | Description |
|---|---|---|
| obs_url | String | OBS URL for storing training job logs. |
| host_path | String | Path of the host where training job logs are stored. |
| Parameter | Type | Description |
|---|---|---|
| ssh | SSHResp object | SSH connection information. |
| jupyter_lab | JupyterLab object | JupyterLab connection information. |
| Parameter | Type | Description |
|---|---|---|
| key_pair_names | Array of strings | Names of the SSH key pairs, which can be created and viewed on the Key Pair page of the ECS console. |
| task_urls | Array of TaskUrls objects | SSH connection address information. |
| Parameter | Type | Description |
|---|---|---|
| task | String | Task ID of a training job. |
| url | String | SSH connection address of a training job. |
| Parameter | Type | Description |
|---|---|---|
| url | String | JupyterLab address of a training job. |
| token | String | JupyterLab token of the training job. |
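The SSH, TaskUrls, and JupyterLab tables above describe the connection details that a response returns. The Python sketch below flattens them into a per-task lookup. The function name and the sample values used in the test are hypothetical, and the sketch assumes the ssh and jupyter_lab objects arrive together in one dictionary, as in the connection table above. It reports only whether a JupyterLab token is present rather than copying the token, because the token is a credential.

```python
def connection_info(endpoints: dict) -> dict:
    """Flatten the ssh/jupyter_lab connection objects of a training job response."""
    ssh = endpoints.get("ssh") or {}
    jupyter = endpoints.get("jupyter_lab") or {}
    # Map each task ID (for example "worker-0") to its SSH connection address.
    ssh_urls = {entry["task"]: entry["url"] for entry in ssh.get("task_urls") or []}
    return {
        "key_pair_names": list(ssh.get("key_pair_names") or []),
        "ssh_urls": ssh_urls,
        "jupyter_url": jupyter.get("url"),
        # Report presence only; the token itself is a credential and is not copied.
        "has_jupyter_token": bool(jupyter.get("token")),
    }
```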
Status code: 400
| Parameter | Type | Description |
|---|---|---|
| error_msg | String | Error message. |
| error_code | String | Error code. |
| error_solution | String | Suggested solution. |
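A client usually turns the error_msg, error_code, and error_solution fields above into an exception, so that callers can branch on error_code. The Python sketch below does this with the standard json module. The class and function names are hypothetical. When a body does not have the documented JSON shape, for example a gateway error page, the sketch falls back to using the raw text as the message, which is an assumption about how to handle unexpected bodies.

```python
import json


class ModelArtsError(Exception):
    """Hypothetical exception carrying the documented error_* fields."""

    def __init__(self, status: int, error_code: str, error_msg: str,
                 error_solution: str) -> None:
        super().__init__(f"{status} {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg
        self.error_solution = error_solution


def parse_error(status: int, body: str) -> ModelArtsError:
    """Turn an error response body into a ModelArtsError (returned, not raised)."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Fall back to the raw text when the body is not the documented JSON shape.
        data = {"error_msg": body}
    return ModelArtsError(
        status,
        data.get("error_code", ""),
        data.get("error_msg", ""),
        data.get("error_solution", ""),
    )
```

With the 400 example body shown later on this page, parse_error(400, body).error_code is "ModelArts.2755".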
Example Requests
The following example creates a training job that uses free specifications. The job name is TestModelArtsJob, the description is "This is a ModelArts job", and the algorithm ID is 3f5d6706-7b67-408d-8ba0-ec08048c45ed. No inputs or outputs are defined for the algorithm.
POST https://endpoint/v2/{project_id}/training-jobs
{
  "kind" : "job",
  "metadata" : {
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job"
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "parameters" : [ {
      "name" : "input_dir",
      "value" : "obs://cn-north-4-rse/test/moxingtest-dir/"
    }, {
      "name" : "input_file",
      "value" : "obs://cn-north-4-rse/test/moxingtest/"
    }, {
      "name" : "large_file_method",
      "value" : "1"
    } ],
    "policies" : {
      "auto_search" : null
    },
    "environments" : { }
  },
  "spec" : {
    "resource" : {
      "flavor_id" : "modelarts.p3.large.public.free",
      "node_count" : 1
    },
    "log_export_path" : {
      "obs_url" : ""
    }
  }
}
The following example creates a training job from a custom image. The job name is TestModelArtsJob2 and the description is "This is a ModelArts job2". The job runs in a dedicated resource pool and mounts an NFS volume.
POST https://endpoint/v2/{project_id}/training-jobs
{
  "kind" : "job",
  "metadata" : {
    "name" : "TestModelArtsJob2",
    "description" : "This is a ModelArts job2"
  },
  "algorithm" : {
    "engine" : {
      "image_url" : "xxxxxxxx/fastseq:1.2"
    },
    "command" : "cd /home/ma-user/ddp_demo && sh run_ddp.sh",
    "parameters" : [ ],
    "policies" : {
      "auto_search" : null
    },
    "environments" : {
      "NCCL_DEBUG" : "INFO",
      "NCCL_IB_DISABLE" : "0"
    }
  },
  "spec" : {
    "resource" : {
      "flavor_id" : "modelarts.pool.visual.xlarge",
      "node_count" : 1,
      "pool_id" : "poolfaf38d76"
    },
    "log_export_path" : {
      "obs_url" : "/cn-north-4-training-test/limou/ddp-demo-log/"
    },
    "volumes" : [ {
      "nfs" : {
        "nfs_server_path" : "192.168.0.82:/",
        "local_path" : "/home/ma-user/nfs/",
        "read_only" : false
      }
    } ]
  }
}
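Both example requests are a JSON body sent with POST to /v2/{project_id}/training-jobs. The Python sketch below uses the standard library's urllib.request.Request to build such a request without sending it. Before the body leaves the client, it applies the metadata.name limit (1 to 64 digits, letters, underscores, or hyphens) and the metadata.description limit (at most 256 characters) from the metadata table on this page. The function name is hypothetical. Reading "letters" as ASCII letters is an assumption, and so is authenticating with an IAM token in the X-Auth-Token header. https://endpoint remains a placeholder for the real regional endpoint.

```python
import json
import re
import urllib.request

# Name rule from metadata.name: 1-64 digits, letters, underscores, or hyphens.
# Assumption: "letters" means ASCII letters.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_create_job_request(endpoint: str, project_id: str, token: str,
                             body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/{project_id}/training-jobs request."""
    metadata = body.get("metadata") or {}
    if not _NAME_RE.fullmatch(metadata.get("name", "")):
        raise ValueError("metadata.name must be 1-64 letters, digits, '_' or '-'")
    if len(metadata.get("description") or "") > 256:
        raise ValueError("metadata.description must be at most 256 characters")
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/training-jobs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: IAM token authentication via the X-Auth-Token header.
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. For a 400 response, urlopen raises urllib.error.HTTPError, and that exception's read() method returns the error body described under Status code: 400.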
Example Responses
Status code: 201
ok
{
"kind" : "job",
"metadata" : {
"id" : "425b7087-83de-49ed-9e40-5bb642be956f",
"name" : "TestModelArtsJob",
"description" : "This is a ModelArts job",
"create_time" : 1637045545982,
"workspace_id" : "0",
"user_name" : ""
},
"status" : {
"phase" : "Creating",
"secondary_phase" : "Creating",
"duration" : 0,
"start_time" : 0,
"node_count_metrics" : null,
"tasks" : [ "worker-0", "server-0" ]
},
"algorithm" : {
"id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
"name" : "ttt-obs-gpu",
"code_dir" : "/cn-north-4-rse/test/moxingtest-code/",
"boot_file" : "/cn-north-4-rse/test/moxingtest-code/test_obs_gpu.py",
"parameters" : [ {
"name" : "input_dir",
"description" : "",
"i18n_description" : null,
"value" : "obs://cn-north-4-rse/test/moxingtest-dir/",
"constraint" : {
"type" : "String",
"editable" : true,
"required" : true,
"sensitive" : false,
"valid_type" : "None",
"valid_range" : [ ]
}
}, {
"name" : "input_file",
"description" : "",
"i18n_description" : null,
"value" : "obs://cn-north-4-rse/test/moxingtest/",
"constraint" : {
"type" : "String",
"editable" : true,
"required" : true,
"sensitive" : false,
"valid_type" : "None",
"valid_range" : [ ]
}
}, {
"name" : "large_file_method",
"description" : "",
"i18n_description" : null,
"value" : "1",
"constraint" : {
"type" : "Integer",
"editable" : true,
"required" : true,
"sensitive" : false,
"valid_type" : "None",
"valid_range" : [ ]
}
} ],
"engine" : {
"engine_id" : "horovod-cp36-tf-1.16.2",
"engine_name" : "Horovod",
"engine_version" : "0.16.2-TF-1.13.1-python3.6"
},
"policies" : { }
},
"spec" : {
"resource" : {
"policy" : "regular",
"flavor_id" : "modelarts.p3.large.public.free",
"flavor_name" : "Computing GPU(V100) instance",
"node_count" : 1,
"flavor_detail" : {
"flavor_type" : "GPU",
"billing" : {
"code" : "modelarts.vm.gpu.free",
"unit_num" : 1
},
"flavor_info" : {
"cpu" : {
"arch" : "x86",
"core_num" : 8
},
"gpu" : {
"unit_num" : 1,
"product_name" : "NVIDIA-V100",
"memory" : "32GB"
},
"memory" : {
"size" : 64,
"unit" : "GB"
}
}
}
},
"log_export_path" : { }
}
}
Status code: 400
Common error response body. The following is returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee is not found.
{
"error_msg" : "algorithm not found.",
"error_code" : "ModelArts.2755",
"error_solution" : "Check whether the training project information in the request is valid."
}
Status Codes
| Status Code | Description |
|---|---|
| 201 | ok |
| 400 | Common error response. For example, this code is returned when the algorithm with ID 3f5d6706-7b67-408d-8ba0-ec08048c45ee is not found. |
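Given the two status codes above, a client can treat 201 as success and any other code as a failure. The Python sketch below extracts the job ID, phase, and task list from a 201 body and raises for anything else. The function name and the summary keys are hypothetical. The example response shows the job in the Creating phase when 201 is returned, so a 201 confirms that the job was created, not that it is already running.

```python
import json


def handle_create_response(status: int, body: str) -> dict:
    """Summarize a create-training-job response; raise on anything but 201."""
    try:
        data = json.loads(body) if body else {}
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status != 201:
        raise RuntimeError(
            f"HTTP {status} {data.get('error_code', '')}: {data.get('error_msg', body)}"
        )
    metadata = data.get("metadata") or {}
    state = data.get("status") or {}
    return {
        "job_id": metadata.get("id"),
        "name": metadata.get("name"),
        # In the documented example the job is still "Creating" when 201 is returned.
        "phase": state.get("phase"),
        "tasks": list(state.get("tasks") or []),
    }
```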
Error Codes
See Error Codes.