Creating a Training Job
Function
This API is used to create a training job.
Calling this API is an asynchronous operation. The job status can be obtained by calling the APIs described in Querying a Training Job List and Querying the Details About a Training Job Version.
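Because job creation is asynchronous, a client usually polls for the job status after the call returns. The following is a minimal sketch of such a polling loop, not the service's own client. The names `wait_for_job`, `fetch_status`, and `is_terminal` are hypothetical: `fetch_status` stands for whatever call you make to one of the query APIs mentioned above, and `is_terminal` encodes which status values you treat as final (see Job Statuses).

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], int],
                 is_terminal: Callable[[int], bool],
                 interval_s: float = 10.0,
                 timeout_s: float = 3600.0) -> int:
    """Poll fetch_status() until is_terminal(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if is_terminal(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("training job did not reach a terminal status in time")
        # Wait before the next query to avoid flooding the API.
        time.sleep(interval_s)
```

The status check is passed in as a function so that the loop does not hard-code any particular status values.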
URI
POST /v1/{project_id}/training-jobs
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
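As an illustration of how the URI is assembled from the path parameter, here is a short sketch. The helper name `build_create_job_url` and the example values are hypothetical; substitute the real endpoint and project ID for your region.

```python
from urllib.parse import quote


def build_create_job_url(endpoint: str, project_id: str) -> str:
    """Return the URL for POST /v1/{project_id}/training-jobs."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path segment so unexpected characters cannot alter the path.
    return f"https://{endpoint}/v1/{quote(project_id, safe='')}/training-jobs"


# Placeholder values for illustration only.
url = build_create_job_url("endpoint", "example-project-id")
```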
Request Body
Parameter | Mandatory | Type | Description |
---|---|---|---|
job_name | Yes | String | Training job name. The value must contain 1 to 64 characters and can include only digits, letters, underscores (_), and hyphens (-). |
job_desc | No | String | Description of a training job. The value can contain 0 to 256 characters. By default, this parameter is left blank. |
config | Yes | Object | Parameters for creating a training job. For details, see Table 3. |
workspace_id | No | String | Workspace where a job resides. Default value: 0 |
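The length and character rules for job_name and job_desc can be checked on the client before the request is sent. The sketch below is one way to do that. The function name `validate_job_fields` is hypothetical, and it assumes that "letters" means ASCII letters, which the table does not state explicitly.

```python
import re

# 1 to 64 characters: digits, letters, underscores, and hyphens only.
# Assumption: "letters" means ASCII letters A-Z and a-z.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_job_fields(job_name: str, job_desc: str = "") -> None:
    """Raise ValueError if job_name or job_desc breaks the documented limits."""
    if not JOB_NAME_PATTERN.fullmatch(job_name):
        raise ValueError("job_name must be 1 to 64 characters of digits, letters, '_' or '-'")
    if len(job_desc) > 256:
        raise ValueError("job_desc must not exceed 256 characters")


validate_job_fields("TestModelArtsJob", "This is a ModelArts job")  # passes silently
```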
Table 3 config parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
worker_server_num | Yes | Integer | Number of workers in a training job. The maximum value is the max_num value returned by the API described in Querying Job Resource Specifications. |
app_url | Yes | String | Code directory of a training job, for example, /usr/app/. This parameter must be used together with boot_file_url. If model_id is set, you do not need to set app_url, boot_file_url, or engine_id. |
boot_file_url | Yes | String | Boot file of a training job, which must be stored in the code directory. Example value: /usr/app/boot.py. This parameter must be used together with app_url. If model_id is set, you do not need to set app_url, boot_file_url, or engine_id. |
parameter | No | Array<Object> | Running parameters of a training job, given as a collection of label-value pairs, where label is the parameter name and value is the parameter value. Values can be customized. For details, see the sample request. When a training job uses a custom image, these parameters are container environment variables. For details, see Table 8. |
data_url | No | String | OBS URL of the dataset required by a training job. By default, this parameter is left blank. Example value: /usr/data/. This parameter cannot be used together with data_source, or with dataset_id and dataset_version_id. However, one of these three options must be specified. |
dataset_id | No | String | Dataset ID of a training job. This parameter must be used together with dataset_version_id, but cannot be used together with data_url or data_source. |
dataset_version_id | No | String | Dataset version ID of a training job. This parameter must be used together with dataset_id, but cannot be used together with data_url or data_source. |
data_source | No | Array<Object> | Dataset of a training job. This parameter cannot be used together with data_url, or with dataset_id and dataset_version_id. For details, see Table 4. |
spec_id | Yes | Long | ID of the resource specifications selected for a training job. Obtain the ID by calling the API described in Querying Job Resource Specifications. This parameter is mandatory when you create a job in a public resource pool and cannot be used together with pool_id. |
pool_id | Yes | String | ID of a dedicated resource pool. To obtain the ID, log in to the ModelArts management console, choose Dedicated Resource Pools in the navigation pane on the left, and view the resource pool ID in the dedicated resource pool list. This parameter is mandatory when you create a job in a dedicated resource pool and cannot be used together with spec_id. |
engine_id | Yes | Long | ID of the engine selected for a training job. The default value is 1. Obtain the ID by calling the API described in Querying Job Engine Specifications. If model_id is set, you do not need to set app_url, boot_file_url, or engine_id. |
model_id | Yes | Long | ID of the built-in model of a training job. Obtain the ID by calling the API described in Querying a Built-in Algorithm. If model_id is set, you do not need to set app_url, boot_file_url, or engine_id. |
train_url | No | String | OBS URL of the output file of a training job. By default, this parameter is left blank. Example value: /usr/train/ |
log_url | No | String | OBS URL of the logs of a training job. By default, this parameter is left blank. Example value: /usr/log/ |
user_image_url | No | String | SWR URL of a custom image used by a training job. Example value: 100.125.5.235:20202/jobmng/custom-cpu-base:1.0 |
user_command | No | String | Boot command used to start the container of a custom image of a training job. The format is bash /home/work/run_train.sh python /home/work/user-job-dir/app/train.py {python_file_parameter}. |
create_version | No | Boolean | Whether a version is created when a training job is created |
volumes | No | JSON Array | Storage volumes that can be used by a training job. For details, see Table 5. |
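Several config parameters are mutually exclusive or must appear in pairs. The sketch below collects those rules into one client-side check. It reads the "cannot be used together" and "must be used together" wording of the table literally, so treat it as an assumption about how the service enforces them. The function name `check_config` is hypothetical.

```python
from typing import List


def check_config(config: dict) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []

    has_data_url = "data_url" in config
    has_data_source = "data_source" in config
    has_dataset_id = "dataset_id" in config
    has_dataset_version = "dataset_version_id" in config

    # dataset_id and dataset_version_id only make sense as a pair.
    if has_dataset_id != has_dataset_version:
        problems.append("dataset_id and dataset_version_id must be used together")

    # Exactly one way of specifying the training data must be chosen.
    options_used = sum([has_data_url, has_data_source,
                        has_dataset_id or has_dataset_version])
    if options_used != 1:
        problems.append("set exactly one of: data_url, data_source, "
                        "or dataset_id with dataset_version_id")

    # spec_id (public pool) and pool_id (dedicated pool) exclude each other.
    if ("spec_id" in config) == ("pool_id" in config):
        problems.append("set exactly one of spec_id or pool_id")

    # app_url and boot_file_url must be used together.
    if ("app_url" in config) != ("boot_file_url" in config):
        problems.append("app_url and boot_file_url must be used together")

    return problems


config = {
    "worker_server_num": 1,
    "app_url": "/usr/app/",
    "boot_file_url": "/usr/app/boot.py",
    "data_url": "/usr/data/",
    "spec_id": 1,
    "engine_id": 1,
}
print(check_config(config))  # prints []
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation at once.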
Table 4 data_source parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
dataset_id | No | String | Dataset ID of a training job. This parameter must be used together with dataset_version, but cannot be used together with data_url. |
dataset_version | No | String | Dataset version ID of a training job. This parameter must be used together with dataset_id, but cannot be used together with data_url. |
type | No | String | Dataset type. The value can be obs or dataset. obs and dataset cannot be used at the same time. |
data_url | No | String | OBS bucket path. This parameter cannot be used together with dataset_id or dataset_version. |
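To show how the two kinds of data_source entry differ, here is a small sketch that builds each one. The helper names are hypothetical, and the pairing of type obs with data_url and type dataset with dataset_id and dataset_version is an assumption inferred from the descriptions above, not something the table states outright.

```python
import json


def obs_entry(data_url: str) -> dict:
    """data_source entry that points at an OBS bucket path (assumed type: obs)."""
    return {"type": "obs", "data_url": data_url}


def dataset_entry(dataset_id: str, dataset_version: str) -> dict:
    """data_source entry that points at a dataset version (assumed type: dataset)."""
    return {
        "type": "dataset",
        "dataset_id": dataset_id,
        "dataset_version": dataset_version,
    }


# An entry uses one form or the other, never both.
data_source = [obs_entry("/usr/data/")]
print(json.dumps(data_source))
```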
Table 5 volumes parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
nfs | No | Object | Storage volume of the shared file system type. Only the training jobs running in a resource pool with the shared file system network connected support such storage volumes. For details, see Table 6. |
host_path | No | Object | Storage volume of the host file system type. Only training jobs running in a dedicated resource pool support such storage volumes. For details, see Table 7. |
Table 6 nfs parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
id | Yes | String | ID of an SFS Turbo file system |
src_path | Yes | String | Address of an SFS Turbo file system |
dest_path | Yes | String | Local path to a training job |
read_only | No | Boolean | Whether dest_path is read-only. The default value is false. |
Table 7 host_path parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
src_path | Yes | String | Local path to a host |
dest_path | Yes | String | Local path to a training job |
read_only | No | Boolean | Whether dest_path is read-only. The default value is false. |
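The nesting of volumes, nfs, and host_path is easier to see in code. The sketch below builds one entry of each type using the values from the sample request on this page. The helper names `nfs_volume` and `host_path_volume` are hypothetical.

```python
def nfs_volume(fs_id: str, src_path: str, dest_path: str,
               read_only: bool = False) -> dict:
    """Volume entry backed by an SFS Turbo file system."""
    return {"nfs": {"id": fs_id, "src_path": src_path,
                    "dest_path": dest_path, "read_only": read_only}}


def host_path_volume(src_path: str, dest_path: str,
                     read_only: bool = False) -> dict:
    """Volume entry backed by a path on the host (dedicated resource pools only)."""
    return {"host_path": {"src_path": src_path,
                          "dest_path": dest_path, "read_only": read_only}}


# Each element of the volumes array carries exactly one of the two keys.
volumes = [
    nfs_volume("43b37236-9afa-4855-8174-32254b9562e7", "192.168.8.150:/", "/home/work/nas"),
    host_path_volume("/root/work", "/home/mind"),
]
```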
Response Body
Table 9 describes the response parameters.
Parameter | Type | Description |
---|---|---|
is_success | Boolean | Whether the request is successful |
error_message | String | Error message of a failed API call. This parameter is not included when the API call succeeds. |
error_code | String | Error code of a failed API call. For details, see Error Codes. This parameter is not included when the API call succeeds. |
job_id | Long | ID of a training job |
job_name | String | Name of a training job |
status | Integer | Status of a training job. For details about the job statuses, see Job Statuses. |
create_time | Long | Timestamp when a training job is created |
version_id | Long | Version ID of a training job |
resource_id | String | Charged resource ID of a training job |
version_name | String | Version name of a training job |
Sample Request
- The following shows how to create a training job named TestModelArtsJob with the description "This is a ModelArts job".
POST https://endpoint/v1/{project_id}/training-jobs
{
    "job_name": "TestModelArtsJob",
    "job_desc": "This is a ModelArts job",
    "workspace_id": "af261af2218841ec960b01ab3cf1a5fa",
    "config": {
        "worker_server_num": 1,
        "app_url": "/usr/app/",
        "boot_file_url": "/usr/app/boot.py",
        "parameter": [
            { "label": "learning_rate", "value": "0.01" },
            { "label": "batch_size", "value": "32" }
        ],
        "dataset_id": "38277e62-9e59-48f4-8d89-c8cf41622c24",
        "dataset_version_id": "2ff0d6ba-c480-45ae-be41-09a8369bfc90",
        "spec_id": 1,
        "engine_id": 1,
        "train_url": "/usr/train/",
        "log_url": "/usr/log/",
        "model_id": 1,
        "pool_id": "testpool"
    }
}
- The following shows how to create training job TestModelArtsJob2 using a custom image.
POST https://endpoint/v1/{project_id}/training-jobs
{
    "job_name": "TestModelArtsJob2",
    "job_desc": "This is a ModelArts job",
    "workspace_id": "af261af2218841ec960b01ab3cf1a5fa",
    "config": {
        "worker_server_num": 1,
        "data_url": "/usr/data/",
        "app_url": "/usr/app/",
        "boot_file_url": "/usr/app/boot.py",
        "parameter": [
            { "label": "CUSTOM_PARAM1", "value": "1" }
        ],
        "spec_id": 1,
        "user_command": "bash -x /home/work/run_train.sh python /home/work/user-job-dir/app/mnist/mnist_softmax.py --data_url /home/work/user-job-dir/app/mnist_data",
        "user_image_url": "100.125.5.235:20202/jobmng/custom-cpu-base:1.0",
        "train_url": "/usr/train/",
        "log_url": "/usr/log/",
        "model_id": 1,
        "pool_id": "testpool",
        "engine_id": 1
    }
}
- The following shows how to create training job TestModelArtsJob3 using disk storage.
POST https://endpoint/v1/{project_id}/training-jobs
{
    "job_name": "TestModelArtsJob3",
    "job_desc": "This is a ModelArts job",
    "workspace_id": "af261af2218841ec960b01ab3cf1a5fa",
    "config": {
        "worker_server_num": 1,
        "app_url": "/usr/app/",
        "boot_file_url": "/usr/app/boot.py",
        "parameter": [
            { "label": "learning_rate", "value": "0.01" },
            { "label": "batch_size", "value": "32" }
        ],
        "dataset_id": "38277e62-9e59-48f4-8d89-c8cf41622c24",
        "dataset_version_id": "2ff0d6ba-c480-45ae-be41-09a8369bfc90",
        "spec_id": 1,
        "engine_id": 1,
        "train_url": "/usr/train/",
        "log_url": "/usr/log/",
        "model_id": 1,
        "pool_id": "testpool",
        "volumes": [
            {
                "nfs": {
                    "id": "43b37236-9afa-4855-8174-32254b9562e7",
                    "src_path": "192.168.8.150:/",
                    "dest_path": "/home/work/nas",
                    "read_only": false
                }
            },
            {
                "host_path": {
                    "src_path": "/root/work",
                    "dest_path": "/home/mind",
                    "read_only": false
                }
            }
        ]
    }
}
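For readers who want to issue such a request from code, the sketch below prepares (but does not send) the POST using only the Python standard library. The function name `build_create_job_request` is hypothetical. The `X-Auth-Token` header is an assumption about token-based authentication that this page does not describe; check the authentication section of the API reference before relying on it.

```python
import json
import urllib.request


def build_create_job_request(url: str, body: dict, token: str) -> urllib.request.Request:
    """Prepare a POST request for the training-jobs URI without sending it."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: token-based authentication header, not stated on this page.
        "X-Auth-Token": token,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder URL and token for illustration only.
request = build_create_job_request(
    "https://endpoint/v1/example-project-id/training-jobs",
    {"job_name": "TestModelArtsJob", "config": {"worker_server_num": 1}},
    token="example-token",
)
```

Passing the prepared request to `urllib.request.urlopen(request)` would send it; that step is left out here so the sketch has no network side effects.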
Sample Response
- Successful response
{
    "is_success": true,
    "job_id": "10",
    "job_name": "TestModelArtsJob",
    "status": "1",
    "create_time": "1524189990635",
    "version_id": "10",
    "version_name": "V0001",
    "resource_id": "jobafd08896"
}
- Failed response
{
    "is_success": false,
    "error_message": "Job name:TestModelArtsJob is existed",
    "error_code": "ModelArts.0103"
}
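A client typically branches on is_success before reading the other fields. The sketch below is one way to handle both sample responses. The names `TrainingJobError` and `parse_create_response` are hypothetical. Because the sample response carries job_id and version_id as strings while the parameter table lists them as Long, the code converts them with `int()`, which accepts either form.

```python
import json


class TrainingJobError(Exception):
    """Raised when the response reports is_success = false."""

    def __init__(self, error_code, error_message):
        super().__init__(f"{error_code}: {error_message}")
        self.error_code = error_code
        self.error_message = error_message


def parse_create_response(body: str) -> dict:
    """Return the job identifiers from a response body, or raise on failure."""
    data = json.loads(body)
    if not data.get("is_success"):
        raise TrainingJobError(data.get("error_code"), data.get("error_message"))
    return {
        "job_id": int(data["job_id"]),          # works for "10" and 10 alike
        "version_id": int(data["version_id"]),
        "job_name": data["job_name"],
    }
```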
Status Code
For details about the status code, see Status Code.