ma-cli ma-job Commands for Training Jobs
Run the ma-cli ma-job command to submit, query, stop, and delete training jobs, and to obtain training job logs, events, available AI engines, resource flavors, and dedicated resource pools.
$ ma-cli ma-job -h
Usage: ma-cli ma-job [OPTIONS] COMMAND [ARGS]...
  ModelArts job submission and query job details.
Options:
  -h, -H, --help  Show this message and exit.
Commands:
  delete      Delete training job by job id.
  get-engine  Get job engines.
  get-event   Get job running event.
  get-flavor  Get job flavors.
  get-job     Get job details.
  get-log     Get job log details.
  get-pool    Get job engines.
  stop        Stop training job by job id.
  submit      Submit training job.
Command | Description |
---|---|
get-job | Obtain ModelArts training jobs and their details. |
get-log | Obtain the runtime logs of a ModelArts training job. |
get-engine | Obtain the AI engines available for ModelArts training. |
get-event | Obtain the events of a ModelArts training job. |
get-flavor | Obtain the resource flavors available for ModelArts training. |
get-pool | Obtain the ModelArts resource pools dedicated to training. |
stop | Stop a ModelArts training job. |
submit | Submit a ModelArts training job. |
delete | Delete the training job with a specified job ID. |
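Every subcommand in the table accepts -P/--profile to choose a CLI connection profile. The following is a minimal bash sketch of a wrapper that runs any ma-job subcommand with an explicit profile and rejects names that are not in the command list. The function name ma_job and the MA_CLI variable (set MA_CLI=echo for a dry run; it defaults to ma-cli) are hypothetical and not part of ma-cli.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ma-cli): run an ma-job subcommand with an
# explicit connection profile (-P/--profile). MA_CLI is an assumed override
# for dry runs, e.g. MA_CLI=echo; it defaults to the real ma-cli binary.

# Subcommands taken from the ma-cli ma-job help output.
MA_JOB_COMMANDS="delete get-engine get-event get-flavor get-job get-log get-pool stop submit"

# Usage: ma_job PROFILE COMMAND [OPTIONS...]
ma_job() {
  if [ "$#" -lt 2 ]; then
    echo "usage: ma_job PROFILE COMMAND [OPTIONS...]" >&2
    return 2
  fi
  local profile="$1" cmd="$2" known
  shift 2
  # Only exact subcommand names are forwarded to the CLI.
  for known in $MA_JOB_COMMANDS; do
    if [ "$known" = "$cmd" ]; then
      "${MA_CLI:-ma-cli}" ma-job "$cmd" "$@" -P "$profile"
      return
    fi
  done
  echo "unknown ma-job command: $cmd" >&2
  return 2
}
```

For example, ma_job DEFAULT get-job -i b63e90xxx runs ma-cli ma-job get-job -i b63e90xxx -P DEFAULT.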
Using ma-cli ma-job get-job to Obtain a ModelArts Training Job
Run the ma-cli ma-job get-job command to obtain training jobs or details about a specific job.
$ ma-cli ma-job get-job -h
Usage: ma-cli ma-job get-job [OPTIONS]
  Get job details.
  Example:
  # Get train job details by job name
  ma-cli ma-job get-job -n ${job_name}
  # Get train job details by job id
  ma-cli ma-job get-job -i ${job_id}
  # Get train job list
  ma-cli ma-job get-job --page-size 5 --page-num 1
Options:
  -i, --job-id TEXT               Get training job details by job id.
  -n, --job-name TEXT             Get training job details by job name.
  -pn, --page-num INTEGER         Specify which page to query.  [x>=1]
  -ps, --page-size INTEGER RANGE  The maximum number of results for this query.  [1<=x<=50]
  -v, --verbose                   Show detailed information about training job details.
  -C, --config-file TEXT          Configure file path for authorization.
  -D, --debug                     Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT              CLI connection profile to use. The default profile is "DEFAULT".
  -h, -H, --help                  Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
-i / --job-id | String | No | ID of the job whose details are to be obtained. |
-n / --job-name | String | No | Name of the job to be queried, or a name keyword used to filter training jobs. |
-pn / --page-num | Int | No | Page number. The value must be at least 1. The default value is 1. |
-ps / --page-size | Int | No | Number of training jobs displayed on each page. The value ranges from 1 to 50. The default value is 10. |
-v / --verbose | Bool | No | Whether to display detailed information. It is disabled by default. |
- Example: Obtain a training job with a specified job ID.
ma-cli ma-job get-job -i b63e90xxx
- Example: Filter training jobs by the job name keyword auto.
ma-cli ma-job get-job -n auto
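When there are more jobs than fit on one page, --page-num and --page-size can be combined to walk through the list. The following bash sketch prints the first N pages and checks the 1 to 50 page-size range from the help output before calling the CLI. The function name list_job_pages and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the first N pages of the training job list with get-job.
# MA_CLI is an assumed dry-run override (e.g. MA_CLI=echo); defaults to ma-cli.

# Usage: list_job_pages PAGES [PAGE_SIZE]
list_job_pages() {
  local pages="$1" size="${2:-10}" page
  case "$pages" in ''|*[!0-9]*) echo "pages must be a positive integer" >&2; return 2 ;; esac
  case "$size" in ''|*[!0-9]*) echo "page size must be a positive integer" >&2; return 2 ;; esac
  # The help output documents --page-size as 1<=x<=50.
  if [ "$size" -lt 1 ] || [ "$size" -gt 50 ]; then
    echo "page size must be between 1 and 50" >&2
    return 2
  fi
  for ((page = 1; page <= pages; page++)); do
    "${MA_CLI:-ma-cli}" ma-job get-job --page-num "$page" --page-size "$size" || return
  done
}
```

For example, list_job_pages 3 20 runs get-job for pages 1, 2, and 3 with 20 jobs per page, stopping at the first failed call.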
Using ma-cli ma-job submit to Submit a ModelArts Training Job
Run the ma-cli ma-job submit command to submit a ModelArts training job.
When running this command, use the YAML_FILE argument to specify the path to the configuration file of the job. If you omit it, no configuration file is used. The configuration file is in YAML format, and its keys correspond to the OPTIONS of the command (for example, code-dir corresponds to --code-dir). If you specify both YAML_FILE and OPTIONS, the OPTIONS values override the corresponding items in the configuration file.
$ ma-cli ma-job submit -h
Usage: ma-cli ma-job submit [OPTIONS] [YAML_FILE]...
  Submit training job.
  Example:
  ma-cli ma-job submit --code-dir obs://your_bucket/code/
    --boot-file main.py
    --framework-type PyTorch
    --working-dir /home/ma-user/modelarts/user-job-dir/code
    --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64
    --data-url obs://your_bucket/dataset/
    --log-url obs://your_bucket/logs/
    --train-instance-type modelarts.vm.cpu.8u
    --train-instance-count 1
Options:
  --name TEXT                       Job name.
  --description TEXT                Job description.
  --image-url TEXT                  Full swr custom image path.
  --uid TEXT                        Uid for custom image (default: 1000).
  --working-dir TEXT                ModelArts training job working directory.
  --local-code-dir TEXT             ModelArts training job local code directory.
  --user-command TEXT               Execution command for custom image.
  --pool-id TEXT                    Dedicated pool id.
  --train-instance-type TEXT        Train worker specification.
  --train-instance-count INTEGER    Number of workers.
  --data-url TEXT                   OBS path for training data.
  --log-url TEXT                    OBS path for training log.
  --code-dir TEXT                   OBS path for source code.
  --output TEXT                     Training output parameter with OBS path.
  --input TEXT                      Training input parameter with OBS path.
  --env-variables TEXT              Env variables for training job.
  --parameters TEXT                 Training job parameters (only keyword parameters are supported).
  --boot-file TEXT                  Training job boot file path behinds `code_dir`.
  --framework-type TEXT             Training job framework type.
  --framework-version TEXT          Training job framework version.
  --workspace-id TEXT               The workspace where you submit training job(default "0")
  --policy [regular|economic|turbo|auto]
                                    Training job policy, default is regular.
  --volumes TEXT                    Information about the volumes attached to the training job.
  -q, --quiet                       Exit without waiting after submit successfully.
  -C, --config-file PATH            Configure file path for authorization.
  -D, --debug                       Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT                CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                    Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
YAML_FILE | String | No | Path to the YAML configuration file of the training job. If this parameter is not specified, no configuration file is used. |
--code-dir | String | Yes | OBS path to the training source code. |
--data-url | String | Yes | OBS path to the training data. |
--log-url | String | Yes | OBS path to which training logs are written. |
--train-instance-count | Int | Yes | Number of compute nodes for the training job. The default value is 1, which indicates single-node training. |
--boot-file | String | No | Boot file, relative to the --code-dir directory, used when you submit a training job with a preset image. Omit this parameter when you use a custom image or a custom command. |
--name | String | No | Name of the training job. |
--description | String | No | Description of the training job. |
--image-url | String | No | SWR URL of a custom image, in the format "organization/image_name:tag". |
--uid | String | No | UID of the custom image. The default value is 1000. |
--working-dir | String | No | Working directory in which the algorithm is executed. |
--local-code-dir | String | No | Local directory in the training container to which the algorithm code directory is downloaded. |
--user-command | String | No | Command used to run the custom image. The directory must be under /home. This parameter does not take effect when --code-dir starts with file://. |
--pool-id | String | No | ID of the dedicated resource pool for the training job. To find it, log in to the ModelArts console, choose Dedicated Resource Pools in the navigation pane on the left, and view the ID in the dedicated resource pool list. |
--train-instance-type | String | No | Resource flavor for the training job. |
--output | String | No | Training output. The training job uploads the container directory that is passed to the training script through the specified parameter to the specified OBS path. To specify multiple outputs, repeat the option, for example, --output output1=obs://bucket/output1 --output output2=obs://bucket/output2. |
--input | String | No | Training input. The training job downloads the data from OBS to the training container and passes the local storage path to the training script through the specified parameter. To specify multiple inputs, repeat the option, for example, --input data_path1=obs://bucket/data1 --input data_path2=obs://bucket/data2. |
--env-variables | String | No | Environment variables for the training job. To specify multiple variables, repeat the option, for example, --env-variables ENV1=env1 --env-variables ENV2=env2. |
--parameters | String | No | Hyperparameters passed to the training script. Only keyword parameters are supported. To specify multiple parameters, pass them in one quoted string, for example, --parameters "--epoch 0 --pretrained". |
--framework-type | String | No | AI framework type for the training job, for example, PyTorch. |
--framework-version | String | No | AI framework version for the training job. |
-q / --quiet | Bool | No | Whether to exit immediately after the job is submitted, without waiting for and printing the job status. |
--workspace-id | String | No | ID of the workspace to which the training job is submitted. The default value is 0. |
--policy | String | No | Training resource flavor mode. The options are regular, economic, turbo, and auto. The default value is regular. |
--volumes | String | No | EFS volumes to mount. To mount multiple volumes, repeat the option, for example, --volumes "local_path=/xx/yy/zz;read_only=false;nfs_server_path=xxx.xxx.xxx.xxx:/" --volumes "local_path=/xxx/yyy/zzz;read_only=false;nfs_server_path=xxx.xxx.xxx.xxx:/". |
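Options such as --input, --output, and --env-variables are given once per value. When the values live in a script, a bash array per option keeps the quoting intact. The following sketch expands three arrays into the repeated flags and forwards any other submit options unchanged. It needs bash 4.3 or later for local -n; the function name submit_with_io and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (bash 4.3+ for local -n): expand arrays into the repeated --input,
# --output and --env-variables flags of ma-cli ma-job submit.
# MA_CLI is an assumed dry-run override (e.g. MA_CLI=echo); defaults to ma-cli.

# Usage: submit_with_io INPUTS_ARRAY OUTPUTS_ARRAY ENVS_ARRAY [other submit options...]
submit_with_io() {
  local -n in_ref="$1" out_ref="$2" env_ref="$3"
  shift 3
  local args=("$@") item
  # Each element becomes its own flag, e.g. --input data_path1=obs://bucket/data1.
  for item in "${in_ref[@]}"; do args+=(--input "$item"); done
  for item in "${out_ref[@]}"; do args+=(--output "$item"); done
  for item in "${env_ref[@]}"; do args+=(--env-variables "$item"); done
  "${MA_CLI:-ma-cli}" ma-job submit "${args[@]}"
}
```

For example, with inputs=(data_path1=obs://bucket/data1 data_path2=obs://bucket/data2), outputs=(output1=obs://bucket/output1), and envs=(ENV1=env1 ENV2=env2), the call submit_with_io inputs outputs envs --code-dir obs://bucket/code/ -q passes each element as a separate flag. The arrays are passed by name, so they must not be called in_ref, out_ref, or env_ref.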
Example: Submitting a Training Job Based on a Preset ModelArts Image
Submit a training job by passing all settings as OPTIONS on the command line.
ma-cli ma-job submit --code-dir obs://your-bucket/mnist/code/ \
  --boot-file main.py \
  --framework-type PyTorch \
  --working-dir /home/ma-user/modelarts/user-job-dir/code \
  --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 \
  --data-url obs://your-bucket/mnist/dataset/MNIST/ \
  --log-url obs://your-bucket/mnist/logs/ \
  --train-instance-type modelarts.vm.cpu.8u \
  --train-instance-count 1 \
  -q
Example of train.yaml using a preset image:
# Example .ma/train.yaml (preset image)
# pool_id: pool_xxxx
train-instance-type: modelarts.vm.cpu.8u
train-instance-count: 1
data-url: obs://your-bucket/mnist/dataset/MNIST/
code-dir: obs://your-bucket/mnist/code/
working-dir: /home/ma-user/modelarts/user-job-dir/code
framework-type: PyTorch
framework-version: pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64
boot-file: main.py
log-url: obs://your-bucket/mnist/logs/
##[Optional] Uncomment to set uid when use custom image mode
uid: 1000
##[Optional] Uncomment to upload output file/dir to OBS from training platform
output:
  - name: output_dir
    obs_path: obs://your-bucket/mnist/output1/
##[Optional] Uncomment to download input file/dir from OBS to training platform
input:
  - name: data_url
    obs_path: obs://your-bucket/mnist/dataset/MNIST/
##[Optional] Uncomment pass hyperparameters
parameters:
  - epoch: 10
  - learning_rate: 0.01
  - pretrained:
##[Optional] Uncomment to use dedicated pool
pool_id: pool_xxxx
##[Optional] Uncomment to use volumes attached to the training job
volumes:
  - efs:
      local_path: /xx/yy/zz
      read_only: false
      nfs_server_path: xxx.xxx.xxx.xxx:/
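A saved configuration file is passed as the YAML_FILE argument, and any OPTIONS on the same command line override the matching items in the file. The following bash sketch checks that the file exists, then submits it with extra options placed before the file, following the usage line [OPTIONS] [YAML_FILE]. The function name submit_from_yaml and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: submit a job from a YAML file, overriding selected items on the
# command line (OPTIONS take precedence over the same items in the file).
# MA_CLI is an assumed dry-run override (e.g. MA_CLI=echo); defaults to ma-cli.

# Usage: submit_from_yaml YAML_FILE [OPTIONS...]
submit_from_yaml() {
  local yaml="$1"
  shift
  if [ ! -f "$yaml" ]; then
    echo "config file not found: $yaml" >&2
    return 1
  fi
  # Options first, then the YAML_FILE argument, matching the usage line.
  "${MA_CLI:-ma-cli}" ma-job submit "$@" "$yaml"
}
```

For example, submit_from_yaml .ma/train.yaml --train-instance-count 2 -q uses every setting from the file except the node count, which becomes 2 instead of 1.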
Example: Using a Custom Image to Create a Training Job
Submit a training job by passing all settings as OPTIONS on the command line.
ma-cli ma-job submit --image-url atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e \
  --code-dir obs://your-bucket/mnist/code/ \
  --user-command 'export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.py' \
  --data-url obs://your-bucket/mnist/dataset/MNIST/ \
  --log-url obs://your-bucket/mnist/logs/ \
  --train-instance-type modelarts.vm.cpu.8u \
  --train-instance-count 1 \
  -q
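In a bash shell, $LD_LIBRARY_PATH inside double quotes is expanded by your local shell before ma-cli receives the command, so the container would see your local value. Single quotes keep the variable literal so that it is expanded inside the training container. The following sketch builds the same user command from the example in single-quoted pieces. The variable user_cmd, the function submit_custom_image, and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the --user-command string in single quotes so that
# $LD_LIBRARY_PATH reaches the training container unexpanded.
# Paths come from the custom image example; MA_CLI is an assumed dry-run
# override (e.g. MA_CLI=echo) and defaults to ma-cli.

user_cmd='export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH'
user_cmd+=' && cd /home/ma-user/modelarts/user-job-dir/code'
user_cmd+=' && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.py'

# Usage: submit_custom_image IMAGE_URL [other submit options...]
submit_custom_image() {
  local image="$1"
  shift
  # "$user_cmd" is passed as one argument; its contents are not expanded again.
  "${MA_CLI:-ma-cli}" ma-job submit --image-url "$image" --user-command "$user_cmd" "$@"
}
```

Splitting the command into several += pieces keeps each step readable while the final string stays identical to the one-line version.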
Example of train.yaml using a custom image:
# Example .ma/train.yaml (custom image)
image-url: atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e
user-command: export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.py
train-instance-type: modelarts.vm.cpu.8u
train-instance-count: 1
data-url: obs://your-bucket/mnist/dataset/MNIST/
code-dir: obs://your-bucket/mnist/code/
log-url: obs://your-bucket/mnist/logs/
##[Optional] Uncomment to set uid when use custom image mode
uid: 1000
##[Optional] Uncomment to upload output file/dir to OBS from training platform
output:
  - name: output_dir
    obs_path: obs://your-bucket/mnist/output1/
##[Optional] Uncomment to download input file/dir from OBS to training platform
input:
  - name: data_url
    obs_path: obs://your-bucket/mnist/dataset/MNIST/
##[Optional] Uncomment pass hyperparameters
parameters:
  - epoch: 10
  - learning_rate: 0.01
  - pretrained:
##[Optional] Uncomment to use dedicated pool
pool_id: pool_xxxx
##[Optional] Uncomment to use volumes attached to the training job
volumes:
  - efs:
      local_path: /xx/yy/zz
      read_only: false
      nfs_server_path: xxx.xxx.xxx.xxx:/
Using ma-cli ma-job get-log to Obtain ModelArts Training Job Logs
Run the ma-cli ma-job get-log command to obtain ModelArts training job logs.
$ ma-cli ma-job get-log -h
Usage: ma-cli ma-job get-log [OPTIONS]
  Get job log details.
  Example:
  # Get job log by job id
  ma-cli ma-job get-log --job-id ${job_id}
Options:
  -i, --job-id TEXT       Get training job details by job id.  [required]
  -t, --task-id TEXT      Get training job details by task id (default "worker-0").
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -h, -H, --help          Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
-i / --job-id | String | Yes | ID of the job whose logs are to be obtained. |
-t / --task-id | String | No | ID of the task whose logs are to be obtained. The default value is worker-0. |
Example: Obtain logs of a specified training job.
ma-cli ma-job get-log --job-id b63e90baxxx
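get-log returns one task's log at a time, so a multi-node job needs one call per task. The following bash sketch saves each worker's log to its own file. It assumes that task IDs follow the worker-<n> pattern of the documented default worker-0; that pattern is not confirmed by this page, so check the real task IDs of your job first. The function name save_worker_logs and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: save the log of each worker of a multi-node job to its own file.
# Assumption: task IDs follow the worker-<n> pattern of the documented default
# "worker-0"; verify the real task IDs of your job first.
# MA_CLI is an assumed dry-run override (e.g. MA_CLI=echo); defaults to ma-cli.

# Usage: save_worker_logs JOB_ID [WORKER_COUNT] [OUT_DIR]
save_worker_logs() {
  local job_id="$1" count="${2:-1}" out_dir="${3:-.}" i
  mkdir -p "$out_dir" || return 1
  for ((i = 0; i < count; i++)); do
    "${MA_CLI:-ma-cli}" ma-job get-log --job-id "$job_id" --task-id "worker-$i" \
      > "$out_dir/$job_id-worker-$i.log" || return 1
  done
}
```

For example, save_worker_logs b63e90baxxx 2 ./logs writes ./logs/b63e90baxxx-worker-0.log and ./logs/b63e90baxxx-worker-1.log.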
Using ma-cli ma-job get-event to Obtain ModelArts Training Job Events
Run the ma-cli ma-job get-event command to obtain ModelArts training job events.
$ ma-cli ma-job get-event -h
Usage: ma-cli ma-job get-event [OPTIONS]
  Get job running event.
  Example:
  # Get training job running event
  ma-cli ma-job get-event --job-id ${job_id}
Options:
  -i, --job-id TEXT       Get training job event by job id.  [required]
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help          Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
-i / --job-id | String | Yes | ID of the job whose events are to be obtained. |
Example: Obtain events of a specified training job.
ma-cli ma-job get-event --job-id b63e90baxxx
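Events and logs are usually inspected together when a job fails. The following bash sketch saves both for one job into a directory: the events from get-event and the log of the default task from get-log (no --task-id, so worker-0 per the help output). The function name collect_job_diagnostics, the directory layout, and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: collect the events and the default worker-0 log of one job into a
# directory for troubleshooting. The directory layout is an arbitrary choice.
# MA_CLI is an assumed dry-run override (e.g. MA_CLI=echo); defaults to ma-cli.

# Usage: collect_job_diagnostics JOB_ID [OUT_DIR]
collect_job_diagnostics() {
  local job_id="$1"
  local out_dir="${2:-./$job_id-diag}"
  if [ -z "$job_id" ]; then
    echo "usage: collect_job_diagnostics JOB_ID [OUT_DIR]" >&2
    return 2
  fi
  mkdir -p "$out_dir" || return 1
  "${MA_CLI:-ma-cli}" ma-job get-event --job-id "$job_id" > "$out_dir/events.txt" || return 1
  # Without --task-id, get-log returns the log of the default task (worker-0).
  "${MA_CLI:-ma-cli}" ma-job get-log --job-id "$job_id" > "$out_dir/worker-0.log" || return 1
  # Print the directory so callers can capture it.
  echo "$out_dir"
}
```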
Using ma-cli ma-job get-engine to Obtain the AI Engines Used by ModelArts Training Jobs
Run the ma-cli ma-job get-engine command to obtain the AI engines used by ModelArts training jobs.
$ ma-cli ma-job get-engine -h
Usage: ma-cli ma-job get-engine [OPTIONS]
  Get job engine info.
  Example:
  # Get training job engines
  ma-cli ma-job get-engine
Options:
  -v, --verbose           Show detailed information about training engines.
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help          Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
-v / --verbose | Bool | No | Whether to display detailed information. It is disabled by default. |
Example: Obtain the AI engines used by training jobs.
ma-cli ma-job get-engine
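The engine list can be long. The following bash sketch keeps only the lines that mention a keyword such as pytorch, which helps when picking a --framework-type and --framework-version value. The output layout of get-engine is not described on this page, so this is a plain case-insensitive text filter rather than a structured parser. The function name find_engine and the MA_CLI override (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show only the get-engine lines that mention a keyword such as pytorch.
# The get-engine output layout is not documented here, so this is a plain text
# filter. MA_CLI is an assumed override (defaults to ma-cli).

# Usage: find_engine KEYWORD
find_engine() {
  local keyword="$1"
  # -i: ignore case; -F: match the keyword literally; --: allow a leading "-".
  "${MA_CLI:-ma-cli}" ma-job get-engine | grep -iF -- "$keyword"
}
```

The function returns grep's exit status, so it fails when no engine line matches the keyword.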
Using ma-cli ma-job get-flavor to Obtain the Resource Flavors Used by ModelArts Training Jobs
Run the ma-cli ma-job get-flavor command to obtain the resource flavors used by ModelArts training jobs.
$ ma-cli ma-job get-flavor -h
Usage: ma-cli ma-job get-flavor [OPTIONS]
  Get job flavor info.
  Example:
  # Get training job flavors
  ma-cli ma-job get-flavor
Options:
  -t, --flavor-type [CPU|GPU|Ascend]  Type of training job flavor.
  -v, --verbose                       Show detailed information about training flavors.
  -C, --config-file TEXT              Configure file path for authorization.
  -D, --debug                         Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT                  CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                      Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
-t / --flavor-type | String | No | Resource flavor type. The options are CPU, GPU, and Ascend. If this parameter is not specified, all resource flavors are returned. |
-v / --verbose | Bool | No | Whether to display detailed information. It is disabled by default. |
Example: Obtain all resource flavors available for training jobs.
ma-cli ma-job get-flavor
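To compare flavors of specific types, --flavor-type can be called once per type. The following bash sketch prints a header before each type's flavors and rejects any type other than the CPU, GPU, and Ascend choices shown in the help output. The function name list_flavors_by_type and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list flavors one type at a time with --flavor-type.
# Accepted types are CPU, GPU and Ascend, as shown in the get-flavor help output.
# MA_CLI is an assumed dry-run override (e.g. MA_CLI=echo); defaults to ma-cli.

# Usage: list_flavors_by_type TYPE [TYPE...]
list_flavors_by_type() {
  local t
  for t in "$@"; do
    case "$t" in
      CPU|GPU|Ascend) ;;
      *) echo "unsupported flavor type: $t (expected CPU, GPU or Ascend)" >&2; return 2 ;;
    esac
    echo "== $t =="
    "${MA_CLI:-ma-cli}" ma-job get-flavor --flavor-type "$t" || return
  done
}
```

For example, list_flavors_by_type GPU Ascend prints the GPU flavors followed by the Ascend flavors.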
Using ma-cli ma-job stop to Stop a ModelArts Training Job
Run the ma-cli ma-job stop command to stop a training job with a specified job ID.
$ ma-cli ma-job stop -h
Usage: ma-cli ma-job stop [OPTIONS]
  Stop training job by job id.
  Example:
  Stop training job by job id
  ma-cli ma-job stop --job-id ${job_id}
Options:
  -i, --job-id TEXT       Get training job event by job id.  [required]
  -y, --yes               Confirm stop operation.
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help          Show this message and exit.
Parameter | Data Type | Mandatory | Description |
---|---|---|---|
-i / --job-id | String | Yes | ID of the ModelArts training job to stop. |
-y / --yes | Bool | No | Whether to confirm the stop operation in advance instead of being prompted. |
Example: Stop a running training job.
ma-cli ma-job stop --job-id efd3e2f8xxx
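When several jobs need to be stopped, the command can be run in a loop. The following bash sketch stops each job ID it is given, continues past individual failures, and returns a non-zero status if any stop failed. It assumes that -y/--yes ("Confirm stop operation" in the help output) answers the confirmation prompt so the loop does not block. The function name stop_jobs and the MA_CLI dry-run variable (defaults to ma-cli) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stop several training jobs in one go.
# Assumption: -y/--yes ("Confirm stop operation") answers the confirmation
# prompt so the loop does not block. MA_CLI is an assumed dry-run override
# (e.g. MA_CLI=echo) and defaults to ma-cli.

# Usage: stop_jobs JOB_ID [JOB_ID...]
stop_jobs() {
  local job_id failed=0
  for job_id in "$@"; do
    # Keep going if one job fails to stop; report overall failure at the end.
    "${MA_CLI:-ma-cli}" ma-job stop --job-id "$job_id" -y || failed=1
  done
  return "$failed"
}
```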