ma-cli ma-job Commands for Training Jobs

Updated on 2024-10-29 GMT+08:00

Use the ma-cli ma-job command to submit training jobs; query training jobs, logs, events, AI engines, resource flavors, and dedicated resource pools; and stop or delete training jobs.

$ ma-cli ma-job -h
Usage: ma-cli ma-job [OPTIONS] COMMAND [ARGS]...

  ModelArts job submission and query job details.

Options:
  -h, -H, --help  Show this message and exit.

Commands:
  delete      Delete training job by job id.
  get-engine  Get job engines.
  get-event   Get job running event.
  get-flavor  Get job flavors.
  get-job     Get job details.
  get-log     Get job log details.
  get-pool    Get job engines.
  stop        Stop training job by job id.
  submit      Submit training job.

Table 1 Commands supported by training jobs

| Command | Description |
|---|---|
| get-job | Obtain ModelArts training jobs and their details. |
| get-log | Obtain runtime logs of a ModelArts training job. |
| get-engine | Obtain ModelArts AI engines for training. |
| get-event | Obtain ModelArts training job events. |
| get-flavor | Obtain ModelArts resource specifications for training. |
| get-pool | Obtain ModelArts resource pools dedicated for training. |
| stop | Stop a ModelArts training job. |
| submit | Submit a ModelArts training job. |
| delete | Delete a training job with a specified job ID. |

Using ma-cli ma-job get-job to Obtain a ModelArts Training Job

Run the ma-cli ma-job get-job command to obtain training jobs or details about a specific job.

$ ma-cli ma-job get-job -h
Usage: ma-cli ma-job get-job [OPTIONS]

  Get job details.

  Example:

  # Get train job details by job name
  ma-cli ma-job get-job -n ${job_name}

  # Get train job details by job id
  ma-cli ma-job get-job -i ${job_id}

  # Get train job list
  ma-cli ma-job get-job --page-size 5 --page-num 1

Options:
  
  -i, --job-id TEXT               Get training job details by job id.
  -n, --job-name TEXT             Get training job details by job name.
  -pn, --page-num INTEGER         Specify which page to query.  [x>=1]
  -ps, --page-size INTEGER RANGE  The maximum number of results for this query.  [1<=x<=50]
  -v, --verbose                   Show detailed information about training job details.
  -C, --config-file TEXT          Configure file path for authorization.
  -D, --debug                     Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT              CLI connection profile to use. The default profile is "DEFAULT".
  -h, -H, --help                  Show this message and exit.

Table 2 Parameters

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| -i / --job-id | String | No | ID of the job whose details are to be obtained. |
| -n / --job-name | String | No | Name of the job to be queried or name keyword used to filter training jobs. |
| -pn / --page-num | Int | No | Page number. The value must be 1 or greater. The default value is 1. |
| -ps / --page-size | Int | No | Number of training jobs displayed on each page. The value ranges from 1 to 50. The default value is 10. |
| -v / --verbose | Bool | No | Whether to display detailed information. It is disabled by default. |

  • Example: Obtain a training job with a specified job ID.
    ma-cli ma-job get-job -i b63e90xxx

  • Example: Filter training jobs whose names contain the keyword auto.
    ma-cli ma-job get-job -n auto
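
The paging limits printed in the help output ([x>=1] for --page-num and [1<=x<=50] for --page-size) can be checked before the CLI is invoked. The following is a minimal Python sketch, not part of ma-cli: the helper name build_get_job_args is hypothetical, and only flags listed in the help output above are used. It assumes that paging flags are only relevant when no job ID is given.

```python
import shlex


def build_get_job_args(job_id=None, job_name=None, page_num=1, page_size=10, verbose=False):
    """Build the argument list for `ma-cli ma-job get-job` (hypothetical helper)."""
    # Limits taken from the help output: page-num >= 1, 1 <= page-size <= 50.
    if page_num < 1:
        raise ValueError("--page-num must be >= 1")
    if not 1 <= page_size <= 50:
        raise ValueError("--page-size must be between 1 and 50")

    args = ["ma-cli", "ma-job", "get-job"]
    if job_id is not None:
        # Assumption: a lookup by ID returns a single job, so paging is skipped.
        args += ["-i", job_id]
        return args
    if job_name is not None:
        args += ["-n", job_name]
    args += ["--page-size", str(page_size), "--page-num", str(page_num)]
    if verbose:
        args.append("-v")
    return args


# Print the command line instead of running it.
print(shlex.join(build_get_job_args(job_name="auto", page_size=5)))
```

The returned list can be passed to subprocess.run on a machine where ma-cli is installed and configured.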

Using ma-cli ma-job submit to Submit a ModelArts Training Job

Run the ma-cli ma-job submit command to submit a ModelArts training job.

When running this command, use the YAML_FILE parameter to specify the path to the configuration file of the target job. If YAML_FILE is not specified, the configuration is treated as empty. The configuration file is in YAML format, and its keys correspond to the OPTIONS of the command. If you specify both YAML_FILE and OPTIONS, the OPTIONS values override the matching items in the configuration file.
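
The precedence rule between YAML_FILE and OPTIONS can be pictured as a dictionary merge. The sketch below only illustrates that rule and is not how ma-cli is implemented; the function name merge_job_config is hypothetical, and None is used here to stand for an option that was not given on the command line.

```python
def merge_job_config(yaml_values, cli_options):
    """Return the effective settings: command-line OPTIONS win over YAML_FILE values."""
    effective = dict(yaml_values)  # start from the configuration file (may be empty)
    for key, value in cli_options.items():
        if value is not None:  # None means "not given on the command line"
            effective[key] = value
    return effective


yaml_values = {"train-instance-count": 1, "log-url": "obs://your-bucket/mnist/logs/"}
cli_options = {"train-instance-count": 2, "log-url": None}

# train-instance-count comes from the command line, log-url from the file.
print(merge_job_config(yaml_values, cli_options))
```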

$ ma-cli ma-job submit -h
Usage: ma-cli ma-job submit [OPTIONS] [YAML_FILE]...

  Submit training job.

  Example:

  ma-cli ma-job submit --code-dir obs://your_bucket/code/
                       --boot-file main.py
                       --framework-type PyTorch
                       --working-dir /home/ma-user/modelarts/user-job-dir/code
                       --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64
                       --data-url obs://your_bucket/dataset/
                       --log-url obs://your_bucket/logs/
                       --train-instance-type modelarts.vm.cpu.8u
                       --train-instance-count 1

Options:
  --name TEXT                     Job name.
  --description TEXT              Job description.
  --image-url TEXT                Full swr custom image path.
  --uid TEXT                      Uid for custom image (default: 1000).
  --working-dir TEXT              ModelArts training job working directory.
  --local-code-dir TEXT           ModelArts training job local code directory.
  --user-command TEXT             Execution command for custom image.
  --pool-id TEXT                  Dedicated pool id.
  --train-instance-type TEXT      Train worker specification.
  --train-instance-count INTEGER  Number of workers.
  --data-url TEXT                 OBS path for training data.
  --log-url TEXT                  OBS path for training log.
  --code-dir TEXT                 OBS path for source code.
  --output TEXT                   Training output parameter with OBS path.
  --input TEXT                    Training input parameter with OBS path.
  --env-variables TEXT            Env variables for training job.
  --parameters TEXT               Training job parameters (only keyword parameters are supported).
  --boot-file TEXT                Training job boot file path behinds `code_dir`.
  --framework-type TEXT           Training job framework type.
  --framework-version TEXT        Training job framework version.
  --workspace-id TEXT             The workspace where you submit training job(default "0")
  --policy [regular|economic|turbo|auto]
                                  Training job policy, default is regular.
  --volumes TEXT                  Information about the volumes attached to the training job.
  -q, --quiet                     Exit without waiting after submit successfully.
  -C, --config-file PATH          Configure file path for authorization.
  -D, --debug                     Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT              CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                  Show this message and exit.

Table 3 Parameters

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| YAML_FILE | String | No | Configuration file of a training job. If this parameter is not specified, the configuration is treated as empty. |
| --code-dir | String | Yes | OBS path to the training source code. |
| --data-url | String | Yes | OBS path to the training data. |
| --log-url | String | Yes | OBS path to training logs. |
| --train-instance-count | Int | Yes | Number of compute nodes in a training job. The default value is 1, indicating a standalone node. |
| --boot-file | String | No | Boot file specified when you use a preset command to submit a training job. This parameter can be omitted when you use a custom image or a custom command to submit a training job. |
| --name | String | No | Name of a training job. |
| --description | String | No | Description of a training job. |
| --image-url | String | No | SWR URL of a custom image, which is in the format of "organization/image_name:tag". |
| --uid | String | No | UID of the custom image. The default value is 1000. |
| --working-dir | String | No | Work directory where an algorithm is executed. |
| --local-code-dir | String | No | Local directory of the training container to which the algorithm code directory is downloaded. |
| --user-command | String | No | Command for executing a custom image. The directory must be under /home. When code-dir is prefixed with file://, this parameter does not take effect. |
| --pool-id | String | No | Resource pool ID selected for a training job. You can log in to the ModelArts console, choose Dedicated Resource Pools in the navigation pane on the left, and view the resource pool ID in the dedicated resource pool list. |
| --train-instance-type | String | No | Resource flavor selected for a training job. |
| --output | String | No | Training output. After this parameter is specified, the training job will upload the output directory of the training container corresponding to the specified output parameter in the training script to a specified OBS path. To specify multiple parameters, use --output output1=obs://bucket/output1 --output output2=obs://bucket/output2. |
| --input | String | No | Training input. After this parameter is specified, the training job will download the data from OBS to the training container and transfer the data storage path to the training script through the specified parameter. To specify multiple parameters, use --input data_path1=obs://bucket/data1 --input data_path2=obs://bucket/data2. |
| --env-variables | String | No | Environment variables input during training. To specify multiple parameters, use --env-variables ENV1=env1 --env-variables ENV2=env2. |
| --parameters | String | No | Training input parameters. To specify multiple parameters, use --parameters "--epoch 0 --pretrained". |
| --framework-type | String | No | Framework type selected for a training job. |
| --framework-version | String | No | Framework version selected for a training job. |
| -q / --quiet | Bool | No | Whether to exit directly without printing the job status synchronously after a training job is submitted. |
| --workspace-id | String | No | Workspace where a training job is deployed. The default value is 0. |
| --policy | String | No | Training resource flavor mode. The options are regular, economic, turbo, and auto. The default value is regular. |
| --volumes | String | No | EFS disks to be mounted. To specify multiple volumes, use --volumes "local_path=/xx/yy/zz;read_only=false;nfs_server_path=xxx.xxx.xxx.xxx:/" --volumes "local_path=/xxx/yyy/zzz;read_only=false;nfs_server_path=xxx.xxx.xxx.xxx:/". |

Example: Submitting a Training Job Based on a Preset ModelArts Image

Submit a training job by specifying the OPTIONS parameter.

ma-cli ma-job submit --code-dir obs://your-bucket/mnist/code/ \
                  --boot-file main.py \
                  --framework-type PyTorch \
                  --working-dir /home/ma-user/modelarts/user-job-dir/code \
                  --framework-version pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 \
                  --data-url obs://your-bucket/mnist/dataset/MNIST/ \
                  --log-url obs://your-bucket/mnist/logs/ \
                  --train-instance-type modelarts.vm.cpu.8u \
                  --train-instance-count 1  \
                  -q

Example of train.yaml using a preset image:

# Example .ma/train.yaml (preset image)
# pool_id: pool_xxxx
train-instance-type: modelarts.vm.cpu.8u
train-instance-count: 1
data-url: obs://your-bucket/mnist/dataset/MNIST/
code-dir: obs://your-bucket/mnist/code/
working-dir: /home/ma-user/modelarts/user-job-dir/code
framework-type: PyTorch
framework-version: pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64
boot-file: main.py
log-url: obs://your-bucket/mnist/logs/

##[Optional] Uncomment to set uid when using custom image mode
uid: 1000

##[Optional] Uncomment to upload output file/dir to OBS from training platform
output:
    - name: output_dir
      obs_path: obs://your-bucket/mnist/output1/

##[Optional] Uncomment to download input file/dir from OBS to training platform
input:
    - name: data_url
      obs_path: obs://your-bucket/mnist/dataset/MNIST/

##[Optional] Uncomment to pass hyperparameters
parameters:
    - epoch: 10
    - learning_rate: 0.01
    - pretrained:

##[Optional] Uncomment to use dedicated pool
pool_id: pool_xxxx

##[Optional] Uncomment to use volumes attached to the training job
volumes:
  - efs:
      local_path: /xx/yy/zz
      read_only: false
      nfs_server_path: xxx.xxx.xxx.xxx:/
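
The optional input, output, and volumes entries in the YAML file correspond to repeatable command-line options (--input, --output, --env-variables, --volumes) described in Table 3. The Python sketch below shows one way to assemble those repeated options; the helper names repeated_option and volume_option are hypothetical, and the mapping from the YAML name/obs_path pairs to name=obs_path values is an assumption based on the parameter descriptions. The printed command is a fragment and omits the mandatory options.

```python
import shlex


def repeated_option(flag, mapping):
    """Expand {"a": "x", "b": "y"} into [flag, "a=x", flag, "b=y"]."""
    args = []
    for name, value in mapping.items():
        args += [flag, f"{name}={value}"]
    return args


def volume_option(local_path, nfs_server_path, read_only=False):
    """Format one --volumes value in the key=value;key=value form shown in Table 3."""
    spec = (
        f"local_path={local_path};"
        f"read_only={str(read_only).lower()};"
        f"nfs_server_path={nfs_server_path}"
    )
    return ["--volumes", spec]


args = ["ma-cli", "ma-job", "submit"]
args += repeated_option("--input", {"data_url": "obs://your-bucket/mnist/dataset/MNIST/"})
args += repeated_option("--output", {"output_dir": "obs://your-bucket/mnist/output1/"})
args += volume_option("/xx/yy/zz", "xxx.xxx.xxx.xxx:/")

# shlex.join quotes the --volumes value because it contains semicolons.
print(shlex.join(args))
```

Building the command as a list keeps the semicolon-separated volume specification intact as a single argument, which avoids shell quoting mistakes.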

Example: Using a Custom Image to Create a Training Job

Submit a training job by specifying the OPTIONS parameter.

ma-cli ma-job submit --image-url atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e \
                  --code-dir obs://your-bucket/mnist/code/ \
                  --user-command 'export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.py' \
                  --data-url obs://your-bucket/mnist/dataset/MNIST/ \
                  --log-url obs://your-bucket/mnist/logs/ \
                  --train-instance-type modelarts.vm.cpu.8u \
                  --train-instance-count 1  \
                  -q

Example of train.yaml using a custom image:

# Example .ma/train.yaml (custom image)
image-url: atelier/pytorch_1_8:pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64-20220926104358-041ba2e
user-command: export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH && cd /home/ma-user/modelarts/user-job-dir/code && /home/ma-user/anaconda3/envs/PyTorch-1.8/bin/python main.py
train-instance-type: modelarts.vm.cpu.8u
train-instance-count: 1
data-url: obs://your-bucket/mnist/dataset/MNIST/
code-dir: obs://your-bucket/mnist/code/
log-url: obs://your-bucket/mnist/logs/

##[Optional] Uncomment to set uid when using custom image mode
uid: 1000

##[Optional] Uncomment to upload output file/dir to OBS from training platform
output:
    - name: output_dir
      obs_path: obs://your-bucket/mnist/output1/

##[Optional] Uncomment to download input file/dir from OBS to training platform
input:
    - name: data_url
      obs_path: obs://your-bucket/mnist/dataset/MNIST/

##[Optional] Uncomment to pass hyperparameters
parameters:
    - epoch: 10
    - learning_rate: 0.01
    - pretrained:

##[Optional] Uncomment to use dedicated pool
pool_id: pool_xxxx

##[Optional] Uncomment to use volumes attached to the training job
volumes:
  - efs:
      local_path: /xx/yy/zz
      read_only: false
      nfs_server_path: xxx.xxx.xxx.xxx:/

Using ma-cli ma-job get-log to Obtain ModelArts Training Job Logs

Run the ma-cli ma-job get-log command to obtain ModelArts training job logs.

$ ma-cli ma-job get-log -h
Usage: ma-cli ma-job get-log [OPTIONS]

  Get job log details.

  Example:

  # Get job log by job id
  ma-cli ma-job get-log --job-id ${job_id}

Options:
  -i, --job-id TEXT       Get training job details by job id.  [required]
  -t, --task-id TEXT      Get training job details by task id (default "worker-0").
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -h, -H, --help          Show this message and exit.

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| -i / --job-id | String | Yes | ID of the job whose logs are to be obtained. |
| -t / --task-id | String | No | ID of the task whose logs are to be obtained. The default value is worker-0. |

Example: Obtain logs of a specified training job.

ma-cli ma-job get-log --job-id b63e90baxxx
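
For a job with several compute nodes, logs may need to be fetched once per task. The Python sketch below builds one get-log command per task. It assumes that task IDs follow the pattern worker-0, worker-1, and so on; only worker-0 is confirmed by the help output above, and the helper name get_log_commands is hypothetical.

```python
import shlex


def get_log_commands(job_id, worker_count=1):
    """Return one `ma-cli ma-job get-log` argument list per assumed task ID."""
    if worker_count < 1:
        raise ValueError("worker_count must be >= 1")
    return [
        # Assumption: tasks are named worker-<index>, starting at worker-0.
        ["ma-cli", "ma-job", "get-log", "--job-id", job_id, "--task-id", f"worker-{index}"]
        for index in range(worker_count)
    ]


for command in get_log_commands("b63e90baxxx", worker_count=2):
    print(shlex.join(command))
```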

Using ma-cli ma-job get-event to Obtain ModelArts Training Job Events

Run the ma-cli ma-job get-event command to obtain ModelArts training job events.

$ ma-cli ma-job get-event -h
Usage: ma-cli ma-job get-event [OPTIONS]

  Get job running event.

  Example:

  # Get training job running event
  ma-cli ma-job get-event --job-id ${job_id}

Options:
  -i, --job-id TEXT       Get training job event by job id.  [required]
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help          Show this message and exit.

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| -i / --job-id | String | Yes | ID of the job whose events are to be obtained. |

Example: Obtain events of a specified training job.

ma-cli ma-job get-event --job-id b63e90baxxx

Using ma-cli ma-job get-engine to Obtain the AI Engines Used by ModelArts Training Jobs

Run the ma-cli ma-job get-engine command to obtain the AI engines used by ModelArts training jobs.

$ ma-cli ma-job get-engine -h
Usage: ma-cli ma-job get-engine [OPTIONS]

  Get job engine info.

  Example:

  # Get training job engines
  ma-cli ma-job get-engine

Options:
  -v, --verbose           Show detailed information about training engines.
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help          Show this message and exit.

Table 4 Parameters

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| -v / --verbose | Bool | No | Whether to display detailed information. It is disabled by default. |

Example: Obtain the AI engines used by training jobs.

ma-cli ma-job get-engine

Using ma-cli ma-job get-flavor to Obtain the Resource Flavors Used by ModelArts Training Jobs

Run the ma-cli ma-job get-flavor command to obtain the resource flavors used by ModelArts training jobs.

$ ma-cli ma-job get-flavor -h
Usage: ma-cli ma-job get-flavor [OPTIONS]

  Get job flavor info.

  Example:

  # Get training job flavors
  ma-cli ma-job get-flavor

Options:
  -t, --flavor-type [CPU|GPU|Ascend]
                                  Type of training job flavor.
  -v, --verbose                   Show detailed information about training flavors.
  -C, --config-file TEXT          Configure file path for authorization.
  -D, --debug                     Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT              CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                  Show this message and exit.

Table 5 Parameters

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| -t / --flavor-type | String | No | Resource flavor type. The options are CPU, GPU, and Ascend. If this parameter is not specified, all resource flavors are returned by default. |
| -v / --verbose | Bool | No | Whether to display detailed information. It is disabled by default. |

Example: Obtain the resource flavors and types of training jobs.

ma-cli ma-job get-flavor
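
The --flavor-type option accepts only the values listed in the help output (CPU, GPU, Ascend). A minimal Python sketch that checks the value before building the command is shown below; the helper name build_get_flavor_args is hypothetical and not part of ma-cli.

```python
import shlex

# Allowed values copied from the help output of `ma-cli ma-job get-flavor -h`.
FLAVOR_TYPES = ("CPU", "GPU", "Ascend")


def build_get_flavor_args(flavor_type=None, verbose=False):
    """Build the argument list for `ma-cli ma-job get-flavor` (hypothetical helper)."""
    if flavor_type is not None and flavor_type not in FLAVOR_TYPES:
        raise ValueError(f"flavor_type must be one of {FLAVOR_TYPES}")
    args = ["ma-cli", "ma-job", "get-flavor"]
    if flavor_type is not None:
        args += ["-t", flavor_type]
    if verbose:
        args.append("-v")
    return args


# List GPU flavors with detailed information.
print(shlex.join(build_get_flavor_args("GPU", verbose=True)))
```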

Using ma-cli ma-job stop to Stop a ModelArts Training Job

Run the ma-cli ma-job stop command to stop a training job with a specified job ID.

$ ma-cli ma-job stop -h
Usage: ma-cli ma-job stop [OPTIONS]

  Stop training job by job id.

  Example:

  Stop training job by job id
  ma-cli ma-job stop --job-id ${job_id}

Options:
  -i, --job-id TEXT       Get training job event by job id.  [required]
  -y, --yes               Confirm stop operation.
  -C, --config-file TEXT  Configure file path for authorization.
  -D, --debug             Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT      CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help          Show this message and exit.

Table 6 Parameters

| Parameter | Data Type | Mandatory | Description |
|---|---|---|---|
| -i / --job-id | String | Yes | ID of a ModelArts training job. |
| -y / --yes | Bool | No | Whether to forcibly stop a training job. |

Example: Stop a running training job.

ma-cli ma-job stop --job-id efd3e2f8xxx
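
When stopping jobs from a script, the command can be wrapped so that it is previewed before it is actually run. The Python sketch below is one possible wrapper, with a hypothetical function name stop_job. It assumes that -y / --yes lets the command proceed without an interactive confirmation, which the help output ("Confirm stop operation.") suggests but does not state explicitly.

```python
import shlex
import subprocess


def stop_job(job_id, assume_yes=False, dry_run=True):
    """Build, and optionally run, `ma-cli ma-job stop` for one job."""
    args = ["ma-cli", "ma-job", "stop", "--job-id", job_id]
    if assume_yes:
        args.append("--yes")  # assumption: confirms the stop without prompting
    if not dry_run:
        # Requires ma-cli to be installed and configured; raises on a non-zero exit code.
        subprocess.run(args, check=True)
    return args


# Dry run: only show the command that would be executed.
print(shlex.join(stop_job("efd3e2f8xxx", assume_yes=True)))
```

Keeping dry_run=True as the default means that nothing is stopped unless the caller opts in explicitly.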
