
Terminating a Training Job

Updated on 2025-03-14 GMT+08:00

Function

This API is used to terminate a training job. Only jobs in the creating, awaiting, or running state can be terminated.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/training-jobs/{training_job_id}/actions

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Project ID. For details, see Obtaining a Project ID and Name.

training_job_id

Yes

String

Training job ID. For details about how to obtain the value, see Querying the Training Job List.

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

action_type

Yes

String

Operation to perform on a training job. If this parameter is set to terminate, the training job is terminated.

Response Parameters

Status code: 202

Table 3 Response body parameters

Parameter

Type

Description

kind

String

Training job type, which is job by default. Options:

  • job: training job

metadata

JobMetadata object

Metadata of a training job.

status

Status object

Status of a training job. You do not need to set this parameter when creating a job.

algorithm

JobAlgorithmResponse object

Algorithm used by a training job. The options are as follows:

  • id: Only the algorithm ID is used.

  • subscription_id+item_version_id: The subscription ID and version ID of the algorithm are used.

  • code_dir+boot_file: The code directory and boot file of the training job are used.

tasks

Array of TaskResponse objects

List of tasks in heterogeneous training jobs.

spec

SpecResponce object

Specifications of a training job.

endpoints

JobEndpointsResp object

Configurations required for remotely accessing a training job.

Table 4 JobMetadata

Parameter

Type

Description

id

String

Training job ID, which is generated and returned by ModelArts after the training job is created.

name

String

Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).

workspace_id

String

Workspace where a job is located. The default value is 0.

description

String

Training job description. The value must contain 0 to 256 characters. The default value is NULL.

create_time

Long

Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created.

user_name

String

Username for creating a training job. The username is generated and returned by ModelArts after a training job is created.

annotations

Map<String,String>

Advanced configurations of a training job. The options are as follows:

  • job_template: Template RL (heterogeneous job)

  • fault-tolerance/job-retry-num: 3 (number of retries upon a fault)

  • fault-tolerance/job-unconditional-retry: true (unconditional restart)

  • fault-tolerance/hang-retry: true (restart upon a suspension)

  • jupyter-lab/enable: true (JupyterLab training application)

  • tensorboard/enable: true (TensorBoard training application)

  • mindstudio-insight/enable: true (MindStudio Insight training application)

Table 5 Status

Parameter

Type

Description

phase

String

Level-1 status of a training job. The options are:

  • Creating: The job is being created.

  • Pending: waiting

  • Running

  • Failed: The task fails to be executed.

  • Completed: completed

  • Terminating: The task is being stopped.

  • Terminated: stopped

  • Abnormal: abnormal

secondary_phase

String

Level-2 status of a training job. This is an internal, fine-grained status that may be added, modified, or deleted at any time, so do not build dependencies on it. The options are:

  • Creating: The job is being created.

  • Queuing: queuing

  • Running

  • Failed: The task fails to be executed.

  • Completed: completed

  • Terminating: The task is being stopped.

  • Terminated: stopped

  • CreateFailed: The creation fails.

  • TerminatedFailed: The service fails to be stopped.

  • Unknown: unknown status

  • Lost: abnormal

duration

Long

Running duration of a training job, in milliseconds.

node_count_metrics

Array<Array<Integer>>

Node count changes during the training job running period.

tasks

Array of strings

Tasks of a training job.

start_time

Long

Start time of a training job. The value is in timestamp format.

task_statuses

Array of TaskStatuses objects

Status of a training job task.

running_records

Array of RunningRecord objects

Running and fault recovery records of a training job.

Table 6 TaskStatuses

Parameter

Type

Description

task

String

Task of a training job.

exit_code

Integer

Exit code of a training job task.

message

String

Error message of a training job task.

Table 7 RunningRecord

Parameter

Type

Description

start_at

Integer

Unix timestamp of the start time in the current running record, in seconds.

end_at

Integer

Unix timestamp of the end time in the current running record, in seconds.

start_type

String

Startup mode of the current running record.

  • init_or_rescheduled: This run is the first run after scheduling, including the initial startup and runs after rescheduling recovery.

  • restarted: This run is not the first run after scheduling; it follows a process restart.

end_reason

String

Reason why the current running record ends.

end_related_task

String

ID of the task worker that causes the end of the current running record, for example, worker-0.

end_recover

String

Fault tolerance policy used after the current running record ends. The enums are as follows:

  • npu_proc_restart: NPU in-place hot recovery

  • gpu_proc_restart: GPU in-place hot recovery

  • proc_restart: Process in-place recovery

  • pod_reschedule: Pod-level rescheduling

  • job_reschedule: Job-level rescheduling

  • job_reschedule_with_taint: Isolated job-level rescheduling

end_recover_before_downgrade

String

Tolerance policy used after the current running record ends and before the fault tolerance policy is degraded. The options are the same as those of end_recover.

Table 8 JobAlgorithmResponse

Parameter

Type

Description

id

String

Algorithm used by a training job. The options are as follows:

  • id: Only the algorithm ID is used.

  • subscription_id+item_version_id: The subscription ID and version ID of the algorithm are used.

  • code_dir+boot_file: The code directory and boot file of the training job are used.

name

String

Algorithm name.

subscription_id

String

Subscription ID of a subscribed algorithm, which must be used together with item_version_id.

item_version_id

String

Version ID of the subscribed algorithm, which must be used together with subscription_id.

code_dir

String

Code directory of a training job, for example, /usr/app/. This parameter must be set together with boot_file. If id or subscription_id+item_version_id has been set, you do not need to set this parameter.

boot_file

String

Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used together with code_dir. If id or subscription_id+item_version_id has been set, you do not need to set this parameter.

autosearch_config_path

String

YAML configuration path of an auto search job. An OBS URL is required. For example, obs://bucket/file.yaml.

autosearch_framework_path

String

Framework code directory of auto search jobs. An OBS URL is required. For example, obs://bucket/files/.

command

String

Boot command for starting the container of a custom image for a training job. For example, python train.py.

parameters

Array of Parameter objects

Running parameters of a training job.

policies

policies object

Policies supported by jobs.

inputs

Array of Input objects

Input of a training job.

outputs

Array of Output objects

Output of a training job.

engine

JobEngine object

Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm.

local_code_dir

String

Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

working_dir

String

Work directory where an algorithm is executed. Note that this parameter does not take effect in v1 compatibility mode.

environments

Array of Map<String,String> objects

Environment variables of a training job. The format is key:value. Leave this parameter blank.

summary

Summary object

Visualization log summary.

Table 9 Parameter

Parameter

Type

Description

name

String

Parameter name.

value

String

Parameter value.

description

String

Parameter description.

constraint

constraint object

Parameter constraint.

i18n_description

i18n_description object

Internationalization description.

Table 10 constraint

Parameter

Type

Description

type

String

Parameter type.

editable

Boolean

Whether the parameter is editable.

required

Boolean

Whether the parameter is mandatory.

sensitive

Boolean

Whether the parameter is sensitive. This function is not implemented currently.

valid_type

String

Valid type.

valid_range

Array of strings

Valid range.

Table 11 i18n_description

Parameter

Type

Description

language

String

Internationalization language. The options are as follows:

  • zh-cn (Chinese)

  • en-us (English)

description

String

Description of an international language.

Table 12 policies

Parameter

Type

Description

auto_search

auto_search object

Hyperparameter search configuration.

Table 14 reward_attrs

Parameter

Type

Description

name

String

Metric name.

mode

String

Search mode.

  • max: A larger metric value is preferred.

  • min: A smaller metric value is preferred.

regex

String

Regular expression of a metric.

Table 15 search_params

Parameter

Type

Description

name

String

Hyperparameter name.

param_type

String

Parameter type.

  • continuous: The hyperparameter is of the continuous type. When an algorithm is used in a training job, continuous hyperparameters are displayed as text boxes on the console.

  • discrete: The hyperparameter is of the discrete type. When an algorithm is used in a training job, discrete hyperparameters are displayed as drop-down lists on the console.

lower_bound

String

Lower bound of the hyperparameter.

upper_bound

String

Upper bound of the hyperparameter.

discrete_points_num

String

Number of discrete points of a continuous hyperparameter.

discrete_values

Array of strings

List of discrete hyperparameter values.

Table 16 algo_configs

Parameter

Type

Description

name

String

Name of the search algorithm.

params

Array of AutoSearchAlgoConfigParameter objects

Search algorithm parameters.

Table 17 AutoSearchAlgoConfigParameter

Parameter

Type

Description

key

String

Parameter key.

value

String

Parameter value.

type

String

Parameter type.

Table 18 Input

Parameter

Type

Description

name

String

Name of the data input channel.

description

String

Description of the data input channel.

local_dir

String

Local directory of the container to which the data input channel is mapped. Example: /home/ma-user/modelarts/inputs/data_url_0.

remote

InputDataInfo object

Information of the data input. Enums:

  • dataset: The data input is a dataset.

  • obs: The data input is an OBS path.

remote_constraint

Array of remote_constraint objects

Data input constraint.

Table 19 InputDataInfo

Parameter

Type

Description

dataset

dataset object

Dataset as the data input.

obs

obs object

OBS in which data input and output are stored.

Table 20 dataset

Parameter

Type

Description

id

String

Dataset ID of a training job.

version_id

String

Dataset version ID of a training job.

obs_url

String

OBS URL of the dataset for a training job. It is automatically parsed by ModelArts based on the dataset ID and dataset version ID. For example, /usr/data/.

Table 21 obs

Parameter

Type

Description

obs_url

String

OBS URL of the dataset required by a training job. For example, /usr/data/.

Table 22 remote_constraint

Parameter

Type

Description

data_type

String

Data input type, including the data storage location and dataset.

attributes

String

Attributes if a dataset is used as the data input. Options:

  • data_format: Data format

  • data_segmentation: Data segmentation

  • dataset_type: Labeling type

Table 23 Output

Parameter

Type

Description

name

String

Name of the data output channel.

description

String

Description of the data output channel.

local_dir

String

Local directory of the container to which the data output channel is mapped.

remote

Remote object

Description of the actual data output.

Table 24 JobEngine

Parameter

Type

Description

engine_id

String

Engine ID selected for a training job. The value can be engine_id, engine_name + engine_version, or image_url.

engine_name

String

Name of the engine selected for a training job. If engine_id has been set, you do not need to set this parameter.

engine_version

String

Version of the engine selected for a training job. If engine_id has been set, you do not need to set this parameter.

image_url

String

Custom image URL selected for a training job. The URL is obtained from SWR.

install_sys_packages

Boolean

Whether to install the MoXing version specified by the training platform. Value true means to install the specified MoXing version. This parameter is available only when engine_name, engine_version, and image_url are set.

Table 25 Summary

Parameter

Type

Description

log_type

String

Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job. The options are as follows:

  • tensorboard

  • mindstudio-insight

log_dir

LogDir object

Visualization log output of a training job. This parameter is mandatory when log_type is not empty.

data_sources

Array of DataSource objects

Visualization log input of a visualization job or debug training job. This parameter is mandatory when tensorboard/enable or mindstudio-insight/enable is set to true for advanced training functions.

Table 26 LogDir

Parameter

Type

Description

pfs

PFSSummary object

Output of an OBS parallel file system.

Table 27 PFSSummary

Parameter

Type

Description

pfs_path

String

URL of an OBS parallel file system.

Table 28 DataSource

Parameter

Type

Description

job

JobSummary object

Job data source.

Table 29 JobSummary

Parameter

Type

Description

job_id

String

Training job ID.

Table 30 TaskResponse

Parameter

Type

Description

role

String

Task role. This function is not supported currently.

algorithm

TaskResponseAlgorithm object

Algorithm management and configuration.

task_resource

FlavorResponse object

Flavors of a training job or an algorithm.

Table 31 TaskResponseAlgorithm

Parameter

Type

Description

code_dir

String

Absolute path of the directory where the algorithm boot file is stored.

boot_file

String

Absolute path of the algorithm boot file.

inputs

AlgorithmInput object

Algorithm input channel.

outputs

AlgorithmOutput object

Algorithm output channel.

engine

AlgorithmEngine object

Engine on which a heterogeneous job depends.

local_code_dir

String

Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

working_dir

String

Work directory where an algorithm is executed. Note that this parameter does not take effect in v1 compatibility mode.

Table 32 AlgorithmInput

Parameter

Type

Description

name

String

Name of the data input channel.

local_dir

String

Local path of the container to which the data input and output channels are mapped.

remote

AlgorithmRemote object

Actual data input, which can only be OBS for heterogeneous jobs.

Table 33 AlgorithmRemote

Parameter

Type

Description

obs

RemoteObs object

OBS in which data input and output are stored.

Table 34 AlgorithmOutput

Parameter

Type

Description

name

String

Name of the data output channel.

local_dir

String

Local directory of the container to which the data output channel is mapped.

remote

Remote object

Description of the actual data output.

mode

String

Data transmission mode. The default value is upload_periodically.

period

String

Data transmission period. The default value is 30s.

Table 35 Remote

Parameter

Type

Description

obs

RemoteObs object

OBS to which data is actually exported.

Table 36 RemoteObs

Parameter

Type

Description

obs_url

String

OBS URL to which data is exported.

Table 37 AlgorithmEngine

Parameter

Type

Description

engine_id

String

Engine ID, for example, caffe-1.0.0-python2.7.

engine_name

String

Engine name, for example, Caffe.

engine_version

String

Engine version. An engine with the same name may have multiple versions, for example, Caffe-1.0.0-python2.7 for Python 2.7.

v1_compatible

Boolean

Whether the v1 compatibility mode is used.

run_user

String

User UID started by default by the engine.

image_url

String

Custom image URL selected for an algorithm.

Table 38 FlavorResponse

Parameter

Type

Description

flavor_id

String

ID of the resource flavor.

flavor_name

String

Name of the resource flavor.

max_num

Integer

Maximum number of nodes in a resource flavor.

flavor_type

String

Resource flavor type. Options:

  • CPU

  • GPU

  • Ascend

billing

BillingInfo object

Billing information of a resource flavor.

flavor_info

FlavorInfoResponse object

Resource flavor details.

attributes

Map<String,String>

Other specification attributes.

Table 39 FlavorInfoResponse

Parameter

Type

Description

max_num

Integer

Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported.

cpu

Cpu object

CPU specifications.

gpu

Gpu object

GPU specifications.

npu

Npu object

Ascend specifications.

memory

Memory object

Memory information.

disk

DiskResponse object

Disk information.

Table 40 DiskResponse

Parameter

Type

Description

size

Integer

Disk size.

unit

String

Unit of the disk size.

Table 41 SpecResponce

Parameter

Type

Description

resource

Resource object

Resource flavors of a training job. Select either flavor_id or pool_id+[flavor_id].

volumes

Array of JobVolume objects

Volumes attached for a training job.

log_export_path

LogExportPath object

Export path of training job logs.

schedule_policy

SchedulePolicy object

Training job scheduling policy.

custom_metrics

Array of CustomMetrics objects

Metric collection configuration.

Table 42 Resource

Parameter

Type

Description

policy

String

Resource specification mode of a training job. The value can be regular, indicating the standard mode.

flavor_id

String

ID of the resource flavor selected for a training job. flavor_id cannot be specified for dedicated resource pools with CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows:

  • modelarts.pool.visual.xlarge (1 card)

  • modelarts.pool.visual.2xlarge (2 cards)

  • modelarts.pool.visual.4xlarge (4 cards)

  • modelarts.pool.visual.8xlarge (8 cards)

flavor_name

String

Read-only flavor name returned by ModelArts when flavor_id is used.

node_count

Integer

Number of resource replicas selected for a training job.

pool_id

String

Resource pool ID selected for a training job.

flavor_detail

FlavorDetail object

Flavor details of a training job or algorithm. This parameter is available only for public resource pools.

Table 43 FlavorDetail

Parameter

Type

Description

flavor_type

String

Resource flavor type. The options are as follows:

  • CPU

  • GPU

  • Ascend

billing

BillingInfo object

Billing information of a resource flavor.

flavor_info

FlavorInfo object

Resource flavor details.

Table 44 BillingInfo

Parameter

Type

Description

code

String

Billing code.

unit_num

Integer

Billing unit.

Table 45 FlavorInfo

Parameter

Type

Description

max_num

Integer

Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported.

cpu

Cpu object

CPU specifications.

gpu

Gpu object

GPU specifications.

npu

Npu object

Ascend specifications.

memory

Memory object

Memory information.

disk

Disk object

Disk information.

Table 46 Cpu

Parameter

Type

Description

arch

String

CPU architecture.

core_num

Integer

Number of cores.

Table 47 Gpu

Parameter

Type

Description

unit_num

Integer

Number of GPUs.

product_name

String

Product name.

memory

String

Memory.

Table 48 Npu

Parameter

Type

Description

unit_num

String

Number of NPUs.

product_name

String

Product name.

memory

String

Memory.

Table 49 Memory

Parameter

Type

Description

size

Integer

Memory size.

unit

String

Unit of the memory size, for example, GB.

Table 50 Disk

Parameter

Type

Description

size

String

Disk size.

unit

String

Unit of the disk size, which is GB generally.

Table 51 JobVolume

Parameter

Type

Description

nfs

Nfs object

Volumes attached in NFS mode.

Table 52 Nfs

Parameter

Type

Description

nfs_server_path

String

NFS server path, for example, 10.10.10.10:/example/path.

local_path

String

Path for attaching volumes to the training container, for example, /example/path.

read_only

Boolean

Whether the disks attached to the container in NFS mode are read-only.

Table 53 LogExportPath

Parameter

Type

Description

obs_url

String

OBS path for storing training job logs, for example, obs://example/path.

host_path

String

Path of the host where training job logs are stored, for example, /example/path.

Table 54 SchedulePolicy

Parameter

Type

Description

required_affinity

RequiredAffinity object

Affinity requirements for training jobs.

priority

Integer

Priority of the training job.

preemptible

Boolean

Whether preemption is allowed.

Table 55 RequiredAffinity

Parameter

Type

Description

affinity_type

String

Affinity scheduling policy. Possible values are as follows:

  • cabinet: strong cabinet scheduling

  • hyperinstance: supernode affinity scheduling

affinity_group_size

Integer

Affinity group size. This parameter is mandatory when affinity_type is set to hyperinstance. In this case, the system schedules the number of tasks specified by affinity_group_size to one supernode to form an affinity group.

If a training job is delivered to a supernode resource pool and the affinity group size is not set, the system sets the value to 1 by default.

Table 56 CustomMetrics

Parameter

Type

Description

metrics_url

String

URL for collecting metrics. Configure both metrics_url and metrics_port, or leave both blank.

metrics_port

Integer

Port for collecting metrics. Configure both metrics_url and metrics_port, or leave both blank.

Table 57 JobEndpointsResp

Parameter

Type

Description

ssh

SSHResp object

SSH connection information.

jupyter_lab

JupyterLab object

JupyterLab connection information.

tensorboard

Tensorboard object

TensorBoard connection information.

mindstudio_insight

MindStudioInsight object

MindStudio Insight connection information.

Table 58 SSHResp

Parameter

Type

Description

key_pair_names

Array of strings

SSH key pair names. Key pairs can be created and viewed on the Key Pair page of the ECS console.

task_urls

Array of TaskUrls objects

SSH connection address information.

Table 59 TaskUrls

Parameter

Type

Description

task

String

Task of a training job.

url

String

SSH connection address of a training job.

Table 60 JupyterLab

Parameter

Type

Description

url

String

JupyterLab address of a training job.

token

String

JupyterLab token of a training job.

Table 61 Tensorboard

Parameter

Type

Description

url

String

TensorBoard URL of a training job.

token

String

TensorBoard token of a training job.

Table 62 MindStudioInsight

Parameter

Type

Description

url

String

MindStudio Insight URL of a training job.

token

String

MindStudio Insight token of a training job.

Example Requests

The following is an example of how to terminate the training job whose ID is cf63aba9-63b1-4219-b717-708a2665100b.

POST https://endpoint/v2/{project_id}/training-jobs/cf63aba9-63b1-4219-b717-708a2665100b/actions

{
  "action_type" : "terminate"
}
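
For illustration, the following is a minimal Python sketch of the same request, assuming token-based authentication with the X-Auth-Token header. The endpoint, project ID, and token values are placeholders that you must replace with your own.

import requests

# Placeholders (not real values): replace with your regional ModelArts endpoint,
# project ID (see Obtaining a Project ID and Name), training job ID, and IAM token.
ENDPOINT = "https://modelarts.<region>.myhuaweicloud.com"
PROJECT_ID = "<project_id>"
JOB_ID = "cf63aba9-63b1-4219-b717-708a2665100b"
TOKEN = "<your_iam_token>"

url = f"{ENDPOINT}/v2/{PROJECT_ID}/training-jobs/{JOB_ID}/actions"
headers = {"Content-Type": "application/json", "X-Auth-Token": TOKEN}

# Setting action_type to terminate requests termination of the job (see Table 2).
response = requests.post(url, json={"action_type": "terminate"}, headers=headers)
print(response.status_code)  # 202 is returned on success
print(response.json())       # job description as shown in Example Responses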

Example Responses

Status code: 202

ok

{
  "kind" : "job",
  "metadata" : {
    "id" : "cf63aba9-63b1-4219-b717-708a2665100b",
    "name" : "trainjob--py14_mem06-110",
    "description" : "",
    "create_time" : 1636515222282,
    "workspace_id" : "0",
    "user_name" : "ei_modelarts_z00424192_01"
  },
  "status" : {
    "phase" : "Terminating",
    "secondary_phase" : "Terminating",
    "duration" : 0,
    "start_time" : 0,
    "node_count_metrics" : null,
    "tasks" : [ "worker-0" ]
  },
  "algorithm" : {
    "code_dir" : "obs://test/economic_test/py_minist/",
    "boot_file" : "obs://test/economic_test/py_minist/minist_common.py",
    "inputs" : [ {
      "name" : "data_url",
      "local_dir" : "/home/ma-user/modelarts/inputs/data_url_0",
      "remote" : {
        "obs" : {
          "obs_url" : "/test/data/py_minist/"
        }
      }
    } ],
    "outputs" : [ {
      "name" : "train_url",
      "local_dir" : "/home/ma-user/modelarts/outputs/train_url_0",
      "remote" : {
        "obs" : {
          "obs_url" : "/test/train_output/"
        }
      }
    } ],
    "engine" : {
      "engine_id" : "pytorch-cp36-1.4.0-v2",
      "engine_name" : "PyTorch",
      "engine_version" : "PyTorch-1.4.0-python3.6-v2"
    }
  },
  "spec" : {
    "resource" : {
      "policy" : "economic",
      "flavor_id" : "modelarts.vm.pnt1.large.eco",
      "flavor_name" : "Computing GPU(Pnt1) instance",
      "node_count" : 1,
      "flavor_detail" : {
        "flavor_type" : "GPU",
        "billing" : {
          "code" : "modelarts.vm.gpu.pnt1.eco",
          "unit_num" : 1
        },
        "flavor_info" : {
          "cpu" : {
            "arch" : "x86",
            "core_num" : 8
          },
          "gpu" : {
            "unit_num" : 1,
            "product_name" : "GP-Pnt1",
            "memory" : "8GB"
          },
          "memory" : {
            "size" : 64,
            "unit" : "GB"
          }
        }
      }
    }
  }
}
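
As a small illustrative helper (not part of the API), the sketch below shows how a client might read the status.phase field of the returned job description against the level-1 states listed in Table 5 to decide whether the job has already stopped or is still terminating.

# Hypothetical helper for interpreting the 202 response body; phase values come from Table 5.
TERMINAL_PHASES = {"Completed", "Terminated", "Failed", "Abnormal"}

def job_state(body: dict) -> str:
    phase = body.get("status", {}).get("phase", "Unknown")
    if phase in TERMINAL_PHASES:
        return f"stopped (phase={phase})"
    return f"still transitioning (phase={phase})"

# With the example response above, the job is still transitioning because phase is
# Terminating; it moves to Terminated once the stop completes.
example_body = {"status": {"phase": "Terminating", "secondary_phase": "Terminating"}}
print(job_state(example_body))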

Status Codes

Status Code

Description

202

ok

Error Codes

See Error Codes.
