
Updated on 2023-05-09 GMT+08:00


Terminating a Training Job

Terminate a training job. Only jobs in the creating, awaiting, or running state can be terminated.
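Because only jobs in the creating, awaiting, or running state can be terminated, a caller may want to check the job's level-1 status (status.phase in Table 4) before sending the request. The following is a minimal sketch. It assumes that these three states correspond to the phase values Creating, Pending, and Running. The helper name can_terminate is hypothetical and not part of the ModelArts SDK.

```python
# Hypothetical helper, not part of the ModelArts SDK.
# Assumption: "creating", "awaiting", and "running" map to these
# level-1 phase values from the Status table.
TERMINABLE_PHASES = frozenset({"Creating", "Pending", "Running"})


def can_terminate(phase: str) -> bool:
    """Return True if a job in the given level-1 phase can be terminated."""
    return phase in TERMINABLE_PHASES


for phase in ("Pending", "Running", "Completed", "Terminated"):
    print(phase, can_terminate(phase))
```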

Sample Code

In a ModelArts notebook instance, session authentication requires no authentication parameters. For details about session authentication in other development environments, see Session Authentication.

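Method 1: Terminate a training job by its job_id.

```
from modelarts.session import Session
from modelarts.estimatorV2 import Estimator

# In a ModelArts notebook, Session() needs no authentication parameters.
session = Session()
# Replace "your job id" with the ID of the training job to terminate.
info = Estimator.control_job_by_id(session=session, job_id="your job id")
print(info)
```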

Method 2: Terminate the training job created in Creating a Training Job.
```
# job_instance is the training job object created in Creating a Training Job.
job_instance.control_job()
```

Parameters

**Table 1** **control_job_by_id** request parameters
Parameter	Mandatory	Type	Description
session	Yes	Object	Session object. For details about the initialization method, see Session Authentication.
job_id	Yes	String	ID of the training job. Obtain it from the training job created in Creating a Training Job (for example, job_instance.job_id) or from the response returned in Obtaining Training Jobs.

**Table 2** Response parameters
Parameter	Type	Description
kind	String	Training job type, which defaults to job. Options: job (training job), hetero_job (heterogeneous job), autosearch_job (auto search job), mrs_job (MRS job), and edge_job (edge job)
metadata	JobMetadata object	Metadata of a training job.
status	Status object	Status of a training job. When creating a training job, you do not need to set this parameter.
algorithm	JobAlgorithmResponse object	Algorithm used by a training job. The following formats are supported: id (only the algorithm ID is used); subscription_id and item_version_id (the subscription ID and version ID of the algorithm are used); code_dir and boot_file (the code directory and boot file of the training job are used).
tasks	Array of TaskResponse objects	Tasks of a heterogeneous training job.
spec	spec object	Specifications of a training job.
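The response fields above nest into one another, so a short summary can be read straight from the returned object. The following minimal sketch assumes that the value printed in the sample code (info) is a dict with the keys listed in Table 2. The helper summarize_job and all sample values are hypothetical and for illustration only.

```python
from typing import Any, Dict


def summarize_job(info: Dict[str, Any]) -> str:
    """Build a one-line summary from a response shaped like Table 2."""
    metadata = info.get("metadata", {})
    status = info.get("status", {})
    return "{kind} {name} ({id}): {phase}".format(
        kind=info.get("kind", "job"),  # kind defaults to "job"
        name=metadata.get("name", "?"),
        id=metadata.get("id", "?"),
        phase=status.get("phase", "Unknown"),
    )


# Illustrative response; all values are made up.
sample = {
    "kind": "job",
    "metadata": {"id": "example-job-id", "name": "my-training-job"},
    "status": {"phase": "Terminating", "secondary_phase": "Terminating"},
}
print(summarize_job(sample))  # job my-training-job (example-job-id): Terminating
```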

**Table 3** JobMetadata
Parameter	Type	Description
id	String	Training job ID, which is generated and returned by ModelArts after a training job is created.
name	String	Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).
workspace_id	String	Workspace where a training job is deployed. Default value: 0
description	String	Description of a training job, which defaults to NULL. The value must contain 0 to 256 characters.
create_time	Long	Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created.
user_name	String	Username for creating a training job. The username is generated and returned by ModelArts after a training job is created.
annotations	Map<String,String>	Declaration template of a training job. For heterogeneous jobs, the default value of job_template is Template RL. For other jobs, the default value is Template DL.
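create_time is documented as a time in milliseconds. The following minimal sketch converts it to a timezone-aware datetime, assuming the value counts milliseconds since the Unix epoch (UTC). The helper name is hypothetical.

```python
from datetime import datetime, timezone


def create_time_to_datetime(create_time_ms: int) -> datetime:
    """Convert a millisecond Unix timestamp (assumed) to an aware UTC datetime."""
    return datetime.fromtimestamp(create_time_ms / 1000, tz=timezone.utc)


print(create_time_to_datetime(1683600000000).isoformat())  # 2023-05-09T02:40:00+00:00
```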

**Table 4** Status
Parameter	Type	Description
phase	String	Level-1 status of a training job. The set of level-1 status values is stable. Options: Creating, Pending, Running, Failed, Completed, Terminating, Terminated, and Abnormal
secondary_phase	String	Level-2 status of a training job. The set of level-2 status values may change. Options: Creating, Queuing, Running, Failed, Completed, Terminating, Terminated, CreateFailed, TerminatedFailed, Unknown, and Lost
duration	Long	Running duration of a training job, in milliseconds
node_count_metrics	Array<Array<Integer>>	Node count changes during the runtime of a training job
tasks	Array of strings	Tasks of a training job
start_time	String	Start time of a training job. The value is in timestamp format.
task_statuses	Array of objects	Statuses of the training job tasks. For details, see Table 5.
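A termination request may not take effect immediately: the phase list above includes Terminating as well as Terminated. The following minimal sketch polls a status source until the level-1 phase reaches one of a set of target values. The helper wait_for_phase, the get_phase callable, the default target phases, and the polling interval are assumptions for illustration; in practice get_phase would wrap whatever status query your environment provides.

```python
import time
from typing import Callable, Iterable


def wait_for_phase(
    get_phase: Callable[[], str],
    targets: Iterable[str] = ("Terminated", "Failed", "Completed", "Abnormal"),
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll get_phase() until it returns one of targets; raise on timeout."""
    wanted = set(targets)
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase in wanted:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in phase {phase!r} after {timeout_s} s")
        time.sleep(interval_s)


# Usage with a stand-in status source (replace with a real status query).
phases = iter(["Running", "Terminating", "Terminated"])
print(wait_for_phase(lambda: next(phases), interval_s=0.01))  # Terminated
```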

**Table 5** task_statuses
Parameter	Type	Description
task	String	Task of a training job
exit_code	Integer	Exit code of a training job task
message	String	Error message of a training job task
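When a job ends abnormally, the task_statuses entries identify which task failed. The following minimal sketch lists tasks with a non-zero exit code, assuming the usual convention that exit code 0 means success and that the status object is a dict. The helper name and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def failed_tasks(status: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return task_statuses entries whose exit_code is present and non-zero."""
    return [
        t for t in status.get("task_statuses", [])
        if t.get("exit_code") not in (None, 0)
    ]


# Illustrative status object; values are made up.
status = {
    "phase": "Failed",
    "task_statuses": [
        {"task": "worker-0", "exit_code": 0, "message": ""},
        {"task": "worker-1", "exit_code": 137, "message": "example error"},
    ],
}
for t in failed_tasks(status):
    print(t["task"], t["exit_code"], t["message"])
```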

**Table 6** JobAlgorithmResponse
Parameter	Type	Description
id	String	Algorithm ID. An algorithm is specified in one of three ways: id only; subscription_id and item_version_id; or code_dir and boot_file.
name	String	Algorithm name
subscription_id	String	Subscription ID of the subscribed algorithm, which must be used with item_version_id
item_version_id	String	Version ID of the subscribed algorithm, which must be used with subscription_id
code_dir	String	Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave it blank if id, or both subscription_id and item_version_id, are specified.
boot_file	String	Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave it blank if id, or both subscription_id and item_version_id, are specified.
autosearch_config_path	String	YAML configuration path of an auto search job. An OBS URL is required.
autosearch_framework_path	String	Framework code directory of an auto search job. An OBS URL is required.
command	String	Boot command for starting the container of the custom image used for creating a training job. The value of this parameter can be the same as the code_dir value.
parameters	Array of Parameter objects	Running parameters of a training job.
policies	policies object	Policies supported by a training job.
inputs	Array of Input objects	Input of a training job.
outputs	Array of Output objects	Output of a training job.
engine	engine object	Engine of a training job. Leave this parameter blank if the job is created using the id of an algorithm in algorithm management, or the subscription_id and item_version_id of a subscribed algorithm.
environments	Map<String,String>	Environment variables of a training job in the format of "key":"value". Leave this parameter blank.
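Because an algorithm can be specified in three different formats, code that reads a JobAlgorithmResponse may need to tell them apart. The following minimal sketch reports which format is present. The helper name, its return labels, and the precedence order applied when several fields are filled in are assumptions for illustration.

```python
from typing import Any, Dict


def algorithm_source(algorithm: Dict[str, Any]) -> str:
    """Name the documented algorithm format used by a JobAlgorithmResponse dict.

    Assumption: when several fields are filled in, id takes precedence,
    then subscription_id/item_version_id, then code_dir/boot_file.
    """
    if algorithm.get("id"):
        return "id"
    if algorithm.get("subscription_id") and algorithm.get("item_version_id"):
        return "subscription_id+item_version_id"
    if algorithm.get("code_dir") and algorithm.get("boot_file"):
        return "code_dir+boot_file"
    raise ValueError("no complete algorithm specification found")


print(algorithm_source({"code_dir": "/usr/app/", "boot_file": "/usr/app/boot.py"}))
```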

**Table 7** Parameter
Parameter	Type	Description
name	String	Parameter name
value	String	Parameter value
description	String	Parameter description
constraint	constraint object	Parameter constraint
i18n_description	i18n_description object	Internationalization description

**Table 8** constraint
Parameter	Type	Description
type	String	Parameter type
editable	Boolean	Whether the parameter is editable
required	Boolean	Whether the parameter is mandatory
sensitive	Boolean	Whether the parameter is sensitive
valid_type	String	Valid type
valid_range	Array of strings	Valid range

**Table 9** i18n_description
Parameter	Type	Description
language	String	Internationalization language
description	String	Description

**Table 10** policies
Parameter	Type	Description
auto_search	auto_search object	Hyperparameter search configuration

**Table 11** auto_search
Parameter	Type	Description
skip_search_params	String	Hyperparameters to skip during the search
reward_attrs	Array of objects	Search metrics
search_params	Array of objects	Search parameters
algo_configs	Array of objects	Search algorithm configurations

**Table 12** reward_attrs
Parameter	Type	Description
name	String	Metric name
mode	String	Search mode. Options: max (a larger metric value is preferred) and min (a smaller metric value is preferred)
regex	String	Regular expression of a metric
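The mode field decides which observed metric value counts as best. The following minimal sketch applies that rule to a list of already extracted metric values; how ModelArts extracts values with regex is not covered here. The helper name and the sample values are hypothetical.

```python
from typing import Iterable


def best_metric(values: Iterable[float], mode: str) -> float:
    """Pick the preferred value: the largest for mode "max", the smallest for "min"."""
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    raise ValueError(f"unsupported mode: {mode!r}")


print(best_metric([0.81, 0.87, 0.84], "max"))  # accuracy-like metric: 0.87
print(best_metric([0.52, 0.47, 0.49], "min"))  # loss-like metric: 0.47
```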

**Table 13** search_params
Parameter	Type	Description
name	String	Hyperparameter name
param_type	String	Parameter type. Options: continuous (parameter values are continuous) and discrete (parameter values are discrete)
lower_bound	String	Lower bound of the hyperparameter
upper_bound	String	Upper bound of the hyperparameter
discrete_points_num	String	Number of discrete points of a hyperparameter with continuous values
discrete_values	Array of strings	Discrete hyperparameter values
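Note that lower_bound, upper_bound, and discrete_points_num are typed as strings, so they must be converted before use. The following minimal sketch lists candidate values for a search_params entry. It assumes that a continuous range is sampled at discrete_points_num evenly spaced points including both bounds; the actual sampling performed by ModelArts may differ. The helper name and the sample parameter are hypothetical.

```python
from typing import Any, Dict, List


def candidate_values(param: Dict[str, Any]) -> List[Any]:
    """List candidate values for a search_params entry (assumed semantics)."""
    if param["param_type"] == "discrete":
        return list(param["discrete_values"])
    if param["param_type"] == "continuous":
        lower = float(param["lower_bound"])
        upper = float(param["upper_bound"])
        n = int(param["discrete_points_num"])
        if n < 2:
            return [lower]
        step = (upper - lower) / (n - 1)
        return [lower + i * step for i in range(n)]
    raise ValueError(f"unknown param_type: {param['param_type']!r}")


lr = {"name": "learning_rate", "param_type": "continuous",
      "lower_bound": "0.0", "upper_bound": "1.0", "discrete_points_num": "5"}
print(candidate_values(lr))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```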

**Table 14** algo_configs
Parameter	Type	Description
name	String	Name of the search algorithm
params	Array of AutoSearchAlgoConfigParameter objects	Search algorithm parameters

**Table 15** AutoSearchAlgoConfigParameter
Parameter	Type	Description
key	String	Parameter key
value	String	Parameter value
type	String	Parameter type

**Table 16** Input
Parameter	Type	Description
name	String	Name of the data input channel
description	String	Description of the data input channel
local_dir	String	Local directory of the container to which the data input channel is mapped
remote	InputDataInfo object	Information of the data input
remote_constraint	Array of objects	Data input constraint

**Table 17** InputDataInfo
Parameter	Type	Description
dataset	dataset object	Dataset as the data input
obs	obs object	OBS in which data input and output are stored

**Table 18** dataset
Parameter	Type	Description
id	String	Dataset ID of a training job
version_id	String	Dataset version ID of a training job
obs_url	String	OBS URL of the dataset for a training job, which ModelArts parses automatically from the dataset ID and dataset version ID, for example, /usr/data/

**Table 19** obs
Parameter	Type	Description
obs_url	String	OBS URL of the dataset for a training job, for example, /usr/data/
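An Input channel's remote field (InputDataInfo) carries either a dataset or an obs object, and both expose an obs_url. The following minimal sketch resolves that URL from a channel dict. The helper name, the choice to prefer obs when both are present, and the sample values are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def input_data_url(input_channel: Dict[str, Any]) -> Optional[str]:
    """Return the OBS URL behind an Input channel's remote (InputDataInfo) field."""
    remote = input_channel.get("remote", {})
    if "obs" in remote:
        return remote["obs"]["obs_url"]
    if "dataset" in remote:
        # For datasets, obs_url is parsed by ModelArts from id and version_id.
        return remote["dataset"].get("obs_url")
    raise ValueError(f"input {input_channel.get('name')!r} has no obs or dataset source")


channel = {"name": "data_url", "remote": {"obs": {"obs_url": "/usr/data/"}}}
print(input_data_url(channel))
```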

**Table 20** remote_constraint
Parameter	Type	Description
data_type	String	Data input type, including the data storage location and dataset
attributes	String	Attributes of a dataset used as the data input. Options: data_format (data format), data_segmentation (data segmentation), and dataset_type (data labeling)

**Table 21** Output
Parameter	Type	Description
name	String	Name of the data output channel
description	String	Description of the data output channel
local_dir	String	Local directory of the container to which the data output channel is mapped
remote	remote object	Information of the data output

**Table 22** remote
Parameter	Type	Description
obs	obs object	OBS to which data is exported

**Table 23** obs
Parameter	Type	Description
obs_url	String	OBS URL to which data is exported

**Table 24** engine
Parameter	Type	Description
engine_id	String	Engine ID selected for a training job. An engine can be specified by engine_id, by engine_name and engine_version, or by image_url.
engine_name	String	Name of the engine selected for a training job. Leave this parameter blank if engine_id is specified.
engine_version	String	Version of the engine selected for a training job. Leave this parameter blank if engine_id is specified.
image_url	String	Custom image URL selected for a training job
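The engine table describes three alternative ways to choose an engine. The following minimal sketch checks an engine dict before it is used in a request. It assumes that the three options are mutually exclusive, as the "leave blank if engine_id is specified" notes suggest. The helper name and the sample values are hypothetical.

```python
from typing import Any, Dict


def engine_selection(engine: Dict[str, Any]) -> str:
    """Return which engine-selection option is set; require exactly one.

    Assumption: engine_id, engine_name+engine_version, and image_url are
    mutually exclusive ways to choose an engine.
    """
    options = {
        "engine_id": bool(engine.get("engine_id")),
        "engine_name+engine_version": bool(
            engine.get("engine_name") and engine.get("engine_version")
        ),
        "image_url": bool(engine.get("image_url")),
    }
    chosen = [name for name, present in options.items() if present]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one engine option, found {chosen or 'none'}")
    return chosen[0]


# Illustrative values only.
print(engine_selection({"engine_name": "Caffe", "engine_version": "1.0.0"}))
```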

**Table 25** TaskResponse
Parameter	Type	Description
role	String	Role of a heterogeneous training job task. Options: learner (GPUs or CPUs are supported) and worker (CPUs are supported)
algorithm	algorithm object	Algorithm configurations in algorithm management
task_resource	FlavorResponse object	Flavors for a training job or an algorithm

**Table 26** algorithm
Parameter	Type	Description
code_dir	String	Absolute path of the directory where the algorithm boot file is stored
boot_file	String	Absolute path of the algorithm boot file
inputs	inputs object	Algorithm input channel
outputs	outputs object	Algorithm output channel
engine	engine object	Engine on which a heterogeneous job depends

**Table 27** inputs
Parameter	Type	Description
name	String	Name of the data input channel
local_dir	String	Local path of the container to which the data input and output channels are mapped
remote	remote object	Actual data input, which can only be OBS for heterogeneous jobs

**Table 28** remote
Parameter	Type	Description
obs	obs object	OBS in which data input and output are stored

**Table 29** obs
Parameter	Type	Description
obs_url	String	OBS URL of the dataset for a training job, for example, /usr/data/

**Table 30** outputs
Parameter	Type	Description
name	String	Name of the data output channel
local_dir	String	Local directory of the container to which the data output channel is mapped
remote	remote object	Information of the data output
mode	String	Data transmission mode, which defaults to upload_periodically
period	String	Data transmission period, which defaults to 30s
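period is a string such as the documented default 30s. The following minimal sketch converts it to seconds. Only the s suffix is documented; accepting m and h suffixes, and bare numbers as seconds, is an assumption. The helper name is hypothetical.

```python
import re

# Assumed format: an integer optionally followed by s, m, or h.
_PERIOD = re.compile(r"^\s*(\d+)\s*([smh]?)\s*$")
_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}


def period_to_seconds(period: str) -> int:
    """Convert a period string such as "30s" to seconds."""
    match = _PERIOD.match(period)
    if match is None:
        raise ValueError(f"unrecognized period: {period!r}")
    value, unit = match.groups()
    return int(value) * _SECONDS[unit]


print(period_to_seconds("30s"))  # 30
```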

**Table 31** remote
Parameter	Type	Description
obs	obs object	OBS to which data is exported

**Table 32** obs
Parameter	Type	Description
obs_url	String	OBS URL to which data is exported

**Table 33** engine
Parameter	Type	Description
engine_id	String	Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7
engine_name	String	Engine name of a heterogeneous job, for example, Caffe
engine_version	String	Engine version of a heterogeneous job
v1_compatible	Boolean	Whether v1 is compatible
run_user	String	UID of the user who starts the engine by default

**Table 34** FlavorResponse
Parameter	Type	Description
flavor_id	String	ID of the resource flavor
flavor_name	String	Name of the resource flavor
max_num	Integer	Maximum number of nodes with the resource flavor
flavor_type	String	Resource flavor type. Options: CPU, GPU, and Ascend
billing	billing object	Billing information of a resource flavor
flavor_info	flavor_info object	Resource flavor details
attributes	Map<String,String>	Other flavor attributes

**Table 35** billing
Parameter	Type	Description
code	String	Billing code
unit_num	Integer	Number of billing units

**Table 36** flavor_info
Parameter	Type	Description
max_num	Integer	Maximum number of nodes that can be selected. Value 1 indicates that the distributed mode is not supported.
cpu	cpu object	CPU specifications
gpu	gpu object	GPU specifications
npu	npu object	Ascend specifications
memory	memory object	Memory information

**Table 37** cpu
Parameter	Type	Description
arch	String	CPU architecture
core_num	Integer	Number of cores

**Table 38** gpu
Parameter	Type	Description
unit_num	Integer	Number of GPUs
product_name	String	Product name
memory	String	Memory

**Table 39** npu
Parameter	Type	Description
unit_num	String	Number of NPUs
product_name	String	Product name
memory	String	Memory

**Table 40** memory
Parameter	Type	Description
size	Integer	Memory size
unit	String	Unit of the memory size

**Table 41** spec
Parameter	Type	Description
resource	Resource object	Resource flavors of a training job. Specify either flavor_id alone, or pool_id and flavor_id.
volumes	Array of objects	Volumes attached for a training job
log_export_path	log_export_path object	Export path of training job logs

**Table 42** Resource
Parameter	Type	Description
policy	String	Resource flavor mode of a training job. Options: regular, economic, and turbo
flavor_id	String	Resource flavor ID of a training job
flavor_name	String	Read-only flavor name returned by ModelArts when flavor_id is specified
node_count	Integer	Number of resource replicas selected for a training job. Minimum value: 1
pool_id	String	Resource pool ID selected for a training job
flavor_detail	flavor_detail object	Flavors for a training job or an algorithm

**Table 43** flavor_detail
Parameter	Type	Description
flavor_type	String	Resource flavor type. Options: CPU, GPU, and Ascend
billing	billing object	Billing information of a resource flavor
flavor_info	flavor_info object	Resource flavor details

**Table 44** billing
Parameter	Type	Description
code	String	Billing code
unit_num	Integer	Number of billing units

**Table 45** flavor_info
Parameter	Type	Description
max_num	Integer	Maximum number of nodes that can be selected. Value 1 indicates that the distributed mode is not supported.
cpu	cpu object	CPU specifications
gpu	gpu object	GPU specifications
npu	npu object	Ascend specifications
memory	memory object	Memory information
disk	disk object	Disk information
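node_count in the Resource object must be at least 1, and flavor_info.max_num caps the number of nodes (a value of 1 means distributed mode is not supported). The following minimal sketch checks a Resource dict against both limits. The helper name is hypothetical, and treating a missing node_count as 1 is an assumption.

```python
from typing import Any, Dict


def check_node_count(resource: Dict[str, Any]) -> None:
    """Raise ValueError if node_count violates the documented limits."""
    node_count = resource.get("node_count", 1)  # assumed default when absent
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    flavor_info = resource.get("flavor_detail", {}).get("flavor_info", {})
    max_num = flavor_info.get("max_num")
    if max_num is not None and node_count > max_num:
        hint = " (distributed mode not supported)" if max_num == 1 else ""
        raise ValueError(f"node_count {node_count} exceeds max_num {max_num}{hint}")


check_node_count({"node_count": 2, "flavor_detail": {"flavor_info": {"max_num": 4}}})
```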

**Table 46** cpu
Parameter	Type	Description
arch	String	CPU architecture
core_num	Integer	Number of cores

**Table 47** gpu
Parameter	Type	Description
unit_num	Integer	Number of GPUs
product_name	String	Product name
memory	String	Memory

**Table 48** npu
Parameter	Type	Description
unit_num	String	Number of NPUs
product_name	String	Product name
memory	String	Memory

**Table 49** memory
Parameter	Type	Description
size	Integer	Memory size
unit	String	Unit of the memory size

**Table 50** disk
Parameter	Type	Description
size	String	Disk size
unit	String	Unit of the disk size, which is generally GB

**Table 51** volumes
Parameter	Type	Description
nfs	nfs object	Disks attached in NFS mode

**Table 52** nfs
Parameter	Type	Description
nfs_server_path	String	NFS server path
local_path	String	Path for attaching disks to the training container
read_only	Boolean	Whether the disks attached to the container in NFS mode are read-only

**Table 53** log_export_path
Parameter	Type	Description
obs_url	String	OBS URL for storing training job logs
host_path	String	Path of the host where training job logs are stored

**Table 54** Response parameters returned when a training API call fails
Parameter	Type	Description
error_msg	String	Error message returned when an API call fails. This parameter is not returned if the call succeeds.
error_code	String	Error code returned when an API call fails. For details, see Error Codes. This parameter is not returned if the call succeeds.
error_solution	String	Solution to the API call failure. This parameter is not returned if the call succeeds.
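Since error_msg, error_code, and error_solution appear only when a call fails, their presence can be used to detect failure. The following minimal sketch turns such a response into an exception. It assumes the response is a dict; whether the SDK already raises its own exceptions is not covered here. TrainingApiError, raise_for_error, and the sample error code are hypothetical.

```python
from typing import Any, Dict


class TrainingApiError(Exception):
    """Hypothetical exception carrying the fields of a failed training API call."""

    def __init__(self, error_code: str, error_msg: str, error_solution: str = ""):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg
        self.error_solution = error_solution


def raise_for_error(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged, or raise if it carries error fields."""
    if "error_code" in response or "error_msg" in response:
        raise TrainingApiError(
            response.get("error_code", ""),
            response.get("error_msg", ""),
            response.get("error_solution", ""),
        )
    return response


try:
    raise_for_error({"error_code": "EXAMPLE.0001", "error_msg": "example failure"})
except TrainingApiError as err:
    print(err)
```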