Terminating a Training Job
Terminate a training job. Only jobs in the creating, awaiting, or running state can be terminated.
Sample Code
In a ModelArts notebook, you do not need to enter authentication parameters for session authentication. For session authentication in other development environments, see Session Authentication.
- Method 1: Use the specified job_id.
from modelarts.session import Session
from modelarts.estimatorV2 import Estimator

session = Session()
info = Estimator.control_job_by_id(session=session, job_id="your job id")
print(info)
- Method 2: Use the training job created in Creating a Training Job.
job_instance.control_job()
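Because only jobs in the creating, awaiting, or running state can be terminated, it can be useful to check the state before calling either method. The following is a minimal sketch, assuming the job state is already available as a string. The helper name can_terminate and the constant TERMINABLE_STATES are hypothetical and are not part of the ModelArts SDK.

```python
# Hypothetical helper, not part of the ModelArts SDK.
# The state names are taken from the description of this API:
# only creating, awaiting, or running jobs can be terminated.
TERMINABLE_STATES = {"creating", "awaiting", "running"}


def can_terminate(state: str) -> bool:
    """Return True if a job in the given state can be terminated."""
    return state.strip().lower() in TERMINABLE_STATES


print(can_terminate("Running"))    # True
print(can_terminate("Completed"))  # False
```

A caller could run this check first and skip the terminate call for jobs that have already finished.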
Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
session | Yes | Object | Session object. For details about the initialization method, see Session Authentication. |
job_id | Yes | String | ID of a training job. You can obtain job_id using the training job created in Creating a Training Job, for example, job_instance.job_id, or from the response obtained in Obtaining Training Jobs. |

Parameter | Type | Description |
---|---|---|
kind | String | Training job type, which defaults to job. |
metadata | JobMetadata object | Metadata of a training job. |
status | Status object | Status of a training job. When creating a training job, you do not need to set this parameter. |
algorithm | JobAlgorithmResponse object | Algorithm used by a training job. |
tasks | Array of TaskResponse objects | Tasks of a heterogeneous training job. |
spec | spec object | Specifications of a training job. |

Parameter | Type | Description |
---|---|---|
id | String | Training job ID, which is generated and returned by ModelArts after a training job is created. |
name | String | Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). |
workspace_id | String | Workspace where a training job is deployed. Default value: 0 |
description | String | Description of a training job, which defaults to NULL. The value must contain 0 to 256 characters. |
create_time | Long | Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created. |
user_name | String | Username for creating a training job. The username is generated and returned by ModelArts after a training job is created. |
annotations | Map<String,String> | Declaration template of a training job. For heterogeneous jobs, the default value of job_template is Template RL. For other jobs, the default value is Template DL. |

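The name rule and the millisecond create_time value in the preceding table can be handled with the standard library. The following is a minimal sketch, assuming that "letters" means ASCII letters and that create_time counts milliseconds since the Unix epoch. Both are assumptions, and the helper names are hypothetical, not part of the ModelArts SDK.

```python
import re
from datetime import datetime, timezone

# Assumption: "letters" in the name rule means ASCII letters.
# 1 to 64 characters: digits, letters, underscores, and hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_job_name(name: str) -> bool:
    """Check a job name against the rule in the metadata table."""
    return NAME_PATTERN.fullmatch(name) is not None


def create_time_to_datetime(create_time_ms: int) -> datetime:
    """Convert create_time to a UTC datetime.

    Assumption: the value is milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(create_time_ms / 1000, tz=timezone.utc)


print(is_valid_job_name("my-training_job-01"))   # True
print(is_valid_job_name("bad name!"))            # False
print(create_time_to_datetime(1700000000000).isoformat())
```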
Parameter | Type | Description |
---|---|---|
phase | String | Level-1 status of a training job. The value will remain unchanged. Options: Creating, Pending, Running, Failed, Completed, Terminating, Terminated, and Abnormal |
secondary_phase | String | Level-2 status of a training job. The value can be changed. Options: Creating, Queuing, Running, Failed, Completed, Terminating, Terminated, CreateFailed, TerminatedFailed, Unknown, and Lost |
duration | Long | Running duration of a training job, in milliseconds |
node_count_metrics | Array<Array<Integer>> | Node count changes during the runtime of a training job |
tasks | Array of strings | Task of a training job |
start_time | String | Start time of a training job. The value is in timestamp format. |
task_statuses | Array of objects | Status of a training job task |

Parameter | Type | Description |
---|---|---|
task | String | Task of a training job |
exit_code | Integer | Exit code of a training job task |
message | String | Error message of a training job task |

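After a terminate call, the status fields described above indicate where the job ended up. The following is a minimal sketch of summarizing them, assuming the status is available as a Python dict shaped like the status and task_statuses tables. The sample values, the helper name summarize_status, and the rule that a non-zero exit_code marks a failed task are illustrative assumptions, not documented SDK behavior.

```python
def summarize_status(status: dict) -> dict:
    """Summarize the status object of a training job.

    Assumption: a non-zero exit_code marks a failed task.
    """
    failed_tasks = [
        item["task"]
        for item in status.get("task_statuses", [])
        if item.get("exit_code", 0) != 0
    ]
    return {
        "phase": status.get("phase"),
        # duration is reported in milliseconds; convert it to seconds.
        "duration_seconds": status.get("duration", 0) / 1000,
        "failed_tasks": failed_tasks,
    }


# Hypothetical sample data shaped like the tables above.
sample_status = {
    "phase": "Terminated",
    "secondary_phase": "Terminated",
    "duration": 125000,
    "task_statuses": [
        {"task": "worker-0", "exit_code": 0, "message": ""},
        {"task": "worker-1", "exit_code": 137, "message": "killed"},
    ],
}

print(summarize_status(sample_status))
```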
Parameter | Type | Description |
---|---|---|
id | String | Algorithm ID |
name | String | Algorithm name |
subscription_id | String | Subscription ID of the subscribed algorithm, which must be used with item_version_id |
item_version_id | String | Version ID of the subscribed algorithm, which must be used with subscription_id |
code_dir | String | Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id, or subscription_id and item_version_id are specified. |
boot_file | String | Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id, or subscription_id and item_version_id are specified. |
autosearch_config_path | String | YAML configuration path of an auto search job. An OBS URL is required. |
autosearch_framework_path | String | Framework code directory of an auto search job. An OBS URL is required. |
command | String | Boot command for starting the container of the custom image used for creating a training job. The value of this parameter can be the same as the code_dir value. |
parameters | Array of Parameter objects | Running parameters of a training job. |
policies | policies object | Policies supported by a training job. |
inputs | Array of Input objects | Input of a training job. |
outputs | Array of Output objects | Output of a training job. |
engine | engine object | Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id and item_version_id of the subscribed algorithm. |
environments | Map<String,String> | Environment variables of a training job in the format of "key":"value". Leave this parameter blank. |

Parameter | Type | Description |
---|---|---|
name | String | Parameter name |
value | String | Parameter value |
description | String | Parameter description |
constraint | constraint object | Parameter constraint |
i18n_description | i18n_description object | Internationalization description |

Parameter | Type | Description |
---|---|---|
type | String | Parameter type |
editable | Boolean | Whether the parameter is editable |
required | Boolean | Whether the parameter is mandatory |
sensitive | Boolean | Whether the parameter is sensitive |
valid_type | String | Valid type |
valid_range | Array of strings | Valid range |

Parameter | Type | Description |
---|---|---|
language | String | Internationalization language |
description | String | Description |

Parameter | Type | Description |
---|---|---|
auto_search | auto_search object | Hyperparameter search configuration |

Parameter | Type | Description |
---|---|---|
skip_search_params | String | Hyperparameters that need to be skipped |
reward_attrs | Array of objects | Search metrics |
search_params | Array of objects | Search parameters |
algo_configs | Array of objects | Search algorithm configurations |

Parameter | Type | Description |
---|---|---|
name | String | Metric name |
mode | String | Search mode |
regex | String | Regular expression of a metric |

Parameter | Type | Description |
---|---|---|
name | String | Hyperparameter name |
param_type | String | Parameter type |
lower_bound | String | Lower bound of the hyperparameter |
upper_bound | String | Upper bound of the hyperparameter |
discrete_points_num | String | Number of discrete points of a hyperparameter with continuous values |
discrete_values | Array of strings | Discrete hyperparameter values |

Parameter | Type | Description |
---|---|---|
name | String | Name of the search algorithm |
params | Array of AutoSearchAlgoConfigParameter objects | Search algorithm parameters |

Parameter | Type | Description |
---|---|---|
key | String | Parameter key |
value | String | Parameter value |
type | String | Parameter type |

Parameter | Type | Description |
---|---|---|
name | String | Name of the data input channel |
description | String | Description of the data input channel |
local_dir | String | Local directory of the container to which the data input channel is mapped |
remote | InputDataInfo object | Information of the data input |
remote_constraint | Array of objects | Data input constraint |

Parameter | Type | Description |
---|---|---|
dataset | dataset object | Dataset as the data input |
obs | obs object | OBS in which data input and output are stored |

Parameter | Type | Description |
---|---|---|
id | String | Dataset ID of a training job |
version_id | String | Dataset version ID of a training job |
obs_url | String | OBS URL of the dataset for a training job, which is automatically parsed by ModelArts based on the dataset ID and dataset version ID, for example, /usr/data/ |

Parameter | Type | Description |
---|---|---|
obs_url | String | OBS URL of the dataset for a training job, for example, /usr/data/ |

Parameter | Type | Description |
---|---|---|
data_type | String | Data input type, including the data storage location and dataset |
attributes | String | Attributes when a dataset functions as the data input |

Parameter | Type | Description |
---|---|---|
name | String | Name of the data output channel |
description | String | Description of the data output channel |
local_dir | String | Local directory of the container to which the data output channel is mapped |
remote | remote object | Information of the data output |

Parameter | Type | Description |
---|---|---|
obs_url | String | OBS URL to which data is exported |

Parameter | Type | Description |
---|---|---|
engine_id | String | Engine ID selected for a training job. The engine can be specified by engine_id, by engine_name together with engine_version, or by image_url. |
engine_name | String | Name of the engine selected for a training job. Leave this parameter blank if engine_id is specified. |
engine_version | String | Version of the engine selected for a training job. Leave this parameter blank if engine_id is specified. |
image_url | String | Custom image URL selected for a training job |

Parameter | Type | Description |
---|---|---|
role | String | Role of a heterogeneous training job task |
algorithm | algorithm object | Algorithm configurations in algorithm management |
task_resource | FlavorResponse object | Flavors for a training job or an algorithm |

Parameter | Type | Description |
---|---|---|
code_dir | String | Absolute path of the directory where the algorithm boot file is stored |
boot_file | String | Absolute path of the algorithm boot file |
inputs | inputs object | Algorithm input channel |
outputs | outputs object | Algorithm output channel |
engine | engine object | Engine on which a heterogeneous job depends |

Parameter | Type | Description |
---|---|---|
name | String | Name of the data input channel |
local_dir | String | Local path of the container to which the data input and output channels are mapped |
remote | remote object | Actual data input, which can only be OBS for heterogeneous jobs |

Parameter | Type | Description |
---|---|---|
obs | obs object | OBS in which data input and output are stored |

Parameter | Type | Description |
---|---|---|
obs_url | String | OBS URL of the dataset for a training job, for example, /usr/data/ |

Parameter | Type | Description |
---|---|---|
name | String | Name of the data output channel |
local_dir | String | Local directory of the container to which the data output channel is mapped |
remote | remote object | Information of the data output |
mode | String | Data transmission mode, which defaults to upload_periodically |
period | String | Data transmission period, which defaults to 30s |

Parameter | Type | Description |
---|---|---|
obs | obs object | OBS to which data is exported |

Parameter | Type | Description |
---|---|---|
obs_url | String | OBS URL to which data is exported |

Parameter | Type | Description |
---|---|---|
engine_id | String | Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7 |
engine_name | String | Engine name of a heterogeneous job, for example, Caffe |
engine_version | String | Engine version of a heterogeneous job |
v1_compatible | Boolean | Whether the engine is compatible with v1 |
run_user | String | UID of the user under which the engine is started by default |

Parameter | Type | Description |
---|---|---|
flavor_id | String | ID of the resource flavor |
flavor_name | String | Name of the resource flavor |
max_num | Integer | Maximum number of nodes with the resource flavor |
flavor_type | String | Resource flavor type. |
billing | billing object | Billing information of a resource flavor |
flavor_info | flavor_info object | Resource flavor details |
attributes | Map<String,String> | Other flavor attributes |

Parameter | Type | Description |
---|---|---|
code | String | Billing code |
unit_num | Integer | Number of billing units |

Parameter | Type | Description |
---|---|---|
max_num | Integer | Maximum number of nodes that can be selected. Value 1 indicates that the distributed mode is not supported. |
cpu | cpu object | CPU specifications |
gpu | gpu object | GPU specifications |
npu | npu object | Ascend specifications |
memory | memory object | Memory information |

Parameter | Type | Description |
---|---|---|
arch | String | CPU architecture |
core_num | Integer | Number of cores |

Parameter | Type | Description |
---|---|---|
unit_num | Integer | Number of GPUs |
product_name | String | Product name |
memory | String | Memory |

Parameter | Type | Description |
---|---|---|
unit_num | String | Number of NPUs |
product_name | String | Product name |
memory | String | Memory |

Parameter | Type | Description |
---|---|---|
size | Integer | Memory size |
unit | String | Number of memory units |

Parameter | Type | Description |
---|---|---|
resource | Resource object | Resource flavor of a training job, specified either by flavor_id alone or by pool_id together with flavor_id |
volumes | Array of objects | Volumes attached for a training job |
log_export_path | log_export_path object | Export path of training job logs |

Parameter | Type | Description |
---|---|---|
policy | String | Resource flavor mode of a training job. Options: regular, economic, and turbo |
flavor_id | String | Resource flavor ID of a training job |
flavor_name | String | Read-only flavor name returned by ModelArts when flavor_id is specified |
node_count | Integer | Number of resource replicas selected for a training job. Minimum value: 1 |
pool_id | String | Resource pool ID selected for a training job |
flavor_detail | flavor_detail object | Flavors for a training job or an algorithm |

Parameter | Type | Description |
---|---|---|
flavor_type | String | Resource flavor type. |
billing | billing object | Billing information of a resource flavor |
flavor_info | flavor_info object | Resource flavor details |

Parameter | Type | Description |
---|---|---|
code | String | Billing code |
unit_num | Integer | Number of billing units |

Parameter | Type | Description |
---|---|---|
max_num | Integer | Maximum number of nodes that can be selected. Value 1 indicates that the distributed mode is not supported. |
cpu | cpu object | CPU specifications |
gpu | gpu object | GPU specifications |
npu | npu object | Ascend specifications |
memory | memory object | Memory information |
disk | disk object | Disk information |

Parameter | Type | Description |
---|---|---|
arch | String | CPU architecture |
core_num | Integer | Number of cores |

Parameter | Type | Description |
---|---|---|
unit_num | Integer | Number of GPUs |
product_name | String | Product name |
memory | String | Memory |

Parameter | Type | Description |
---|---|---|
unit_num | String | Number of NPUs |
product_name | String | Product name |
memory | String | Memory |

Parameter | Type | Description |
---|---|---|
size | Integer | Memory size |
unit | String | Number of memory units |

Parameter | Type | Description |
---|---|---|
size | String | Disk size |
unit | String | Unit of the disk size, which is GB generally |

Parameter | Type | Description |
---|---|---|
nfs | nfs object | Disks attached in NFS mode |

Parameter | Type | Description |
---|---|---|
nfs_server_path | String | NFS server path |
local_path | String | Path for attaching disks to the training container |
read_only | Boolean | Whether the disks attached to the container in NFS mode are read-only |

Parameter | Type | Description |
---|---|---|
obs_url | String | OBS URL for storing training job logs |
host_path | String | Path of the host where training job logs are stored |

Parameter | Type | Description |
---|---|---|
error_msg | String | Error message returned when an API call fails. This parameter is not returned if the call succeeds. |
error_code | String | Error code returned when an API call fails. For details, see "Error Codes" in ModelArts API Reference. This parameter is not returned if the call succeeds. |
error_solution | String | Suggested solution when an API call fails. This parameter is not returned if the call succeeds. |

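Because the error fields are present only when a call fails, a caller can test for error_code before reading the other response fields. The following is a minimal sketch, assuming the response is available as a Python dict. The helper name raise_for_error and the placeholder error values are hypothetical and do not come from the ModelArts SDK.

```python
def raise_for_error(response: dict) -> dict:
    """Raise RuntimeError if the response carries the error fields.

    Assumption: the response is a dict and error_code is present
    only when the API call failed.
    """
    if "error_code" in response:
        message = response.get("error_msg", "")
        solution = response.get("error_solution", "")
        raise RuntimeError(f"{response['error_code']}: {message} {solution}".strip())
    return response


# Hypothetical placeholder values, for illustration only.
failed_response = {
    "error_code": "EXAMPLE.0001",
    "error_msg": "The job cannot be terminated.",
    "error_solution": "Check the job status and try again.",
}

try:
    raise_for_error(failed_response)
except RuntimeError as exc:
    print(f"Termination failed: {exc}")
```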