Updated on 2025-11-19 GMT+08:00

Migrating Nodes in Batches

Function

This API is used to migrate nodes in batches between clusters in a resource pool. If a resource pool has only one node, the migration is not supported.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/pools/{pool_name}/nodes/batch-migrate
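The two path parameters fill in the URI template above. The following is a minimal Python sketch, not an official SDK call: the function name batch_migrate_url is hypothetical, and the caller must supply the regional endpoint host (the host in the test is a placeholder). Path values are percent-encoded in case a name contains reserved characters.

```python
from urllib.parse import quote


def batch_migrate_url(endpoint: str, project_id: str, pool_name: str) -> str:
    """Build the POST URL for the batch node migration API.

    `endpoint` is the regional ModelArts endpoint, supplied by the caller
    (assumption); each path segment is percent-encoded.
    """
    path = "/v2/{}/pools/{}/nodes/batch-migrate".format(
        quote(project_id, safe=""), quote(pool_name, safe="")
    )
    return endpoint.rstrip("/") + path
```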

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition: User project ID. For details, see Obtaining a Project ID and Name.

Constraints: N/A

Range: N/A

Default Value: N/A

pool_name

Yes

String

Resource pool name. The value is the same as the metadata.name value of the resource pool.

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

migratenodenames

No

Array of strings

Names of the nodes to be migrated.

fromclustername

No

String

Name of the source cluster.

toclustername

No

String

Name of the destination cluster.

topoolname

No

String

Name of the target resource pool.

resourcespec

No

MigrateResourceSpec object

Configurations of the node to be migrated in the target resource pool. This parameter is mandatory when a node is migrated across resource pools.
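A minimal sketch of assembling the request body from Table 2. It assumes the field names are sent exactly as printed in the table; their casing has not been verified here, so check it in API Explorer before use. The helper name build_migrate_body is hypothetical. It omits unset optional fields and enforces the one rule stated above: resourcespec must be present when nodes move to a different resource pool.

```python
from typing import Any, Dict, List, Optional


def build_migrate_body(
    source_pool: str,
    node_names: List[str],
    from_cluster: Optional[str] = None,
    to_cluster: Optional[str] = None,
    to_pool: Optional[str] = None,
    resource_spec: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a batch-migrate request body (field names as printed in Table 2).

    source_pool is the pool_name path parameter; it is used only to decide
    whether the migration crosses resource pools.
    """
    cross_pool = to_pool is not None and to_pool != source_pool
    if cross_pool and resource_spec is None:
        # Table 2: resourcespec is mandatory for cross-resource-pool migration.
        raise ValueError("resourcespec is required for cross-pool migration")
    optional = {
        "fromclustername": from_cluster,
        "toclustername": to_cluster,
        "topoolname": to_pool,
        "resourcespec": resource_spec,
    }
    body: Dict[str, Any] = {"migratenodenames": list(node_names)}
    # Skip fields the caller left unset instead of sending null values.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For example, build_migrate_body("pool-a", ["node-1", "node-2"], from_cluster="cluster-1", to_cluster="cluster-2") produces a body that moves two nodes between clusters of the same pool, so no resourcespec is needed.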

Table 3 MigrateResourceSpec

Parameter

Mandatory

Type

Description

flavor

Yes

String

Resource flavor name. This parameter is mandatory for cross-resource-pool migration.

creatingstep

No

CreatingStep object

Resource step information. Specify this parameter when an entire cabinet or supernode is migrated.

nodepool

No

String

Name of the target node pool to which resources are migrated.

rootvolume

No

RootVolume object

System disk information of the target node pool. This parameter is valid only when a node pool is created.

datavolumes

No

Array of DataVolumeItem objects

Data disk information of the target node pool. This parameter is valid only when a node pool is created.

volumegroupconfigs

No

Array of VolumeGroupConfig objects

Advanced disk configurations. This parameter is mandatory when a custom data disk exists. This parameter is valid when a node pool is created.

labels

No

Map<String,String>

Kubernetes label, in the format of a key-value pair. This parameter cannot be specified for a non-privileged pool. This parameter is valid when a node pool is created.

taints

No

Array of Taints objects

Taints to be added to nodes to set anti-affinity. This parameter cannot be specified for a non-privileged pool. This parameter is valid when a node pool is created.

tags

No

Array of UserTags objects

Resource tag. This parameter is valid when a node pool is created.

network

No

NodeNetwork object

Network configuration. This parameter cannot be specified for a non-privileged pool. This parameter is valid when a node pool is created.

extendparams

No

ResourceExtendParams object

Custom configuration, for example, setting dockerSize for the node. This parameter is valid when a node pool is created.

Table 4 CreatingStep

Parameter

Mandatory

Type

Description

step

No

Integer

Definition: Step of a supernode.

Constraints: N/A

Range: Only the step contained in the resource specification details is supported.

Default Value: N/A

type

No

String

Definition: Batch creation type.

Constraints: N/A

Range:

  • hyperinstance: supernode

Default Value: N/A

Table 5 RootVolume

Parameter

Mandatory

Type

Description

volumetype

No

String

Disk type. For details, see "Disk Types and Performance". Options:

  • SSD: ultra-high I/O disk

  • GPSSD: general-purpose SSD

size

Yes

String


Disk size, in GiB.

Table 6 DataVolumeItem

Parameter

Mandatory

Type

Description

volumetype

No

String

Disk type. Options:

  • SSD: ultra-high I/O disk

  • GPSSD: general-purpose SSD

  • SAS: high I/O disk

size

Yes

String

Disk size, in GiB.

count

No

Integer

Number of disks.

extendparams

No

VolumeExtendParams object

Custom disk configuration.
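In Tables 5 and 6, size is typed as a String (in GiB) while count is an Integer, which is easy to get wrong when the payload is built programmatically. The hypothetical helper disk_spec below is a sketch that serializes sizes as strings and checks disk types against the options listed above; its input format is an assumption made for illustration.

```python
from typing import Any, Dict, List

ROOT_VOLUME_TYPES = {"SSD", "GPSSD"}         # options listed in Table 5
DATA_VOLUME_TYPES = {"SSD", "GPSSD", "SAS"}  # options listed in Table 6


def disk_spec(root_gib: int, root_type: str, data: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the rootvolume and datavolumes parts of a resourcespec.

    Each item in `data` uses a hypothetical input shape:
    {"type": str, "gib": int, "count": int (optional), "volumegroup": str (optional)}.
    """
    if root_type not in ROOT_VOLUME_TYPES:
        raise ValueError(f"unsupported system disk type: {root_type}")
    data_volumes = []
    for item in data:
        if item["type"] not in DATA_VOLUME_TYPES:
            raise ValueError(f"unsupported data disk type: {item['type']}")
        # size is documented as a String in GiB, so it is serialized as text.
        volume: Dict[str, Any] = {"volumetype": item["type"], "size": str(item["gib"])}
        if "count" in item:
            volume["count"] = int(item["count"])  # count is an Integer
        if "volumegroup" in item:
            volume["extendparams"] = {"volumegroup": item["volumegroup"]}
        data_volumes.append(volume)
    return {
        "rootvolume": {"volumetype": root_type, "size": str(root_gib)},
        "datavolumes": data_volumes,
    }
```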

Table 7 VolumeExtendParams

Parameter

Mandatory

Type

Description

volumegroup

No

String

Name of a disk group, which is used to divide storage space. Options:

  • vgpaas: container disk.

  • default: common data disk, which is mounted in default mode.

  • vguser{num}: common data disk, which is mounted to a specified path. The group name varies depending on the path, for example, vguser1 and vguser2.

  • vg-everest-localvolume-persistent: common data disk, which is used as the persistent storage volume.

  • vg-everest-localvolume-ephemeral: common data disk, which is used as a temporary storage volume.
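A small validator for the disk group names in Table 7. The exact format of {num} in vguser{num} is not specified here, so the sketch assumes one or more decimal digits; is_valid_volumegroup is a hypothetical helper name.

```python
import re

# Fixed disk group names listed in Table 7.
_FIXED_GROUPS = {
    "vgpaas",
    "default",
    "vg-everest-localvolume-persistent",
    "vg-everest-localvolume-ephemeral",
}
# vguser{num}: "one or more digits" is an assumption about {num}.
_VGUSER = re.compile(r"vguser[0-9]+")


def is_valid_volumegroup(name: str) -> bool:
    """Check a disk group name against the options listed in Table 7."""
    return name in _FIXED_GROUPS or _VGUSER.fullmatch(name) is not None
```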

Table 8 VolumeGroupConfig

Parameter

Mandatory

Type

Description

volumegroup

No

String

Disk group name, which refers to a volume group defined in dataVolumes.

dockerthinpool

No

Integer

Percentage of container disks to data disks on nodes in a resource pool. This parameter can be specified only when volumeGroup is vgpaas (container disk).

lvmconfig

No

LvmConfig object

LVM configuration management.

types

No

Array of strings

Storage type. Options:

  • volume: cloud disk. This is the default value when dataVolumes is specified.

  • local: local disk. This value must be specified when local disks are used.

Table 9 LvmConfig

Parameter

Mandatory

Type

Description

lvtype

No

String

LVM write mode. Options:

  • linear: linear mode.

  • striped: striped mode, in which multiple disks form a stripe to improve disk performance.

path

No

String

Disk mount path. This parameter takes effect only in user configuration. The value is an absolute path. Digits, letters, periods (.), hyphens (-), and underscores (_) are allowed.
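The rules in Tables 8 and 9 can be checked on the client before a request is sent. This sketch assumes the field names are used as printed and that the absolute mount path may use "/" to separate segments built from the listed characters. The function name check_volume_group_config is hypothetical, and the server remains the authority on what it accepts.

```python
import re
from typing import Any, Dict, List

# Absolute path whose segments use only the characters listed in Table 9;
# treating "/" as the segment separator is an assumption.
_MOUNT_PATH = re.compile(r"(/[0-9A-Za-z._-]+)+")
_LV_TYPES = {"linear", "striped"}


def check_volume_group_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one VolumeGroupConfig dict."""
    problems: List[str] = []
    # dockerthinpool may be set only for the container disk group.
    if "dockerthinpool" in cfg and cfg.get("volumegroup") != "vgpaas":
        problems.append("dockerthinpool is only allowed for volumegroup vgpaas")
    lvm = cfg.get("lvmconfig") or {}
    if "lvtype" in lvm and lvm["lvtype"] not in _LV_TYPES:
        problems.append(f"unknown lvtype: {lvm['lvtype']}")
    if "path" in lvm and not _MOUNT_PATH.fullmatch(lvm["path"]):
        problems.append(f"invalid mount path: {lvm['path']}")
    for storage_type in cfg.get("types", []):
        if storage_type not in ("volume", "local"):
            problems.append(f"unknown storage type: {storage_type}")
    return problems
```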

Table 10 Taints

Parameter

Mandatory

Type

Description

key

Yes

String

Definition: Key.

Range: N/A

value

No

String

Taint value.

effect

Yes

String

Taint effect.

Table 11 UserTags

Parameter

Mandatory

Type

Description

key

Yes

String

Definition: Key. The value cannot start with CCE- or __type_baremetal.

Range: N/A

value

Yes

String

Value.
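Table 11 reserves the key prefixes CCE- and __type_baremetal. The sketch below converts a plain dictionary into the UserTags array and rejects reserved prefixes; make_user_tags is a hypothetical helper.

```python
from typing import Dict, List

# Key prefixes that Table 11 does not allow.
RESERVED_TAG_PREFIXES = ("CCE-", "__type_baremetal")


def make_user_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a dict into the UserTags array format of Table 11."""
    result = []
    for key, value in tags.items():
        if key.startswith(RESERVED_TAG_PREFIXES):
            raise ValueError(f"tag key {key!r} uses a reserved prefix")
        result.append({"key": key, "value": value})
    return result
```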

Table 12 NodeNetwork

Parameter

Mandatory

Type

Description

vpc

No

String

Definition: VPC ID.

Constraints: N/A

Range: N/A

Default Value: N/A

subnet

No

String

Definition: Subnet ID.

Constraints: N/A

Range: N/A

Default Value: N/A

securityGroups

No

Array of strings

Definition: Security group ID set.

Constraints: N/A

Table 13 ResourceExtendParams

Parameter

Mandatory

Type

Description

dockerbasesize

No

String

Size of the container image space on a node.

postinstall

No

String

Post-installation script. The entered value must be encoded using Base64.
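postinstall expects the script body encoded with Base64. A minimal sketch using Python's standard base64 module, assuming the script is UTF-8 text (the text encoding is not stated in this table); encode_post_install is a hypothetical helper name.

```python
import base64


def encode_post_install(script: str) -> str:
    """Base64-encode a post-installation script for extendparams.postinstall."""
    # Base64 operates on bytes, so the script text is encoded as UTF-8 first.
    return base64.b64encode(script.encode("utf-8")).decode("ascii")
```

The result goes into the extendparams object of resourcespec, for example {"postinstall": encode_post_install("echo hi")}.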

Response Parameters

Status code: 200

Table 14 Response body parameters

Parameter

Type

Description

kind

String

Definition: Type of a training job.

Range:

  • job: common job

  • federated_pool_job: resource pool federated job

  • edge_job: edge job

  • hetero_job: heterogeneous job

  • mrs_job: MRS job

  • autosearch_job: auto search job

  • diag_job: diagnosis job

  • visualization_job: visualization job

metadata

JobMetadataResponse object

Definition: Training job metadata.

status

Status object

Definition: Training job status information.

algorithm

JobAlgorithmResponse object

Definition: Training job algorithm.

tasks

Array of TaskResponse objects

Definition: Heterogeneous training tasks.

spec

SpecResponse object

Definition: Training job specifications.

endpoints

JobEndpointsResp object

Definition: Configurations required for remotely accessing a training job.

Table 15 JobMetadataResponse

Parameter

Type

Description

id

String

Definition: Training job ID, which is generated and returned by ModelArts after a training job is created.

Range: N/A

name

String

Definition: Name of a training job.

Range: The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-).

workspace_id

String

Definition: Workspace where a specified job is located.

Range: N/A

description

String

Definition: Description of a training job.

Range: N/A

create_time

Long

Definition: Time when a training job was created, in milliseconds. The value is generated and returned by ModelArts after a training job is created.

Range: N/A

user_name

String

Definition: Username for creating a training job. The username is generated and returned by ModelArts after a training job is created.

Range: N/A

annotations

Map<String,String>

Definition: Advanced functions of a training job.

Table 16 Status

Parameter

Type

Description

phase

String

Definition: Level-1 status of a training job.

Range:

  • Creating: The job is being created.

  • Pending: The job is pending.

  • Running: The job is running.

  • Failed: The job failed to run.

  • Completed: The job is complete.

  • Terminating: The job is being stopped.

  • Terminated: The job has been stopped.

  • Abnormal: The job is abnormal.

secondary_phase

String

Definition: Level-2 status of a training job. The values are internal detailed statuses and may be added, changed, or deleted. Dependency on the status is not recommended.

Range:

  • Creating: The job is being created.

  • Queuing: The job is queuing.

  • Running: The job is running.

  • Failed: The job failed to run.

  • Completed: The job is complete.

  • Terminating: The job is being stopped.

  • Terminated: The job has been stopped.

  • CreateFailed: The job fails to be created.

  • TerminatedFailed: The job fails to be stopped.

  • Unknown: The job is in an unknown state.

  • Lost: The job is abnormal.

duration

Long

Definition: Running duration of a training job, in ms.

Range: N/A

node_count_metrics

Array<Array<Integer>>

Definition: Node quantity change metric during a training job runtime.

tasks

Array of strings

Definition: Training job subtask name.

start_time

Long

Definition: Timestamp when a training job is started.

Range: N/A

task_statuses

Array of TaskStatuses objects

Definition: Status of the first failed subtask of a training job.

running_records

Array of RunningRecord objects

Definition: Running and fault recovery records of a training job.

Table 17 TaskStatuses

Parameter

Type

Description

task

String

Definition: Training job subtask name.

Range: N/A

exit_code

Integer

Definition: Exit code of a training job subtask.

Range: N/A

message

String

Definition: Error message of a training job subtask.

Range: N/A

Table 18 RunningRecord

Parameter

Type

Description

start_at

Integer

Definition: Unix timestamp of the start time in the current running record, in seconds.

Range: N/A

end_at

Integer

Definition: Unix timestamp of the end time in the current running record, in seconds.

Range: N/A

xpu_start_at

Integer

Definition: Unix timestamp of the accelerator card startup time in the current running record, in seconds.

Range: N/A

start_type

String

Definition: Startup mode of the current execution.

Range:

  • init_or_rescheduled: The first run after scheduling, including the initial startup and the first run after recovery by rescheduling.

  • restarted: A run after a process restart, rather than the first run after scheduling.

end_reason

String

Definition: Reason why the run ended.

Range: N/A

end_related_task

String

Definition: ID of the task worker (for example, worker-0) that caused the run to end.

Range: N/A

end_recover

String

Definition: Fault tolerance policy adopted when the execution ends abnormally.

Range:

  • npu_proc_restart: NPU in-place hot recovery

  • proc_restart: in-place process recovery

  • npu_step_retry: step recomputation

  • pod_reschedule: pod-level rescheduling

  • job_reschedule: job-level rescheduling

  • job_reschedule_with_taint: isolated job-level rescheduling

end_recover_before_downgrade

String

Definition: Fault tolerance policy that was used before being downgraded to the policy in end_recover. Policies have a downgrade relationship: if a policy fails to execute, it is downgraded to another specified policy.

Range: same as that of end_recover.

recover_records

Array of RecoverRecord objects

Definition: Details about all fault tolerance policies adopted when the execution ends abnormally.
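Timestamps in this response use different units: create_time and duration are in milliseconds, while the RunningRecord fields are Unix timestamps in seconds. The sketch below works in seconds and derives per-run durations and a restart count from running_records. Treating a record without end_at as still running is an assumption, and both helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def run_durations(records: List[Dict[str, Any]]) -> List[Optional[int]]:
    """Length of each running record in seconds (start_at/end_at are Unix seconds).

    A record without end_at is assumed to still be running and yields None.
    """
    durations: List[Optional[int]] = []
    for record in records:
        end = record.get("end_at")
        durations.append(None if end is None else end - record["start_at"])
    return durations


def count_restarts(records: List[Dict[str, Any]]) -> int:
    """Count runs whose start_type is restarted (in-place process restart)."""
    return sum(1 for record in records if record.get("start_type") == "restarted")
```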

Table 19 RecoverRecord

Parameter

Type

Description

recover_start_at

Integer

Unix timestamp of the start time of the fault tolerance policy, in seconds. The timestamp is also the fault occurrence time.

recover_end_at

Integer

Unix timestamp of the end time of the fault tolerance policy, in seconds.

recover

String

Fault tolerance policy. Options:

  • npu_step_retry: step recomputation

  • npu_proc_restart: NPU in-place hot recovery

  • proc_restart: in-place process recovery

  • pod_reschedule: pod-level rescheduling

  • job_reschedule: job-level rescheduling

  • job_reschedule_with_taint: isolated job-level rescheduling

fault_scenario

String

Fault scenario. Options:

  • chip_fault: chip fault

  • node_fault: node fault

  • job_failed: job exit upon a failure

  • job_hanged: job hang

  • job_subhealth: job subhealth

  • error_in_log: log exception

reason

String

Cause of the fault.

related_task

String

ID of the task worker that causes the end of the current running record, for example, worker-0.

recover_result

String

Execution result of the fault tolerance policy. Options:

  • recovering: in progress

  • success: successful

  • failed: failed

  • downgrade: policy downgrade
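Because a policy can be downgraded (see end_recover_before_downgrade), one running record may contain several recover_records entries, with recover_result set to downgrade for each policy that was abandoned. A sketch that summarizes such a list; summarize_recoveries is a hypothetical name, and entries without recover_end_at (for example, still recovering) are left out of the time total.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_recoveries(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize the recover_records of one running record (Table 19)."""
    results = Counter(record.get("recover_result") for record in records)
    # Time spent in each finished policy; both fields are Unix seconds.
    spent = [
        record["recover_end_at"] - record["recover_start_at"]
        for record in records
        if record.get("recover_end_at") is not None
    ]
    return {
        "by_result": dict(results),
        "policies": [record.get("recover") for record in records],
        "total_recover_seconds": sum(spent),
    }
```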

Table 20 JobAlgorithmResponse

Parameter

Type

Description

id

String

Definition: ID of the training job algorithm. The algorithm is specified in one of the following ways:

  • id: Only the algorithm ID is used.

  • subscription_id+item_version_id: The subscription ID and version ID of the algorithm are used.

  • code_dir+boot_file: The code directory and boot file of the training job are used.

name

String

Definition: Algorithm name.

Range: N/A

subscription_id

String

Definition: Subscription ID of a subscription algorithm, which must be used with item_version_id.

Range: N/A

item_version_id

String

Definition: Version of a subscription algorithm, which must be used with subscription_id.

Range: N/A

code_dir

String

Definition: Code directory of a training job, for example, /usr/app/. This parameter must be used with boot_file. Leave this parameter blank if id or subscription_id+item_version_id is specified.

Range: N/A

boot_file

String

Definition: Boot file of a training job, which must be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used with code_dir. Leave this parameter blank if id or subscription_id+item_version_id is specified.

Range: N/A

autosearch_config_path

String

Definition: YAML configuration path of an auto search job. The value must be an OBS URL, for example, obs://bucket/file.yaml.

Range: N/A

autosearch_framework_path

String

Definition: Framework code directory of an auto search job. The value must be an OBS URL, for example, obs://bucket/files/.

Range: N/A

command

String

Definition: Boot command for starting the container of a custom image for a training job, for example, python train.py.

Range: N/A

parameters

Array of ParameterResp objects

Definition: Running parameters of the training job.

policies

policies object

Definition: Policy supported by a job.

inputs

Array of InputResp objects

Definition: Data input of a training job.

outputs

Array of OutputResp objects

Definition: Output of the training job.

engine

JobEngineResp object

Definition: Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm in algorithm management, or subscription_id+item_version_id of the subscribed algorithm.

local_code_dir

String

Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

Range: N/A

working_dir

String

Definition: Working directory where an algorithm is executed. This parameter does not take effect in v1 compatibility mode.

Range: N/A

environments

Array of Map<String,String> objects

Definition: Environment variables of a training job. The format is key:value. Leave this parameter blank.

summary

SummaryResp object

Definition: Visualization log summary.
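Table 20 lists three ways an algorithm can be identified (see the id row), and local_code_dir has three rules of its own. The following sketch encodes both. The precedence order in algorithm_source and the reading of "under /home" as a subdirectory of /home are assumptions, and both function names are hypothetical.

```python
import posixpath
from typing import Any, Dict


def algorithm_source(algorithm: Dict[str, Any]) -> str:
    """Name the documented method that identifies the algorithm.

    Checking id first, then the subscription pair, then code_dir+boot_file
    is an assumed precedence, not one stated by the API.
    """
    if algorithm.get("id"):
        return "id"
    if algorithm.get("subscription_id") and algorithm.get("item_version_id"):
        return "subscription_id+item_version_id"
    if algorithm.get("code_dir") and algorithm.get("boot_file"):
        return "code_dir+boot_file"
    return "unknown"


def local_code_dir_applies(local_code_dir: str, code_dir: str, v1_compatible: bool) -> bool:
    """Return True only if local_code_dir is under /home and is not ignored."""
    if v1_compatible or code_dir.startswith("file://"):
        return False  # the field does not take effect in either case
    # normpath collapses segments such as "/home/../etc" before the prefix check.
    return posixpath.normpath(local_code_dir).startswith("/home/")
```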

Table 21 ParameterResp

Parameter

Type

Description

name

String

Definition: Parameter name.

Range: N/A

value

String

Definition: Parameter value.

Range: N/A

description

String

Definition: Parameter description.

Range: N/A

constraint

constraint object

Definition: Parameter attribute.

i18n_description

i18n_description object

Definition: Internationalization description.

Table 22 constraint

Parameter

Type

Description

type

String

Definition: Parameter type.

Range: N/A

editable

Boolean

Definition: Whether the parameter can be edited.

Range:

  • true: editable

  • false: uneditable

required

Boolean

Definition: Whether the parameter is mandatory.

Range:

  • true: mandatory

  • false: optional

sensitive

Boolean

Definition: Whether the parameter is sensitive. This function is currently unavailable.

Range:

  • true: sensitive

  • false: insensitive

valid_type

String

Definition: Valid type.

Range: N/A

valid_range

Array of strings

Definition: Valid range.

Table 23 i18n_description

Parameter

Type

Description

language

String

Definition: Internationalization language. The options are as follows:

  • zh-cn: Chinese

  • en-us: English

Range: N/A

description

String

Definition: Internationalization language description.

Range: N/A

Table 24 policies

Parameter

Type

Description

auto_search

auto_search object

Definition: Hyperparameter search configuration.

Table 26 reward_attrs

Parameter

Type

Description

name

String

Definition: Metric name.

Range: N/A

mode

String

Definition: Search mode.

Range:

  • max: A larger metric value is preferred.

  • min: A smaller metric value is preferred.

regex

String

Definition: Regular expression of a metric.

Range: N/A

Table 27 search_params

Parameter

Type

Description

name

String

Definition: Hyperparameter name.

Range: N/A

param_type

String

Definition: Parameter type.

Range:

  • continuous: The hyperparameter is of the continuous type. When an algorithm is used in a training job, continuous hyperparameters are displayed as text boxes on the console.

  • discrete: The hyperparameter is of the discrete type. When an algorithm is used in a training job, discrete hyperparameters are displayed as drop-down lists on the console.

lower_bound

String

Definition: Lower bound of the hyperparameter.

Range: N/A

upper_bound

String

Definition: Upper bound of the hyperparameter.

Range: N/A

discrete_points_num

String

Definition: Number of discrete points of a hyperparameter with continuous values.

Range: N/A

discrete_values

Array of strings

Definition: Discrete hyperparameter values.

Table 28 algo_configs

Parameter

Type

Description

name

String

Definition: Search algorithm name.

Range: N/A

params

Array of AutoSearchAlgoConfigParameterResp objects

Definition: Search algorithm parameters.

Table 29 AutoSearchAlgoConfigParameterResp

Parameter

Type

Description

key

String

Definition: Parameter key.

Range: N/A

value

String

Definition: Parameter value.

Range: N/A

type

String

Definition: Parameter type.

Range: N/A

Table 30 InputResp

Parameter

Type

Description

name

String

Definition: Name of the data input channel.

Range: N/A

description

String

Definition: Description of the data input channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data input channels are mapped. Example: /home/ma-user/modelarts/inputs/data_url_0

Range: N/A

access_method

String

Definition: Access method of the input data channel path (local_dir).

Range:

  • parameter: hyperparameters

  • env: environment variables

remote

InputDataInfoResp object

Definition: Description of the actual data input.

remote_constraint

Array of remote_constraint objects

Definition: Data input constraint.

Table 31 InputDataInfoResp

Parameter

Type

Description

dataset

dataset object

Definition: Dataset used as the input.

obs

obs object

Definition: OBS in which data input and output are stored.

Table 32 dataset

Parameter

Type

Description

id

String

Definition: Dataset ID of a training job.

Range: N/A

version_id

String

Definition: Dataset version ID of a training job.

Range: N/A

obs_url

String

Definition: OBS URL of the dataset for a training job. It is automatically parsed by ModelArts based on the dataset ID and dataset version ID. For example, /usr/data/.

Range: N/A

Table 33 obs

Parameter

Type

Description

obs_url

String

Definition: OBS URL of the dataset for a training job, for example, /usr/data/.

Range: N/A

Table 34 remote_constraint

Parameter

Type

Description

data_type

String

Definition: Data input type, including the data storage location and dataset.

Constraints: N/A

Range: N/A

Default Value: N/A

attributes

String

Definition: Related attributes.

Constraints: N/A

Range:

If the input is a dataset:

  • data_format: data format

  • data_segmentation: data segmentation method

  • dataset_type: data labeling type

Default Value: N/A

Table 35 OutputResp

Parameter

Type

Description

name

String

Definition: Name of the data output channel.

Range: N/A

description

String

Definition: Description of the data output channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data output channels are mapped.

Range: N/A

access_method

String

Definition: Access method of the output data channel path (local_dir).

Range:

  • parameter: hyperparameters

  • env: environment variables

remote

RemoteResp object

Definition: Description of the actual data output.

Table 36 JobEngineResp

Parameter

Type

Description

engine_id

String

Definition: Engine ID selected for a training job.

Range: N/A

engine_name

String

Definition: Engine name selected for a training job.

Range: N/A

engine_version

String

Definition: Engine version selected for a training job.

Range: N/A

image_url

String

Definition: Custom image URL selected for a training job. The URL is obtained from SWR.

Range: N/A

install_sys_packages

Boolean

Definition: Specifies whether to install the MoXing version specified by the training platform.

Range:

  • true: yes

  • false: no

Table 37 SummaryResp

Parameter

Type

Description

log_type

String

Definition: Visualization log type of a training job. After this parameter is configured, the training job can be used as the data source of a visualization job.

Range:

  • tensorboard: TensorBoard

  • mindstudio-insight: MindStudio Insight

log_dir

LogDirResp object

Definition: Visualization log output of a training job.

data_sources

Array of DataSourceResp objects

Definition: Visualization log input of the visualization job or training job debugging mode.

Table 38 LogDirResp

Parameter

Type

Description

pfs

PFSSummaryResp object

Definition: Output of an OBS parallel file system.

Table 39 PFSSummaryResp

Parameter

Type

Description

pfs_path

String

Definition: URL of the OBS parallel file system.

Range: N/A

Table 40 DataSourceResp

Parameter

Type

Description

job

JobSummaryResp object

Definition: Job data source.

Table 41 JobSummaryResp

Parameter

Type

Description

job_id

String

Definition: ID of a training job.

Range: N/A

Table 42 TaskResponse

Parameter

Type

Description

role

String

Definition: Task role. This function is not supported currently.

Range: N/A

algorithm

TaskResponseAlgorithm object

Definition: Algorithm configurations for algorithm management.

task_resource

FlavorResponse object

Definition: Specifications of a training job or algorithm.

log_export_path

log_export_path object

Definition: Saved information about training job logs.

Table 43 TaskResponseAlgorithm

Parameter

Type

Description

code_dir

String

Definition: Absolute path of the directory where the algorithm boot file is stored.

Range: N/A

boot_file

String

Definition: Absolute path of an algorithm boot file.

Range: N/A

inputs

AlgorithmInput object

Definition: Information about the algorithm input channel.

outputs

AlgorithmOutput object

Definition: Information about the algorithm output channel.

engine

AlgorithmEngine object

Definition: Engine that a heterogeneous job depends on.

local_code_dir

String

Definition: Local directory of the training container to which the algorithm code directory is downloaded. The rules are as follows:

  • The directory must be under /home.

  • In v1 compatibility mode, the current field does not take effect.

  • When code_dir is prefixed with file://, the current field does not take effect.

Range: N/A

working_dir

String

Definition: Work directory where an algorithm is executed. Note that this parameter does not take effect in v1 compatibility mode.

Range: N/A

environments

Map<String,String>

Definition: Environment variables related to a training job.

Range: N/A

Table 44 AlgorithmInput

Parameter

Type

Description

name

String

Definition: Name of the data input channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data input and output channels are mapped.

Range: N/A

remote

AlgorithmRemote object

Definition: Actual data input, which can only be OBS for heterogeneous jobs.

Table 45 AlgorithmRemote

Parameter

Type

Description

obs

RemoteObsResp object

Definition: OBS in which data input and output are stored.

Table 46 AlgorithmOutput

Parameter

Type

Description

name

String

Definition: Name of the data output channel.

Range: N/A

local_dir

String

Definition: Local path of the container to which the data output channels are mapped.

Range: N/A

remote

RemoteResp object

Definition: Description of the actual data output.

mode

String

Definition: Data transmission mode. The default value is upload_periodically.

Range: N/A

period

String

Definition: Data transmission period. The default value is 30s.

Range: N/A

Table 47 RemoteResp

Parameter

Type

Description

obs

RemoteObsResp object

Definition: Data actually output to OBS.

Table 48 RemoteObsResp

Parameter

Type

Description

obs_url

String

Definition: Path of the data output to OBS.

Range: N/A

Table 49 AlgorithmEngine

Parameter

Type

Description

engine_id

String

Definition: Engine flavor ID, for example, caffe-1.0.0-python2.7.

Range: N/A

engine_name

String

Definition: Engine flavor name, for example, Caffe.

Range: N/A

engine_version

String

Definition: Engine flavor version. An engine name can have multiple versions, for example, Caffe-1.0.0-python2.7 for Python 2.7.

Range: N/A

v1_compatible

Boolean

Definition: Specifies whether the v1 compatibility mode is used.

Range:

  • true: The v1 compatibility mode is used.

  • false: The v1 compatibility mode is not used.

run_user

String

Definition: Default UID used to start the engine.

Range: N/A

image_url

String

Definition: Custom image URL selected for an algorithm.

Range: N/A

Table 50 FlavorResponse

Parameter

Type

Description

pool_id

String

Definition: ID of the resource pool selected for a training job.

Range: N/A

flavor_id

String

Definition: Resource flavor ID.

Range: N/A

flavor_name

String

Definition: Resource flavor name.

Range: N/A

max_num

Integer

Definition: Maximum number of nodes supported by a flavor.

Range: N/A

flavor_type

String

Definition: Resource flavor type.

Range:

  • CPU

  • GPU

  • Ascend

billing

BillingInfo object

Definition: Billing information of a resource flavor.

flavor_info

FlavorInfoResponse object

Definition: Resource flavor details.

attributes

Map<String,String>

Definition: Other flavor attributes.

Range: N/A

Table 51 FlavorInfoResponse

Parameter

Type

Description

max_num

Integer

Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported.

Range: N/A

cpu

Cpu object

Definition: CPU specifications.

gpu

Gpu object

Definition: GPU specifications.

npu

Npu object

Definition: Ascend specifications.

memory

Memory object

Definition: Memory information.

disk

DiskResponse object

Definition: Disk information.

Table 52 DiskResponse

Parameter

Type

Description

size

Integer

Definition: Disk size.

Range: N/A

unit

String

Definition: Unit of the disk size.

Range: N/A

Table 53 log_export_path

Parameter

Type

Description

obs_url

String

Definition: OBS path for storing training job logs.

Table 54 SpecResponse

Parameter

Type

Description

resource

Resource object

Definition: Resource flavor of a training job. Specify either flavor_id alone, or both pool_id and flavor_id.

volumes

Array of JobVolumeResp objects

Definition: Mounting volume information of a training job.

log_export_path

LogExportPathResp object

Definition: Log output of a training job.

schedule_policy

SchedulePolicyResp object

Definition: Scheduling policy of a training job.

custom_metrics

Array of CustomMetrics objects

Definition: Metric collection configuration.

Table 55 Resource

Parameter

Type

Description

policy

String

Definition: Resource flavor mode of a training job.

Range:

  • regular: standard mode

flavor_id

String

Definition: ID of the resource flavor of a training job.

Range: The flavor_id parameter cannot be specified for a dedicated resource pool of CPU specifications. The options for dedicated resource pools with GPU/Ascend specifications are as follows:

  • modelarts.pool.visual.xlarge (1 PU)

  • modelarts.pool.visual.2xlarge (2 PUs)

  • modelarts.pool.visual.4xlarge (4 PUs)

  • modelarts.pool.visual.8xlarge (8 PUs)

flavor_name

String

Definition: Read-only flavor name returned by ModelArts when flavor_id is used.

Range: N/A

node_count

Integer

Definition: Number of resource replicas selected for a training job.

Range: N/A

pool_id

String

Definition: ID of the resource pool selected for a training job.

Range: N/A

pool_group_id

String

Definition: ID of the resource pool federation selected for a training job.

Range: N/A

flavor_detail

FlavorDetail object

Definition: Flavor details of a training job or algorithm. This parameter is available only for public resource pools.

main_container_allocated_resources

MainContainerAllocatedResources object

Definition: Resource specifications actually obtained by the training container of a training job.

main_container_customized_flavor

MainContainerCustomizedFlavor object

Definition: Custom flavor of a training job.

Range: The number of CPU cores and memory size must be greater than 0, and the number of accelerator PUs must be greater than or equal to 0.

Table 56 FlavorDetail

Parameter

Type

Description

flavor_type

String

Definition: Resource flavor type.

Range:

  • CPU

  • GPU

  • Ascend

billing

BillingInfo object

Definition: Billing information of a resource flavor.

flavor_info

FlavorInfo object

Definition: Resource flavor details.

Table 57 BillingInfo

Parameter

Type

Description

code

String

Definition: Billing code.

Range: N/A

unit_num

Integer

Definition: Billing unit.

Range: N/A

Table 58 FlavorInfo

Parameter

Type

Description

max_num

Integer

Definition: Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported.

Range: N/A

cpu

Cpu object

Definition: CPU specifications.

gpu

Gpu object

Definition: GPU specifications.

npu

Npu object

Definition: Ascend specifications.

memory

Memory object

Definition: Memory information.

disk

Disk object

Definition: Disk information.

Table 59 Cpu

Parameter

Type

Description

arch

String

Definition: CPU architecture.

Range: N/A

core_num

Integer

Definition: Number of cores.

Range: N/A

Table 60 Gpu

Parameter

Type

Description

unit_num

Integer

Definition: Number of GPUs.

Range: N/A

product_name

String

Definition: Product name.

Range: N/A

memory

String

Definition: Memory.

Range: N/A

Table 61 Npu

Parameter

Type

Description

unit_num

String

Definition: Number of NPUs.

Range: N/A

product_name

String

Definition: Product name.

Range: N/A

memory

String

Definition: Memory.

Range: N/A

Table 62 Memory

Parameter

Type

Description

size

Integer

Definition: Memory size.

Range: N/A

unit

String

Definition: Unit of the memory size.

Range: N/A

Table 63 Disk

Parameter

Type

Description

size

String

Definition: Disk size.

Range: N/A

unit

String

Definition: Unit of the disk size. Generally, the unit is GB.

Range: N/A
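Tables 56 to 63 describe a nested structure: FlavorDetail contains BillingInfo and FlavorInfo, and FlavorInfo contains the Cpu, Gpu, Npu, Memory, and Disk objects. The following minimal sketch flattens a flavor_detail object into a summary. It assumes the response body has already been parsed with json.loads. FlavorSummary, summarize_flavor_detail, and _as_int are hypothetical helpers and not part of any ModelArts SDK. The tables document Npu.unit_num as String and Gpu.unit_num as Integer, so the sketch normalizes both to int.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FlavorSummary:
    """Hypothetical summary type; not part of the ModelArts API."""
    flavor_type: str
    max_num: Optional[int]
    cpu_cores: Optional[int]
    accelerator_units: Optional[int]
    memory: Optional[str]
    disk: Optional[str]

    @property
    def supports_distributed(self) -> bool:
        # Per FlavorInfo.max_num, the value 1 means distributed mode is not supported.
        return self.max_num is not None and self.max_num > 1


def _as_int(value: Any) -> Optional[int]:
    # Npu.unit_num is documented as String while Gpu.unit_num is Integer,
    # so normalize both representations to int.
    if value is None or value == "":
        return None
    return int(value)


def summarize_flavor_detail(detail: dict) -> FlavorSummary:
    info = detail.get("flavor_info") or {}
    flavor_type = detail.get("flavor_type", "")
    if flavor_type not in ("CPU", "GPU", "Ascend"):
        raise ValueError(f"unexpected flavor_type: {flavor_type!r}")
    # Pick the accelerator block that matches the flavor type (none for CPU).
    accel = {"GPU": info.get("gpu"), "Ascend": info.get("npu")}.get(flavor_type) or {}
    mem = info.get("memory") or {}
    disk = info.get("disk") or {}
    return FlavorSummary(
        flavor_type=flavor_type,
        max_num=_as_int(info.get("max_num")),
        cpu_cores=_as_int((info.get("cpu") or {}).get("core_num")),
        accelerator_units=_as_int(accel.get("unit_num")),
        memory=f"{mem['size']}{mem.get('unit', '')}" if "size" in mem else None,
        disk=f"{disk['size']}{disk.get('unit', '')}" if "size" in disk else None,
    )
```

Any values you pass in, such as a memory unit of "GiB", are illustrative only. The tables do not list the actual unit strings.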

Table 64 MainContainerAllocatedResources

Parameter

Type

Description

cpu_arch

String

Definition: CPU architecture.

Range: N/A

cpu_core_num

Float

Definition: Number of CPU cores.

Range: N/A

mem_size

Float

Definition: Memory size.

Range: N/A

accelerator_num

Float

Definition: Number of accelerator cards.

Range: N/A

accelerator_type

String

Definition: Type of accelerator cards.

Range: N/A

Table 65 MainContainerCustomizedFlavor

Parameter

Type

Description

cpu_core_num

Float

Definition: Number of CPU cores.

Range: greater than 0

mem_size

Float

Definition: Memory size.

Range: greater than 0

accelerator_num

Float

Definition: Number of accelerator cards.

Range: greater than or equal to 0
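The ranges in Table 65 can be checked on the client side before a custom flavor is submitted. The following is a minimal sketch, and validate_customized_flavor is a hypothetical helper. It assumes that an omitted accelerator_num means 0. The memory unit for mem_size is not documented here, so the sketch checks only the sign of each value.

```python
from numbers import Real
from typing import List


def validate_customized_flavor(flavor: dict) -> List[str]:
    """Return the range violations found; an empty list means the values pass."""
    problems: List[str] = []

    def is_number(value) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, Real) and not isinstance(value, bool)

    cpu = flavor.get("cpu_core_num")
    mem = flavor.get("mem_size")
    acc = flavor.get("accelerator_num", 0)  # assumption: omitted means 0

    # NaN fails every comparison, so it is reported as out of range too.
    if not (is_number(cpu) and cpu > 0):
        problems.append("cpu_core_num must be greater than 0")
    if not (is_number(mem) and mem > 0):
        problems.append("mem_size must be greater than 0")
    if not (is_number(acc) and acc >= 0):
        problems.append("accelerator_num must be greater than or equal to 0")
    return problems
```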

Table 66 JobVolumeResp

Parameter

Type

Description

nfs

NfsResp object

Definition: Volumes attached in NFS mode.

Table 67 NfsResp

Parameter

Type

Description

nfs_server_path

String

Definition: NFS server path, for example, 10.10.10.10:/example/path.

Range: N/A

local_path

String

Definition: Path at which the volume is mounted in the training container, for example, /example/path.

Range: N/A

read_only

Boolean

Definition: Whether the volume attached to the container in NFS mode is read-only.

Range:

  • true: read only

  • false: read/write
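The nfs_server_path value combines the NFS server address and the exported path, as in 10.10.10.10:/example/path. The following minimal sketch splits an NfsResp object into its parts. NfsMount and parse_nfs_volume are hypothetical names. The sketch assumes an IPv4 address or a host name, because it splits on the first colon and an IPv6 address would break that. It also assumes that an omitted read_only means read/write.

```python
from typing import NamedTuple


class NfsMount(NamedTuple):
    server: str
    export_path: str
    local_path: str
    read_only: bool

    @property
    def mount_option(self) -> str:
        # Standard mount option corresponding to the read_only flag.
        return "ro" if self.read_only else "rw"


def parse_nfs_volume(nfs: dict) -> NfsMount:
    server_path = nfs["nfs_server_path"]
    # Split on the first colon: "10.10.10.10:/example/path" -> server, export path.
    server, sep, export_path = server_path.partition(":")
    if not sep or not server or not export_path.startswith("/"):
        raise ValueError(f"unexpected nfs_server_path: {server_path!r}")
    read_only = nfs.get("read_only", False)  # assumption: omitted means read/write
    if not isinstance(read_only, bool):
        raise ValueError(f"read_only must be a boolean, got {read_only!r}")
    return NfsMount(server, export_path, nfs["local_path"], read_only)
```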

Table 68 LogExportPathResp

Parameter

Type

Description

obs_url

String

Definition: OBS path for storing training job logs, for example, obs://example/path.

Range: N/A

host_path

String

Definition: Path of the host where training job logs are stored, for example, /example/path.

Range: N/A
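An OBS path such as obs://example/path names a bucket followed by an object path. The following minimal sketch uses the standard library to split obs_url into those two parts, for example before passing them to an OBS client. split_obs_url is a hypothetical helper. host_path is a plain filesystem path and does not need this treatment.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_obs_url(obs_url: str) -> Tuple[str, str]:
    """Split "obs://bucket/object/path" into ("bucket", "object/path")."""
    parts = urlsplit(obs_url)
    if parts.scheme != "obs" or not parts.netloc:
        raise ValueError(f"not an OBS URL: {obs_url!r}")
    # netloc is the bucket name; the remainder is the object path.
    return parts.netloc, parts.path.lstrip("/")
```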

Table 69 SchedulePolicyResp

Parameter

Type

Description

required_affinity

RequiredAffinityResp object

Definition: Affinity requirements of a training job.

priority

Integer

Definition: Priority of a training job.

Range: 0 to 3

preemptible

Boolean

Definition: Whether the resource can be preempted.

Range:

  • true: The resource can be preempted.

  • false: The resource cannot be preempted.

Table 70 RequiredAffinityResp

Parameter

Type

Description

affinity_type

String

Definition: Affinity scheduling policy.

Range:

  • cabinet: strict cabinet affinity scheduling

  • hyperinstance: supernode affinity scheduling

affinity_group_size

Integer

Definition: Size of an affinity group.

Range: N/A
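Tables 69 and 70 constrain priority to 0 to 3 and affinity_type to cabinet or hyperinstance. The following minimal sketch checks a SchedulePolicyResp object against those documented values. check_schedule_policy is a hypothetical helper. It treats every field as optional, which is an assumption. The sketch does not interpret what the priority numbers mean, because this page does not say whether higher or lower values take precedence.

```python
from typing import List

ALLOWED_AFFINITY_TYPES = {"cabinet", "hyperinstance"}


def _is_int(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_schedule_policy(policy: dict) -> List[str]:
    """Return violations of the documented ranges; an empty list means none."""
    problems: List[str] = []
    priority = policy.get("priority")
    if priority is not None and not (_is_int(priority) and 0 <= priority <= 3):
        problems.append("priority must be an integer from 0 to 3")
    preemptible = policy.get("preemptible")
    if preemptible is not None and not isinstance(preemptible, bool):
        problems.append("preemptible must be a boolean")
    affinity = policy.get("required_affinity")
    if affinity is not None:
        if affinity.get("affinity_type") not in ALLOWED_AFFINITY_TYPES:
            problems.append("affinity_type must be cabinet or hyperinstance")
        size = affinity.get("affinity_group_size")
        if size is not None and not _is_int(size):
            problems.append("affinity_group_size must be an integer")
    return problems
```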

Table 71 CustomMetrics

Parameter

Type

Description

exec

Exec object

Definition: Collects metrics by running a command (CLI mode).

http_get

HttpGet object

Definition: Collects metrics over HTTP.

Table 72 Exec

Parameter

Type

Description

command

Array of strings

Definition: Command used to collect metrics in CLI mode.

Table 73 HttpGet

Parameter

Type

Description

path

String

Definition: URL path for obtaining metrics over HTTP. This parameter and port must be set together or both left empty.

Range: N/A

port

Integer

Definition: Port for obtaining metrics over HTTP. This parameter and path must be set together or both left empty.

Range: N/A
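In Table 73, path and port must be set together or both left empty. The following minimal sketch enforces that rule for an HttpGet object. check_http_get is a hypothetical helper. It treats an empty string as an unset path. The 1 to 65535 port check is the general TCP port range, not a limit stated on this page.

```python
from typing import Optional, Tuple


def check_http_get(http_get: dict) -> Optional[Tuple[str, int]]:
    """Return (path, port) when HTTP metrics are configured, else None."""
    path = http_get.get("path") or None  # treat "" as unset
    port = http_get.get("port")
    if (path is None) != (port is None):
        raise ValueError("path and port must be set together or both left empty")
    if path is None:
        return None  # HTTP metrics collection is not configured
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port!r}")
    return path, port
```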

Table 74 JobEndpointsResp

Parameter

Type

Description

ssh

SSHResp object

Definition: SSH connection information.

jupyter_lab

JupyterLab object

Definition: JupyterLab connection information.

tensorboard

Tensorboard object

Definition: TensorBoard connection information.

mindstudio_insight

MindStudioInsight object

Definition: MindStudio Insight connection information.

Table 75 SSHResp

Parameter

Type

Description

key_pair_names

Array of strings

Definition: Names of the SSH key pairs, which can be created and viewed on the Key Pair page of the Elastic Cloud Server (ECS) console.

Range: N/A

task_urls

Array of TaskUrls objects

Definition: SSH connection addresses of the tasks in a training job.

Table 76 TaskUrls

Parameter

Type

Description

task

String

Definition: Task ID of a training job.

Range: N/A

url

String

Definition: SSH connection address of a training job.

Range: N/A

Table 77 JupyterLab

Parameter

Type

Description

url

String

Definition: JupyterLab address of a training job.

Range: N/A

token

String

Definition: JupyterLab token of a training job.

Range: N/A

Table 78 Tensorboard

Parameter

Type

Description

url

String

Definition: TensorBoard address of a training job.

Range: N/A

token

String

Definition: TensorBoard token of a training job.

Range: N/A

Table 79 MindStudioInsight

Parameter

Type

Description

url

String

Definition: MindStudio Insight address of a training job.

Range: N/A

token

String

Definition: MindStudio Insight token of a training job.

Range: N/A
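JobEndpointsResp groups several connection endpoints. SSH exposes one URL per task, while JupyterLab, TensorBoard, and MindStudio Insight each expose a single URL and token. The following minimal sketch collects the URLs into one mapping. collect_endpoint_urls is a hypothetical helper. It deliberately skips the token fields, because they are credentials and should not be logged or displayed alongside the URLs.

```python
from typing import Dict, List


def collect_endpoint_urls(endpoints: dict) -> Dict[str, List[str]]:
    """Map each configured endpoint type to its URL(s); tokens are omitted."""
    result: Dict[str, List[str]] = {}
    ssh = endpoints.get("ssh") or {}
    # SSHResp.task_urls holds one {task, url} entry per training task.
    ssh_urls = [t["url"] for t in ssh.get("task_urls") or [] if t.get("url")]
    if ssh_urls:
        result["ssh"] = ssh_urls
    for key in ("jupyter_lab", "tensorboard", "mindstudio_insight"):
        url = (endpoints.get(key) or {}).get("url")
        if url:
            result[key] = [url]
    return result
```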

Status code: 404

Table 80 Response body parameters

Parameter

Type

Description

error_code

String

Definition: ModelArts error code.

Range: N/A

error_msg

String

Definition: Error message.

Range: N/A

Example Requests

POST /v2/{project_id}/pools/{pool_name}/nodes/batch-migrate

{
  "migratenodenames" : [ "os-node-created-mnmcf" ]
}
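The example request above can be built with the Python standard library as follows. This is a minimal sketch, and build_batch_migrate_request is a hypothetical helper. The endpoint is a caller-supplied placeholder. The sketch assumes token authentication through the X-Auth-Token header, which is the common Huawei Cloud convention. The optional fromclustername and toclustername fields come from the request body parameters of this API. The function only builds the request; send it with urllib.request.urlopen when ready.

```python
import json
import urllib.request
from typing import Iterable, Optional
from urllib.parse import quote


def build_batch_migrate_request(
    endpoint: str,              # placeholder, e.g. your regional ModelArts endpoint
    project_id: str,
    pool_name: str,
    token: str,                 # assumption: IAM token sent as X-Auth-Token
    node_names: Iterable[str],
    from_cluster: Optional[str] = None,
    to_cluster: Optional[str] = None,
) -> urllib.request.Request:
    url = (
        f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}"
        f"/pools/{quote(pool_name, safe='')}/nodes/batch-migrate"
    )
    body = {"migratenodenames": list(node_names)}
    if from_cluster:
        body["fromclustername"] = from_cluster
    if to_cluster:
        body["toclustername"] = to_cluster
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```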

Example Responses

Status code: 404

Not found.

{
  "error_code" : "ModelArts.50015001",
  "error_msg" : "pool not found"
}
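Only status codes 200 and 404 are listed for this API, and error bodies carry error_code and error_msg. The following minimal sketch treats any status other than 200 as an error and extracts those two fields when the body is JSON. ModelArtsApiError and check_batch_migrate_response are hypothetical names. The sketch does not parse the 200 body.

```python
import json
from typing import Optional


class ModelArtsApiError(Exception):
    def __init__(self, status: int, error_code: Optional[str], error_msg: Optional[str]):
        super().__init__(f"HTTP {status}: {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def check_batch_migrate_response(status: int, body: bytes) -> None:
    """Return None on 200; otherwise raise ModelArtsApiError with the parsed fields."""
    if status == 200:
        return None
    try:
        payload = json.loads(body.decode("utf-8")) if body else {}
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise ModelArtsApiError(status, payload.get("error_code"), payload.get("error_msg"))
```

When the request is sent with urllib.request.urlopen, a 404 is raised as urllib.error.HTTPError. Its code attribute and read() method supply the two arguments.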

Status Codes

Status Code

Description

200

Request succeeded.

404

Not found.

Error Codes

See Error Codes.