
Deploying a Model as a Service

Function

This API is used to deploy a model as a service.

URI

POST /v1/{project_id}/services

Table 1 describes the required parameters.
Table 1 Parameter description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
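As a sketch of how this URI is typically assembled and called (the endpoint host and the X-Auth-Token authentication header are placeholders following common Huawei Cloud API conventions, not values stated in this section):

```python
# Build the request URL for POST /v1/{project_id}/services.
# The endpoint and project ID below are placeholders, not real values.
def build_service_url(endpoint: str, project_id: str) -> str:
    """Return the full URI for deploying a model as a service."""
    return f"{endpoint}/v1/{project_id}/services"

url = build_service_url("https://endpoint", "my-project-id")
print(url)  # https://endpoint/v1/my-project-id/services

# A real call would POST the JSON request body with an auth header, e.g.:
# headers = {"Content-Type": "application/json", "X-Auth-Token": token}
```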

Request Body

Table 2 describes the request parameters.
Table 2 Parameter description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| service_name | Yes | String | Service name. The value can contain 1 to 64 visible characters, including Chinese characters. Only letters, Chinese characters, digits, hyphens (-), and underscores (_) are allowed. |
| description | No | String | Service description, which contains a maximum of 100 characters. By default, this parameter is left blank. |
| infer_type | Yes | String | Inference mode. The value can be real-time, batch, or edge. • real-time: real-time service, which keeps running. • batch: batch service, which can be configured as tasks to run in batches; when the tasks are complete, the service stops automatically. • edge: inference service deployed on an edge node; you need to create the node on Intelligent EdgeFabric (IEF) in advance. |
| workspace_id | No | String | ID of the workspace to which the service belongs. The default value is 0, indicating the default workspace. |
| vpc_id | No | String | ID of the VPC to which a real-time service instance is deployed. By default, this parameter is left blank, and ModelArts allocates a dedicated VPC to each user so that users are isolated from each other. If you need to access other service components in the VPC of a service instance, set this parameter to the ID of that VPC. Once a VPC is configured, it cannot be modified. If vpc_id and cluster_id are both configured, only the dedicated cluster parameter takes effect. |
| subnet_network_id | No | String | ID of a subnet. By default, this parameter is left blank. This parameter is mandatory when vpc_id is configured. Enter the network ID displayed in the subnet details on the VPC management console. A subnet provides dedicated network resources that are isolated from other networks. |
| security_group_id | No | String | ID of a security group. By default, this parameter is left blank. This parameter is mandatory when vpc_id is configured. A security group is a virtual firewall that provides secure network access control policies for service instances. The security group must contain at least one inbound rule permitting TCP requests from source address 0.0.0.0/0 on port 8080. |
| cluster_id | No | String | ID of a dedicated cluster. By default, this parameter is left blank, indicating that no dedicated cluster is used. When using a dedicated cluster to deploy services, ensure that the cluster is in the normal state. After this parameter is set, the network configuration of the cluster is used, and vpc_id does not take effect. If this parameter is configured together with cluster_id in the real-time config, the cluster_id in the real-time config takes precedence. |
| config | Yes | config array corresponding to infer_type | Model running configuration. If infer_type is batch or edge, only one model can be configured; if you upload multiple models, the first model is used by default. If infer_type is real-time, you can configure multiple models and assign weights based on service requirements, but the models must have different version numbers. For details, see Table 3, Table 4, and Table 5. |
| schedule | No | schedule array | Service scheduling configuration, which can be configured only for real-time services. By default, this parameter is not used and services run continuously. For details, see Table 6. |
| additional_properties | No | Map<String, Object> | Additional service attributes, which facilitate service management. For details, see Table 7. |

Table 3 config parameters of real-time

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| weight | Yes | Integer | Traffic weight allocated to the model. This parameter is mandatory only when infer_type is set to real-time. The weights across all configured models must sum to 100. |
| specification | Yes | String | Resource specifications. Select specifications based on service requirements. The current version provides: modelarts.vm.cpu.2u, modelarts.vm.gpu.0.25p4, modelarts.vm.gpu.0.5p4, modelarts.vm.gpu.p4, modelarts.vm.gpu.0.25t4, modelarts.vm.gpu.0.5t4, modelarts.vm.gpu.t4, modelarts.vm.arm.d310.3u6g, modelarts.vm.ai1.a310, modelarts.vm.cpu.free, and modelarts.vm.gpu.free. |
| instance_count | Yes | Integer | Number of instances deployed for the model. The value must be greater than 0. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. By default, this parameter is left blank. |
| cluster_id | No | String | ID of a dedicated resource pool. By default, this parameter is left blank, indicating that no dedicated resource pool is used. After this parameter is set, the network configuration of the cluster is used, and vpc_id does not take effect. |
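To illustrate the real-time config rules above (multiple models whose weights must sum to 100), here is a minimal sketch that assembles and validates a request body; the helper name and model IDs are illustrative, not part of the API:

```python
import json

def build_realtime_payload(service_name: str, models: list[dict]) -> dict:
    """Assemble a real-time deployment request body and check the weight rule."""
    total = sum(m["weight"] for m in models)
    if total != 100:
        raise ValueError(f"weights must sum to 100, got {total}")
    return {
        "service_name": service_name,
        "infer_type": "real-time",
        "config": models,
    }

payload = build_realtime_payload("mnist", [
    {"model_id": "model-a", "weight": 70,
     "specification": "modelarts.vm.cpu.2u", "instance_count": 1},
    {"model_id": "model-b", "weight": 30,
     "specification": "modelarts.vm.cpu.2u", "instance_count": 1},
])
print(json.dumps(payload, indent=2))
```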

Table 4 config parameters of batch

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Available flavors: modelarts.vm.cpu.2u and modelarts.vm.gpu.p4. |
| instance_count | Yes | Integer | Number of instances deployed for the model |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. By default, this parameter is left blank. |
| src_type | No | String | Data source type. This parameter can be set to ManifestFile. By default, it is left blank, indicating that only files in the src_path directory are read. If set to ManifestFile, src_path must point to a specific manifest file, in which multiple data paths can be specified. For details, see Manifest File Specifications. |
| src_path | Yes | String | OBS path of the input data of a batch job |
| dest_path | Yes | String | OBS path of the output data of a batch job |
| req_uri | Yes | String | Inference path of a batch job. The input parameters and input data vary with the inference path. |
| mapping_type | Yes | String | Mapping type of the input data. The value can be file or csv. • file: each inference request corresponds to a file in the input data path; in this mode, req_uri of the model can have only one input parameter, and its type must be file. • csv: each inference request corresponds to a row of data in a CSV file; in this mode, the files in the input data path must be in CSV format, and mapping_rule must be configured to map each parameter in the inference request body to an index in the CSV file. |
| mapping_rule | No | Map | Mapping between input parameters and CSV data. This parameter is mandatory only when mapping_type is set to csv. The mapping rule is derived from the input parameters (input_params) in the model configuration file config.json: for each parameter whose type is string, number, integer, or boolean, configure the index field, which specifies the CSV column the parameter maps to. The index must be an integer starting from 0; if an index does not comply with this rule, the corresponding parameter is ignored in the request. The CSV data used with a mapping rule must be comma-separated. For a concrete example, see the mapping relationship example below. |
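As a sketch of how the index fields map one comma-separated CSV row onto an inference request body (the parameter names and mapping here are illustrative, echoing the csv sample later in this topic):

```python
import csv
import io

# index -> which CSV column each input parameter reads from (illustrative)
MAPPING = {"input5": 0, "input4": 1, "input3": 2, "input2": 3, "input1": 4}

def row_to_request(csv_line: str) -> dict:
    """Build one inference request body from one comma-separated CSV row."""
    row = next(csv.reader(io.StringIO(csv_line)))
    req = {name: float(row[idx]) for name, idx in MAPPING.items()}
    return {"data": {"req_data": [req]}}

body = row_to_request("0.1,0.2,0.3,0.4,0.5")
print(body)
```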

Table 5 config parameters of edge

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Currently, modelarts.vm.cpu.2u and modelarts.vm.gpu.p4 are available. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. By default, this parameter is left blank. |
| nodes | Yes | String array | Edge node ID array |

Table 6 schedule parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| type | Yes | String | Scheduling type. Currently, only the value stop is supported. |
| time_unit | Yes | String | Scheduling time unit. The value can be DAYS, HOURS, or MINUTES. |
| duration | Yes | Integer | Value that maps to the time unit. For example, for a task that stops after two hours, set time_unit to HOURS and duration to 2. |
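The time_unit/duration pair can be read as a single offset from service start; a minimal sketch of that conversion (the helper name is ours, not part of the API):

```python
from datetime import timedelta

# Map the API's scheduling time units to timedelta keyword arguments.
_UNITS = {"DAYS": "days", "HOURS": "hours", "MINUTES": "minutes"}

def stop_after(time_unit: str, duration: int) -> timedelta:
    """Return how long a service runs before a 'stop' schedule triggers."""
    return timedelta(**{_UNITS[time_unit]: duration})

print(stop_after("HOURS", 2))  # 2:00:00
```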

Table 7 Existing service attributes in additional_properties

| Parameter | Type | Description |
| --- | --- | --- |
| smn_notification | smn_notification structure | SMN message notification structure, which notifies the user of service status changes. For details, see Table 8. |

Table 8 smn_notification structure

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| topic_urn | Yes | String | URN of an SMN topic |
| events | Yes | List<Integer> | Event IDs. Currently, the following event IDs are available: 1 (failed), 3 (running), 7 (concerning), and 11 (pending). |
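For readability when handling notifications, the event IDs above can be decoded with a small lookup (the mapping reproduces Table 8; the function name is ours, not part of the API):

```python
# Event IDs from Table 8 mapped to their service states.
EVENT_STATES = {1: "failed", 3: "running", 7: "concerning", 11: "pending"}

def describe_events(events: list[int]) -> list[str]:
    """Translate SMN event IDs into readable state names."""
    return [EVENT_STATES.get(e, f"unknown({e})") for e in events]

print(describe_events([1, 3, 11]))  # ['failed', 'running', 'pending']
```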

Response Body

Table 9 describes the response parameters.
Table 9 Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| service_id | String | Service ID |

Samples

The following shows how to deploy different types of services.

  • Sample request: Creating a real-time service
    POST    https://endpoint/v1/{project_id}/services
    {
      "service_name": "mnist",
      "description": "mnist service",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": 100,
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ]
    }
  • Sample request: Creating a real-time service and configuring multi-version traffic distribution
    {
      "service_name": "mnist",
      "description": "mnist service",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": 70,
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1,
          "envs":
          {
              "model_name": "mxnet-model-1",
              "load_epoch": "0"
          }
        },
        {
          "model_id": "xxxxxx",
          "weight": 30,
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ]
    }
  • Sample request for creating a real-time service in a dedicated resource pool with custom specifications
    {
      "service_name": "realtime-demo",
      "description": "",
      "infer_type": "real-time",
      "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
      "config": [
        {
          "model_id": "eb6a4a8c-5713-4a27-b8ed-c7e694499af5",
          "weight": 100,
          "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
          "specification": "custom",
          "custom_spec": {
            "cpu": 1.5,
            "memory": 7500,
            "gpu_p4": 0,
            "ascend_a310": 0
          },
          "instance_count": 1
        }
      ]
    }
  • Sample request for creating a real-time service and setting it to automatically stop
    {
      "service_name": "service-demo",
      "description": "demo",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": 100,
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ],
      "schedule": [
        {
          "type": "stop",
          "time_unit": "HOURS",
          "duration": 1
        }
      ]
    }
  • Sample request: Creating a batch service and setting mapping_type to file
    {
      "service_name": "batchservicetest",
      "description": "",
      "infer_type": "batch",
      "config": [
        {
          "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1,
          "src_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/xgboosterdata/",
          "dest_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/output/",
          "req_uri": "/",
          "mapping_type": "file"
        }
      ]
    }
  • Sample request: Creating a batch service and setting mapping_type to csv
    {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "config": [{
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/output/",
        "req_uri": "/",
        "mapping_type": "csv",
        "mapping_rule": {
            "type": "object",
            "properties": {
                "data": {
                    "type": "object",
                    "properties": {
                        "req_data": {
                            "type": "array",
                            "items": [{
                                "type": "object",
                                "properties": {
                                    "input5": {
                                        "type": "number",
                                        "index": 0
                                    },
                                    "input4": {
                                        "type": "number",
                                        "index": 1
                                    },
                                    "input3": {
                                        "type": "number",
                                        "index": 2
                                    },
                                    "input2": {
                                        "type": "number",
                                        "index": 3
                                    },
                                    "input1": {
                                        "type": "number",
                                        "index": 4
                                    }
                                }
                            }]
                        }
                    }
                }
            }
        }
    }]
    }
    • The format of the inference request body described in mapping_rule is as follows:
      {
      "data": {
          "req_data": [{
              "input1": 1,
              "input2": 2,
              "input3": 3,
              "input4": 4,
              "input5": 5
          }]
      }
      }
  • Sample response
    {
      "service_id": "10eb0091-887f-4839-9929-cbc884f1e20e"
    }

Status Code

For details about status codes, see Status Codes.