Deploying a Model as a Service
Function
This API is used to deploy a model as a service.
URI
POST /v1/{project_id}/services
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
Request Body
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| service_name | Yes | String | Service name. The value can contain 1 to 64 visible characters. Only letters, Chinese characters, digits, hyphens (-), and underscores (_) are allowed. |
| description | No | String | Service description, which contains a maximum of 100 characters. By default, this parameter is left blank. |
| infer_type | Yes | String | Inference mode. The value can be real-time, batch, or edge. |
| workspace_id | No | String | ID of the workspace to which a service belongs. The default value is 0, indicating the default workspace. |
| vpc_id | No | String | ID of the VPC to which a real-time service instance is deployed. By default, this parameter is left blank. In this case, ModelArts allocates a dedicated VPC to each user so that users are isolated from each other. To access other service components in the VPC of a service instance, set this parameter to the ID of the corresponding VPC. Once a VPC is configured, it cannot be modified. If vpc_id and cluster_id are configured together, only the dedicated cluster parameter takes effect. |
| subnet_network_id | No | String | ID of a subnet. By default, this parameter is left blank. This parameter is mandatory when vpc_id is configured. Enter the network ID displayed in the subnet details on the VPC management console. A subnet provides dedicated network resources that are isolated from other networks. |
| security_group_id | No | String | ID of a security group. By default, this parameter is left blank. This parameter is mandatory when vpc_id is configured. A security group is a virtual firewall that provides secure network access control policies for service instances. The security group must contain at least one inbound rule that permits TCP requests from source address 0.0.0.0/0 on port 8080. |
| cluster_id | No | String | ID of a dedicated cluster. By default, this parameter is left blank, indicating that no dedicated cluster is used. When deploying services to a dedicated cluster, ensure that the cluster status is normal. After this parameter is set, the network configuration of the cluster is used, and the vpc_id parameter does not take effect. If both this parameter and cluster_id in the real-time config are configured, cluster_id in the real-time config takes precedence. |
| config | Yes | config array corresponding to infer_type | Model running configuration. If infer_type is batch or edge, only one model can be configured; if multiple models are uploaded, the first model is used by default. If infer_type is real-time, multiple models can be configured and assigned weights based on service requirements, but the models cannot share the same version number. |
| schedule | No | schedule array | Service scheduling configuration, which can be configured only for real-time services. By default, this parameter is not used and the service runs continuously. For details, see Table 6. |
| additional_properties | No | Map<String, Object> | Additional service attributes, which facilitate service management. For details, see Table 7. |
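A minimal pre-flight check for the service_name rule above can be sketched as follows. The regex (and the `is_valid_service_name` helper) is an assumption about how the "letters, Chinese characters, digits, hyphens, underscores, 1 to 64 characters" rule might be enforced client-side, not part of the ModelArts API.

```python
import re

# Assumed client-side check for the service_name constraint in the table:
# 1-64 characters; letters, Chinese characters, digits, hyphens, underscores.
# \u4e00-\u9fa5 covers the common CJK unified ideographs.
NAME_RE = re.compile(r"^[A-Za-z0-9_\-\u4e00-\u9fa5]{1,64}$")

def is_valid_service_name(name):
    """Return True if name satisfies the documented service_name rule."""
    return bool(NAME_RE.match(name))
```

Validating locally before the POST gives a clearer error than a rejected request.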
config parameters when infer_type is real-time:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| model_id | Yes | String | Model ID |
| weight | Yes | Integer | Traffic weight allocated to a model. This parameter is mandatory only when infer_type is set to real-time. The sum of all weights must be 100. |
| specification | Yes | String | Resource specifications. Select specifications based on service requirements. |
| instance_count | Yes | Integer | Number of instances deployed for a model. The value must be greater than 0. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running a model. By default, this parameter is left blank. |
| cluster_id | No | String | ID of the dedicated resource pool. By default, this parameter is left blank, indicating that no dedicated resource pool is used. After this parameter is set, the network configuration of the cluster is used, and the vpc_id parameter does not take effect. |
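The two rules in this table (weights sum to 100, instance_count greater than 0) can be checked before sending the request. `validate_realtime_config` is a hypothetical helper sketched here for illustration, not part of the ModelArts API.

```python
# Assumed pre-flight validation of a real-time "config" array, based on the
# rules in the table above.
def validate_realtime_config(config):
    """Raise ValueError if the config array breaks the documented rules."""
    total = 0
    for entry in config:
        if entry["instance_count"] <= 0:
            raise ValueError("instance_count must be greater than 0")
        total += int(entry["weight"])  # the samples pass weight as a string
    if total != 100:
        raise ValueError(f"weights must sum to 100, got {total}")

# Mirrors the multi-version sample below: 70/30 traffic split.
validate_realtime_config([
    {"model_id": "xxxmodel-idxxx", "weight": "70", "instance_count": 1},
    {"model_id": "xxxxxx", "weight": "30", "instance_count": 1},
])  # passes silently
```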
config parameters when infer_type is batch:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Available flavors: modelarts.vm.cpu.2u and modelarts.vm.gpu.p4 |
| instance_count | Yes | Integer | Number of instances deployed for a model |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running a model. By default, this parameter is left blank. |
| src_type | No | String | Data source type. This parameter can be set to ManifestFile. By default, it is left blank, indicating that only files in the src_path directory are read. If it is set to ManifestFile, src_path must be a specific manifest file path. Multiple data paths can be specified in the manifest file. For details, see Manifest File Specifications. |
| src_path | Yes | String | OBS path of the input data of a batch job |
| dest_path | Yes | String | OBS path of the output data of a batch job |
| req_uri | Yes | String | Inference path of a batch job. The input parameters and input data vary with the inference path. |
| mapping_type | Yes | String | Mapping type of the input data. The value can be file or csv. |
| mapping_rule | No | Map | Mapping between input parameters and CSV data. This parameter is mandatory only when mapping_type is set to csv. The mapping rule is derived from the input parameters (input_params) in the model configuration file config.json. When type is string, number, integer, or boolean, the index parameter must be configured; see the csv sample below for a concrete mapping. The index must be an integer starting from 0; if it does not comply with this rule, the parameter is ignored in the request. Once the mapping rule is configured, the corresponding CSV data must be separated by commas (,). |
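To make the index mechanism concrete, the sketch below shows how the zero-based "index" fields in a mapping_rule line up columns of one comma-separated CSV line with named model inputs. `build_req_data` is a hypothetical helper for illustration; it is not part of the ModelArts API, which applies the mapping server-side.

```python
import csv
import io

def build_req_data(mapping_props, csv_line):
    """Map one comma-separated CSV line to a req_data dict using each
    input's zero-based "index" from the mapping rule."""
    row = next(csv.reader(io.StringIO(csv_line)))
    return {name: float(row[spec["index"]]) for name, spec in mapping_props.items()}

# Property fragment shaped like the csv sample below: input1 reads column 4,
# input2 reads column 3, input3 reads column 2.
props = {
    "input1": {"type": "number", "index": 4},
    "input2": {"type": "number", "index": 3},
    "input3": {"type": "number", "index": 2},
}
build_req_data(props, "5,4,3,2,1")  # {'input1': 1.0, 'input2': 2.0, 'input3': 3.0}
```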
config parameters when infer_type is edge:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Currently, modelarts.vm.cpu.2u and modelarts.vm.gpu.p4 are available. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running a model. By default, this parameter is left blank. |
| nodes | Yes | String array | Edge node ID array |
schedule parameters:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| type | Yes | String | Scheduling type. Currently, only the value stop is supported. |
| time_unit | Yes | String | Scheduling time unit, for example, HOURS. |
| duration | Yes | Integer | Value that maps to the time unit. For example, if the task should stop after two hours, set time_unit to HOURS and duration to 2. |
additional_properties parameters:

| Parameter | Type | Description |
|---|---|---|
| smn_notification | smn_notification structure | SMN message notification structure, used to notify the user of service status changes. For details, see Table 8. |
Response Body

| Parameter | Type | Description |
|---|---|---|
| service_id | String | Service ID |
Samples
The following shows how to deploy different types of services.
- Sample request: Creating a real-time service

  ```json
  POST https://endpoint/v1/{project_id}/services
  {
    "service_name": "mnist",
    "description": "mnist service",
    "infer_type": "real-time",
    "config": [
      {
        "model_id": "xxxmodel-idxxx",
        "weight": "100",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1
      }
    ]
  }
  ```
- Sample request: Creating a real-time service and configuring multi-version traffic distribution

  ```json
  {
    "service_name": "mnist",
    "description": "mnist service",
    "infer_type": "real-time",
    "config": [
      {
        "model_id": "xxxmodel-idxxx",
        "weight": "70",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "envs": {
          "model_name": "mxnet-model-1",
          "load_epoch": "0"
        }
      },
      {
        "model_id": "xxxxxx",
        "weight": "30",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1
      }
    ]
  }
  ```
- Sample request: Creating a real-time service in a dedicated resource pool with custom specifications

  ```json
  {
    "service_name": "realtime-demo",
    "description": "",
    "infer_type": "real-time",
    "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
    "config": [
      {
        "model_id": "eb6a4a8c-5713-4a27-b8ed-c7e694499af5",
        "weight": "100",
        "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
        "specification": "custom",
        "custom_spec": {
          "cpu": 1.5,
          "memory": 7500,
          "gpu_p4": 0,
          "ascend_a310": 0
        },
        "instance_count": 1
      }
    ]
  }
  ```
- Sample request: Creating a real-time service and setting it to automatically stop

  ```json
  {
    "service_name": "service-demo",
    "description": "demo",
    "infer_type": "real-time",
    "config": [
      {
        "model_id": "xxxmodel-idxxx",
        "weight": "100",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1
      }
    ],
    "schedule": [
      {
        "type": "stop",
        "time_unit": "HOURS",
        "duration": 1
      }
    ]
  }
  ```
- Sample request: Creating a batch service and setting mapping_type to file

  ```json
  {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "config": [
      {
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/output/",
        "req_uri": "/",
        "mapping_type": "file"
      }
    ]
  }
  ```
- Sample request: Creating a batch service and setting mapping_type to csv

  ```json
  {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "config": [
      {
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.cn-north-4.myhuaweicloud.com/output/",
        "req_uri": "/",
        "mapping_type": "csv",
        "mapping_rule": {
          "type": "object",
          "properties": {
            "data": {
              "type": "object",
              "properties": {
                "req_data": {
                  "type": "array",
                  "items": [
                    {
                      "type": "object",
                      "properties": {
                        "input5": { "type": "number", "index": 0 },
                        "input4": { "type": "number", "index": 1 },
                        "input3": { "type": "number", "index": 2 },
                        "input2": { "type": "number", "index": 3 },
                        "input1": { "type": "number", "index": 4 }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    ]
  }
  ```

  The format of the inference request body described in mapping_rule is as follows:

  ```json
  {
    "data": {
      "req_data": [
        {
          "input1": 1,
          "input2": 2,
          "input3": 3,
          "input4": 4,
          "input5": 5
        }
      ]
    }
  }
  ```
- Sample response

  ```json
  {
    "service_id": "10eb0091-887f-4839-9929-cbc884f1e20e"
  }
  ```
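The first sample request can be sent from Python with the standard library alone. This is a minimal sketch: the endpoint, project ID, and token are placeholders, and a real call needs a valid IAM token in the X-Auth-Token header.

```python
import json
import urllib.request

def deploy_service(endpoint, project_id, token, body):
    """POST /v1/{project_id}/services and return the new service_id."""
    req = urllib.request.Request(
        f"{endpoint}/v1/{project_id}/services",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["service_id"]

# Request body from the first real-time sample above.
body = {
    "service_name": "mnist",
    "description": "mnist service",
    "infer_type": "real-time",
    "config": [{
        "model_id": "xxxmodel-idxxx",
        "weight": "100",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
    }],
}
# service_id = deploy_service("https://endpoint", "{project_id}", token, body)
```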
Status Code
For details about the status code, see Table 1.