Deploying a Model Service in ModelArts Studio (MaaS)
In MaaS, you can deploy built-in models from Model Square as services so that they can be called in service environments.
Description
Choose a model from Model Square for deployment. Once deployed, the model appears in the My Services list.
Billing
Model inference in MaaS uses compute and storage resources, which are billed. Compute resources are billed for running the model service. Storage resources are billed for storing data in OBS. You will also be billed for using SMN. For details, see Model Inference Billing Items.
Constraints
ModelArts Studio has predefined the maximum input and output lengths for inference.
Table 1 Default maximum input and output lengths

| Model | Default Maximum Input and Output Lengths |
|---|---|
| DeepSeek-R1-Distill-Llama-70B-8K, DeepSeek-R1-Distill-Qwen-14B-8K, DeepSeek-R1-Distill-Qwen-32B-8K | 8192 |
| DeepSeek-R1-Distill-Qwen-32B-32K | 32768 |
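Because the limit in the table covers input and output combined, a client typically caps the requested completion length at the model limit minus the prompt length. A minimal sketch of that budgeting, assuming a rough 4-characters-per-token estimate (the real tokenizer for these models may count differently):

```python
# Rough token-budget check before calling a service with an 8K limit.
# Assumption: ~4 characters per token; the actual tokenizer may differ.
MAX_TOTAL_TOKENS = 8192  # limit for the 8K DeepSeek distill models (see table)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)


def output_budget(prompt: str, max_total: int = MAX_TOTAL_TOKENS) -> int:
    """Tokens left for the completion after accounting for the prompt."""
    remaining = max_total - estimate_tokens(prompt)
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the model's total token limit")
    return remaining
```

A request would then pass `output_budget(prompt)` (or something smaller) as the maximum completion length so that prompt plus output stays within the model's limit.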
Prerequisites
You have prepared a dedicated resource pool. For details, see Preparing ModelArts Studio (MaaS) Resources.
Procedure
- Log in to the ModelArts Studio (MaaS) console and select the target region from the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference.
- On the My Services tab of the Real-Time Inference page, click Deploy Model in the upper right corner, and configure the parameters listed in Table 2.
Table 2 Parameters for deploying a model service

- Service Settings
  - Name: Enter a service name. The name can contain 1 to 64 characters, including only letters, digits, hyphens (-), and underscores (_), and must start with a letter.
  - Description: Enter a description of up to 256 characters.
- Model Settings
  - Model: Click Select Model and select a model from Model Square.
- Resource Settings
  - Resource Pool Type: Only dedicated resource pools are supported. Dedicated resource pools are created separately and used exclusively.
  - Instance Specifications: Select the required instance specifications, including the server type and model. Only resource specifications supported by the selected model are displayed.
  - Instances: Set the number of server instances.
  - Queries Per Second (QPS): Set the QPS of the model service, in queries/s. NOTE: If error code ModelArts.4206 is displayed during deployment, the allowed QPS has been exceeded. In this case, restart the service after traffic limiting ends.
- More Settings
  - Event Notification: Choose whether to enable event notification.
    - This function is disabled by default, which means SMN is not used.
    - After this function is enabled, you are notified of specific events, such as task status changes or suspected suspensions. You must then configure the topic name and events.
    - Topic: the topic for event notifications. Click Create Topic to create a topic on the SMN console.
    - Event: the events you want to subscribe to, for example, Running, Terminated, or Failed.
    - NOTE: After you create a topic on the SMN console, add a subscription to the topic and confirm the subscription. Only then will you be notified of events. For details about how to subscribe to a topic, see Adding a Subscription. SMN charges you based on the number of notification messages. For details, see Billing.
  - Auto Stop: When using paid resources, choose whether to enable auto stop.
    - If this function is enabled, configure the auto stop time. The value can be 1 hour, 2 hours, 4 hours, 6 hours, or Customize. The service stops automatically when the time limit is reached. The time limit does not count down while the service is paused.
    - This function is disabled by default; the service keeps running until you stop it.
- Click Submit.
In the My Services list, when the service status changes to Running, the model has been deployed.
Deploying a service in a dedicated resource pool incurs no additional cost: the pool is billed when it is purchased, not per deployment.
- After the model is deployed, call its API. For details, see Calling a Model Service in ModelArts Studio (MaaS).
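A deployed service is typically called over HTTPS with a bearer token. The sketch below assumes an OpenAI-style chat-completions payload; the endpoint URL, API key, and exact request schema are placeholders, so check Calling a Model Service in ModelArts Studio (MaaS) for the real format. The retry loop reflects the QPS note: when the service throttles requests (HTTP 429), the client backs off exponentially instead of hammering the limit.

```python
import json
import time
import urllib.error
import urllib.request

# Placeholders: the real endpoint URL and API key come from the service
# details page; see "Calling a Model Service in ModelArts Studio (MaaS)".
ENDPOINT = "https://example.com/v1/chat/completions"  # hypothetical URL
API_KEY = "YOUR_API_KEY"  # hypothetical key


def build_request(prompt: str, model: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload (format assumed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def backoff_delays(retries: int, base: float = 1.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]


def call_service(prompt: str, model: str, retries: int = 3) -> dict:
    """POST the request, retrying with backoff if the QPS limit is hit."""
    payload = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 429:  # only retry when throttled
                raise
            time.sleep(delay)
    raise RuntimeError("QPS limit still exceeded after retries")
```

If the service reports throttling even after retries, the configured QPS of the service (Table 2) is the value to raise, not the client-side retry count.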
Viewing Service Information
- Log in to the ModelArts Studio (MaaS) console and select the target region from the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference.
- Click the target service to access its details page.
- Details: You can view the basic information about the service, including the service, model, and resource settings.
- Resources: You can view the compute usage, video memory usage, and resource monitoring information of the service.
Table 3 Resource monitoring parameters

- Compute Usage: Compute usage of the service. When the request rate is low, the usage is displayed as 0.
- Video Memory Usage: Video memory usage of the service.
- Events: You can view the event information of the service. Events are retained for one month and then automatically cleared.
- Logs: You can search for and view service logs.
Related Operations
- During AI development, you need to manage the service lifecycle, optimize deployed model services, and upgrade model services. For details, see Managing My Services in ModelArts Studio (MaaS).
- For details about how to call APIs, see Calling a Model Service in ModelArts Studio (MaaS).