Deploying a Model Service in ModelArts Studio (MaaS)
In MaaS, you can deploy built-in models from Model Square as services so that they can be called in service environments.
Operation Scenarios
Choose a model from Model Square or My Models for deployment. Once deployed, the model appears in the My Services list.
Billing
Model inference in MaaS uses compute and storage resources, which are billed. Compute resources are billed for running model services. Storage resources are billed for storing data in OBS. If you enable event notification, you will also be billed for using SMN. For details, see Model Inference Billing Items.
Constraints
MaaS predefines the maximum input and output lengths for inference. The defaults are listed by model below.
Model | Default Maximum Input and Output Lengths (Tokens) |
---|---|
DeepSeek-R1-8K, DeepSeek-V3-8K, DeepSeek-R1-Distill-Qwen-14B-8K, DeepSeek-R1-Distill-Qwen-32B-8K | 8,192 |
DeepSeek-R1-16K, DeepSeek-V3-16K, QwQ-32B-16K | 16,384 |
DeepSeek-R1-32K, DeepSeek-R1-Distill-Qwen-32B-32K, DeepSeek-V3-32K, Deepseek-Coder-33B, QwQ-32B-32K, Qwen2.5-VL-7B-32K, Qwen3-8B-32K, Qwen3-32B-32K | 32,768 |
DeepSeek-V3-64K, Qwen2.5-32B-64K, Qwen3-235B-A22B-64K, Kimi-K2 | 65,536 |
DeepSeek-V3.1 | 131,072 |
Other models | 4,096 |
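The table above can be read as a simple lookup with a fallback. The following is a minimal sketch for client-side use, not part of MaaS: the names `DEFAULT_LIMITS`, `OTHER_MODELS_LIMIT`, and `default_max_tokens` are hypothetical, and the values are copied from the table.

```python
# Default limits copied from the table above. The structure and the helper
# name are hypothetical and exist only for illustration.
_MODELS_BY_LIMIT = {
    8_192: [
        "DeepSeek-R1-8K", "DeepSeek-V3-8K",
        "DeepSeek-R1-Distill-Qwen-14B-8K", "DeepSeek-R1-Distill-Qwen-32B-8K",
    ],
    16_384: ["DeepSeek-R1-16K", "DeepSeek-V3-16K", "QwQ-32B-16K"],
    32_768: [
        "DeepSeek-R1-32K", "DeepSeek-R1-Distill-Qwen-32B-32K", "DeepSeek-V3-32K",
        "Deepseek-Coder-33B", "QwQ-32B-32K", "Qwen2.5-VL-7B-32K",
        "Qwen3-8B-32K", "Qwen3-32B-32K",
    ],
    65_536: ["DeepSeek-V3-64K", "Qwen2.5-32B-64K", "Qwen3-235B-A22B-64K", "Kimi-K2"],
    131_072: ["DeepSeek-V3.1"],
}

# Limit that applies to any model not listed above ("Other models" row).
OTHER_MODELS_LIMIT = 4_096

# Flatten to a model-name -> limit mapping for direct lookup.
DEFAULT_LIMITS = {
    model: limit
    for limit, models in _MODELS_BY_LIMIT.items()
    for model in models
}


def default_max_tokens(model: str) -> int:
    """Return the default token limit for a model, or the fallback if unlisted."""
    return DEFAULT_LIMITS.get(model, OTHER_MODELS_LIMIT)
```

The lookup is an exact, case-sensitive match on the model name as written in the table, so any name that is spelled differently falls through to the 4,096-token default.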
Prerequisites
- You have prepared a dedicated resource pool. For details, see Preparing ModelArts Studio (MaaS) Resources.
- You have created a model on the My Models page or accessed a model in Model Square.
Procedure
- Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference.
- On the Real-Time Inference page, click the My Services tab. In the upper right corner, click Deploy Model.
Table 2 Parameters for deploying a model service
Parameter
Description
Service Settings
Name
Enter a service name.
The name can contain 1 to 64 characters, including only letters, digits, hyphens (-), and underscores (_). It must start with a letter.
Description
Enter a description of up to 256 characters.
Model Settings
Model
Click Select Model and select a model from Model Square or My Models.
Resource Settings
Resource Pool Type
Only dedicated resource pools are supported. Dedicated resource pools are created separately and used exclusively.
Instance Specifications
Select required instance specifications, which include the server type and model. Only resource specifications supported by the model are displayed.
Instances
Set the number of servers.
Resource Settings
Queries Per Second (QPS)
Set the Queries Per Second (QPS) of the model service.
Unit: queries/s
NOTE: If error code ModelArts.4206 is displayed during deployment, the allowed QPS has been exceeded. In this case, restart the service after traffic limiting ends.
More Settings
Event Notification
Choose whether to enable event notification.
- This function is disabled by default, which means SMN is disabled.
- After this function is enabled, you will be notified of specific events, such as task status changes or suspected suspensions. In this case, you must configure the topic name and events.
- Topic: Topic of event notifications. Click Create Topic to create a topic on the SMN console.
- Event: Events you want to subscribe to, for example, Running, Terminated, or Failed.
NOTE:
- After you create a topic on the SMN console, add a subscription to the topic and confirm the subscription. You will then be notified of events. For details about how to subscribe to a topic, see Adding a Subscription.
- SMN charges you based on the number of notification messages. For details, see Billing.
Auto Stop
When using paid resources, choose whether to enable auto stop.
- If this function is enabled, configure the auto stop time. The value can be 1 hour, 2 hours, 4 hours, 6 hours, or Customize. The service stops automatically when the time limit is reached. The time limit does not count down while the service is paused.
- This function is disabled by default, in which case the service keeps running.
- Click Submit.
In the My Services list, when the service status changes to Running, the model is deployed.
Using a dedicated resource pool for service deployments incurs no additional costs since its fees were already covered during purchase.
- After the model is deployed, call its API. For details, see Calling a Model Service in ModelArts Studio (MaaS).
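The naming and description rules in Table 2 can be checked before you fill in the deployment form. The following is a minimal sketch, not a MaaS API: the function names are hypothetical, and it assumes "letters" means ASCII letters (A to Z, a to z), which the source does not state explicitly.

```python
import re

# Starts with a letter, followed by up to 63 letters, digits, hyphens, or
# underscores, for a total length of 1 to 64 characters.
# Assumption: "letters" means ASCII letters only.
_SERVICE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")

# Maximum description length stated in Table 2.
MAX_DESCRIPTION_LENGTH = 256


def is_valid_service_name(name: str) -> bool:
    """Check a service name against the rules listed for the Name parameter."""
    return _SERVICE_NAME_RE.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Check that a description does not exceed 256 characters."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, `is_valid_service_name("qwen3-demo_01")` passes, while a name that starts with a digit, contains a space, or exceeds 64 characters is rejected.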
Viewing Service Information
- Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference. On the displayed page, click the My Services tab.
- Click the target service to access its details page.
- Details: You can view the basic information about the service, including the service, model, and resource settings.
- Resources: You can view information about service resource monitoring metrics.
Table 3 Resource monitoring parameters
Parameter | Description |
---|---|
Time range | You can collect statistics on the service resource usage in the last 1 hour, last 3 hours, last 12 hours, last 24 hours, last 7 days, or a custom period. Custom ranges allow viewing up to 30 days of data. |
CPU Usage (%) | The CPU usage of the service. |
Memory Usage (%) | The memory usage of the service. |
NPU Compute Usage (%) | The NPU compute usage of the service. |
NPU Memory Usage (%) | The NPU memory usage of the service. |
Disk Read Rate (Bits/Min) | The disk read rate of the service. |
Disk Write Rate (Bits/Min) | The disk write rate of the service. |
Uplink Rate (Bits/Min) | The outbound traffic rate of the service. |
Downlink Rate (Bits/Min) | The inbound traffic rate of the service. |
- Events: You can view the event information of the service. Events are retained for one month and then cleared automatically.
- Logs: You can search for and view service logs.
- In the upper part of the service details page, perform the following operations as required:
- Viewing service call data: Click Calls to go to the Service Call Details page and view the monitoring data and failed call details. For details, see Viewing the Call Data and Monitoring Metrics of Real-Time Inference on ModelArts Studio (MaaS).
- Stopping or starting the service: For details, see Stopping or Starting a Service.
- Deleting the service: For details, see Deleting a Service.
- Calling the service: Click View Call Description and call the service as prompted. For details, see Calling a Model Service in ModelArts Studio (MaaS).
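The disk and network rates in Table 3 are reported in bits per minute, which is an uncommon unit. If you export these values and want to compare them with figures in bytes per second or megabits per second, the conversion is plain arithmetic. The helper names below are hypothetical, and the sketch assumes decimal megabits (1 Mbit = 1,000,000 bits).

```python
def bits_per_min_to_bytes_per_sec(bits_per_min: float) -> float:
    """Convert bits/min to bytes/s: 8 bits per byte, 60 seconds per minute."""
    return bits_per_min / 8 / 60


def bits_per_min_to_mbps(bits_per_min: float) -> float:
    """Convert bits/min to megabits/s, assuming 1 Mbit = 1,000,000 bits."""
    return bits_per_min / 60 / 1_000_000
```

For instance, a disk read rate of 480 Bits/Min corresponds to 1 byte per second, and an uplink rate of 60,000,000 Bits/Min corresponds to 1 Mbit/s.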
Related Operations
- During AI development, you need to manage the service lifecycle, optimize deployed model services, and upgrade model services. For details, see Managing My Services in ModelArts Studio (MaaS).
- For details about how to call APIs, see Calling a Model Service in ModelArts Studio (MaaS).