Deploying a Model Service in ModelArts Studio (MaaS)
In MaaS, you can deploy built-in models from Model Square as services so that they can be called in service environments.
Description
Choose a model from Model Square for deployment. Once deployed, the model appears in the My Services list.
Billing
Model inference in MaaS uses compute and storage resources, which are billed. Compute resources are billed for running the model service. Storage resources are billed for storing data in OBS. You will also be billed for using SMN. For details, see Model Inference Billing Items.
Constraints
ModelArts Studio has predefined the maximum input and output lengths for inference.
Table 1 Default maximum input and output lengths

| Model | Default Maximum Input and Output Lengths |
|---|---|
| DeepSeek-R1-Distill-Llama-70B-8K, DeepSeek-R1-Distill-Qwen-14B-8K, DeepSeek-R1-Distill-Qwen-32B-8K | 8192 |
| DeepSeek-R1-Distill-Qwen-32B-32K | 32768 |
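Because the limit in the table covers input and output combined, a client typically caps the requested completion length at the model limit minus the prompt length. A minimal sketch of that budgeting, assuming a rough 4-characters-per-token estimate (the real tokenizer for these models may count differently):

```python
# Rough token-budget check before calling a service with an 8K limit.
# Assumption: ~4 characters per token; the actual tokenizer may differ.
MAX_TOTAL_TOKENS = 8192  # limit for the 8K DeepSeek distill models (see table)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)


def output_budget(prompt: str, max_total: int = MAX_TOTAL_TOKENS) -> int:
    """Tokens left for the completion after accounting for the prompt."""
    remaining = max_total - estimate_tokens(prompt)
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the model's total token limit")
    return remaining
```

A request would then pass `output_budget(prompt)` (or something smaller) as the maximum completion length so that prompt plus output stays within the model's limit.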
Prerequisites
You have prepared a dedicated resource pool. For details, see Preparing ModelArts Studio (MaaS) Resources.
Procedure
- Log in to the ModelArts Studio (MaaS) console and select the target region from the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference.
- On the My Services tab of the Real-Time Inference page, click Deploy Model in the upper right corner, and configure the parameters listed in Table 2.
Table 2 Parameters for deploying a model service

- Service Settings
  - Name: Enter a service name. The name can contain 1 to 64 characters, including only letters, digits, hyphens (-), and underscores (_), and must start with a letter.
  - Description: Enter a description of up to 256 characters.
- Model Settings
  - Model: Click Select Model and select a model from Model Square.
- Resource Settings
  - Resource Pool Type: Only dedicated resource pools are supported. Dedicated resource pools are created separately and used exclusively.
  - Instance Specifications: Select the required instance specifications, including the server type and model. Only resource specifications supported by the selected model are displayed.
  - Instances: Set the number of server instances.
  - Queries Per Second (QPS): Set the QPS of the model service, in queries/s. NOTE: If error code ModelArts.4206 is displayed during deployment, the allowed QPS has been exceeded. In this case, restart the service after traffic limiting ends.
- More Settings
  - Event Notification: Choose whether to enable event notification.
    - This function is disabled by default, which means SMN is not used.
    - After this function is enabled, you are notified of specific events, such as task status changes or suspected suspensions. You must then configure the topic name and events.
    - Topic: the topic for event notifications. Click Create Topic to create a topic on the SMN console.
    - Event: the events you want to subscribe to, for example, Running, Terminated, or Failed.
    - NOTE: After you create a topic on the SMN console, add a subscription to the topic and confirm the subscription. Only then will you be notified of events. For details about how to subscribe to a topic, see Adding a Subscription. SMN charges you based on the number of notification messages. For details, see Billing.
  - Auto Stop: When using paid resources, choose whether to enable auto stop.
    - If this function is enabled, configure the auto stop time. The value can be 1 hour, 2 hours, 4 hours, 6 hours, or Customize. The service stops automatically when the time limit is reached. The time limit does not count down while the service is paused.
    - This function is disabled by default; the service keeps running until you stop it.
- Click Submit.
In the My Services list, when the service status changes to Running, the model has been deployed.
Deploying a service in a dedicated resource pool incurs no additional cost: the pool is billed when it is purchased, not per deployment.
- After the model is deployed, call its API. For details, see Calling a Model Service in ModelArts Studio (MaaS).
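A deployed service is typically called over HTTPS with a bearer token. The sketch below assumes an OpenAI-style chat-completions payload; the endpoint URL, API key, and exact request schema are placeholders, so check Calling a Model Service in ModelArts Studio (MaaS) for the real format. The retry loop reflects the QPS note: when the service throttles requests (HTTP 429), the client backs off exponentially instead of hammering the limit.

```python
import json
import time
import urllib.error
import urllib.request

# Placeholders: the real endpoint URL and API key come from the service
# details page; see "Calling a Model Service in ModelArts Studio (MaaS)".
ENDPOINT = "https://example.com/v1/chat/completions"  # hypothetical URL
API_KEY = "YOUR_API_KEY"  # hypothetical key


def build_request(prompt: str, model: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload (format assumed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def backoff_delays(retries: int, base: float = 1.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]


def call_service(prompt: str, model: str, retries: int = 3) -> dict:
    """POST the request, retrying with backoff if the QPS limit is hit."""
    payload = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 429:  # only retry when throttled
                raise
            time.sleep(delay)
    raise RuntimeError("QPS limit still exceeded after retries")
```

If the service reports throttling even after retries, the configured QPS of the service (Table 2) is the value to raise, not the client-side retry count.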
Viewing Service Information
- Log in to the ModelArts Studio (MaaS) console and select the target region from the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference.
- Click the target service to access its details page.
- Details: You can view the basic information about the service, including the service, model, and resource settings.
- Resources: You can view the compute usage, video memory usage, and resource monitoring information of the service.
Table 3 Resource monitoring parameters

- Compute Usage: Compute usage of the service. When the request rate is low, the usage is displayed as 0.
- Video Memory Usage: Video memory usage of the service.
- Events: You can view the event information of the service. Events are retained for one month and then automatically cleared.
- Logs: You can search for and view service logs.
Related Operations
- During AI development, you need to manage the service lifecycle, optimize deployed model services, and upgrade model services. For details, see Managing My Services in ModelArts Studio (MaaS).
- For details about how to call APIs, see Calling a Model Service in ModelArts Studio (MaaS).