Scaling Model Service Instances in ModelArts Studio (MaaS)
When you run inference with LLMs, service demand can vary greatly. Flexible scaling lets a service absorb load changes while maintaining high availability and using resources efficiently.
ModelArts Studio enables manual scaling of model service instances without disrupting service operation.
Prerequisites
A model has been deployed in ModelArts Studio (MaaS).
Notes and Constraints
The number of instances can only be changed when the model service is in the Running or Alarm state.
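As a minimal sketch of the state constraint above, the following Python check allows a scale operation only when the service status is Running or Alarm. The function name `can_scale` and the other status strings in the example (`Deploying`, `Stopped`) are hypothetical illustrations, not part of a documented MaaS API.

```python
# Hypothetical helper: "Running" and "Alarm" are the states this section names;
# the function itself is not part of a documented MaaS API.
SCALABLE_STATES = frozenset({"Running", "Alarm"})


def can_scale(service_status: str) -> bool:
    """Return True if the instance count may be changed in this state."""
    return service_status in SCALABLE_STATES


for status in ("Running", "Alarm", "Deploying", "Stopped"):
    print(f"{status}: {can_scale(status)}")
# Running: True
# Alarm: True
# Deploying: False
# Stopped: False
```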
Billing
- Adding model service instances increases the resources consumed by model inference in MaaS, which increases compute and storage charges.
- Compute resources are billed for running the model service. Storage resources are billed for storing data in OBS. For details, see Table 1.
Table 1 Billing items

| Billing Item | Description | Billing Mode | Billing Formula |
|---|---|---|---|
| Compute resources: public resource pools | Usage of compute resources. For details, see ModelArts Pricing Details. | Pay-per-use | Flavor unit price x Number of instances x Usage duration. Package duration is preferentially deducted. |
| Compute resources: dedicated resource pools | Fees for dedicated resource pools are paid upfront upon purchase. There are no additional charges for service deployment. For details, see Billing Item. | N/A | N/A |
| Event notification (billed only when enabled) | This function uses Simple Message Notification (SMN) to send a message to you when the event you selected occurs. To use this function, enable event notification when creating a training job. For pricing details, see SMN Pricing Details. | Pay by actual usage | SMS: SMS notifications; Email: Email notifications + Downstream Internet traffic; HTTP or HTTPS: HTTP or HTTPS notifications + Downstream Internet traffic |
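The pay-per-use formula for public resource pools can be worked through with a short Python sketch. The unit price, the use of hours as the duration unit, and the modeling of a package as a pool of instance-hours deducted before pay-per-use charges apply are all assumptions for illustration; actual prices and deduction rules are in ModelArts Pricing Details.

```python
def estimate_compute_cost(unit_price: float, instances: int, hours: float,
                          package_instance_hours: float = 0.0) -> float:
    """Estimate pay-per-use compute cost for a public resource pool.

    Formula: flavor unit price x number of instances x usage duration.
    Assumption: hours is the duration unit, and a purchased package is modeled
    as a pool of instance-hours that is deducted before pay-per-use applies.
    """
    if unit_price < 0 or instances < 0 or hours < 0 or package_instance_hours < 0:
        raise ValueError("inputs must be non-negative")
    instance_hours = instances * hours
    billable_hours = max(0.0, instance_hours - package_instance_hours)
    return unit_price * billable_hours


# Hypothetical price of 2.0 per instance-hour, 10 hours of usage.
print(estimate_compute_cost(2.0, 1, 10))  # 20.0
print(estimate_compute_cost(2.0, 3, 10))  # 60.0 after scaling out to 3 instances
print(estimate_compute_cost(2.0, 3, 10, package_instance_hours=15))  # 30.0
```

Because the charge is linear in the number of instances, scaling from 1 to 3 instances over the same duration triples the pay-per-use compute cost in this model.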
Scaling Instances
- Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
- In the navigation pane on the left, choose Real-Time Inference.
- On the Real-Time Inference page, click the My Services tab. Choose More > Scale in the Operation column of the target service.
- Perform the following operations as required.
  - Scale-out: Increase the number of instances as required and click OK. In the Scale Service dialog box, click OK.
  - Scale-in: Reduce the number of instances as required and click OK. In the Confirm Scaling dialog box, confirm the information, enter YES, and click OK.
Figure 1 Confirm Scaling
On the My Services tab, click the service name to go to its details page and check whether the change has taken effect.
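The scale-out and scale-in paths in the procedure above differ only in which dialog appears after you click OK and whether you must type YES. The following Python sketch models that decision so the two flows are easy to compare; `plan_scaling` and `ScalePlan` are hypothetical names and do not correspond to a MaaS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalePlan:
    action: str            # "scale-out", "scale-in", or "no-op"
    dialog: Optional[str]  # console dialog that follows clicking OK, if any
    requires_yes: bool     # scale-in asks you to type YES before confirming


def plan_scaling(current: int, target: int) -> ScalePlan:
    """Map a change in instance count to the console confirmation flow."""
    if current < 0 or target < 0:
        raise ValueError("instance counts must be non-negative")
    if target > current:
        return ScalePlan("scale-out", "Scale Service", requires_yes=False)
    if target < current:
        return ScalePlan("scale-in", "Confirm Scaling", requires_yes=True)
    return ScalePlan("no-op", None, requires_yes=False)


print(plan_scaling(2, 4))
# ScalePlan(action='scale-out', dialog='Scale Service', requires_yes=False)
print(plan_scaling(4, 2))
# ScalePlan(action='scale-in', dialog='Confirm Scaling', requires_yes=True)
```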