Updated on 2025-09-29 GMT+08:00

Scaling Model Service Instances in ModelArts Studio (MaaS)

When you run LLM inference, service load can vary greatly. Flexible scaling lets you handle these load changes while ensuring high availability and efficient resource use.

MaaS lets you manually scale model service instances without interrupting the running service.

Prerequisites

A model has been deployed in MaaS.

Constraints

The number of instances can only be changed when the model service is in the Running or Alarm state.
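The state constraint above amounts to a simple precondition on any scaling request. The following minimal sketch in Python illustrates it under stated assumptions: `ServiceState`, `SCALABLE_STATES`, and `plan_scaling` are hypothetical names, not part of a MaaS API or SDK, and the Deploying and Stopped states as well as the minimum of one instance are illustrative assumptions. Only Running and Alarm come from this document.

```python
from enum import Enum


class ServiceState(Enum):
    # Hypothetical model of service states. Running and Alarm are the states
    # named in the Constraints section; Deploying and Stopped are assumptions.
    RUNNING = "Running"
    ALARM = "Alarm"
    DEPLOYING = "Deploying"
    STOPPED = "Stopped"


# States in which the instance count may be changed (per the constraint above).
SCALABLE_STATES = {ServiceState.RUNNING, ServiceState.ALARM}


def plan_scaling(state: ServiceState, current: int, target: int) -> str:
    """Classify a requested instance count as "scale-out", "scale-in", or "no-change".

    Raises ValueError if the service is not in a scalable state, or if the
    target count is below 1 (an assumed minimum, not a documented limit).
    """
    if state not in SCALABLE_STATES:
        raise ValueError(f"cannot scale a service in state {state.value!r}")
    if target < 1:
        raise ValueError("target instance count must be at least 1")
    if target > current:
        return "scale-out"
    if target < current:
        return "scale-in"
    return "no-change"


print(plan_scaling(ServiceState.RUNNING, 2, 4))  # scale-out
print(plan_scaling(ServiceState.ALARM, 3, 1))    # scale-in
```

Checking the state first mirrors the console behavior described here: a request is rejected before the direction of the change matters.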

Billing

Adding model service instances increases the resources you use. Compute, storage, and token costs accrue as you use the built-in MaaS services. For details, see Model Inference Billing Items.

Scaling Instances

  1. Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
  2. In the navigation pane on the left, choose Real-Time Inference.
  3. On the Real-Time Inference page, click the My Services tab. In the Operation column of the target service, choose More > Scale.
  4. Perform the following operations as required.
    • Scale-out: Increase the number of instances as required and click OK. In the Scale Service dialog box, click OK.
    • Scale-in: Reduce the number of instances as required and click OK. In the Confirm Scaling dialog box, confirm the information, enter YES, and click OK.
      Figure 1 Confirm Scaling

    On the My Services tab, click the service name to open its details page and check whether the change has taken effect.

Follow-Up Operations