Updated on 2025-08-04 GMT+08:00

Scaling Model Service Instances in ModelArts Studio (MaaS)

When running inference with LLMs, service demand can vary greatly, so flexible scaling is needed to handle load changes while ensuring high availability and efficient resource use.

ModelArts Studio enables manual scaling of model service instances without disrupting service operation.

Prerequisites

A model has been deployed in ModelArts Studio (MaaS).

Notes and Constraints

The number of instances can only be changed when the model service is in the Running or Alarm state.
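This constraint can also be enforced client-side before a scaling request is issued. The sketch below is purely illustrative and not part of any MaaS SDK; it assumes the service state is reported as a string matching the console labels.

```python
# Hypothetical client-side guard: per the MaaS constraint above, the
# instance count can only be changed while the service is in the
# Running or Alarm state.
SCALABLE_STATES = {"Running", "Alarm"}

def can_scale(service_state: str) -> bool:
    """Return True if the service state permits changing the instance count."""
    return service_state in SCALABLE_STATES
```

A caller would check `can_scale(state)` first and surface an error instead of submitting a scaling request for a service that is, for example, still deploying.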

Billing

  • After you add model service instances, you are billed for the tokens consumed by the built-in MaaS service. For details, see Billing.
  • Adding model service instances also increases the resources used for model inference in MaaS, which incurs compute and storage charges: compute resources are billed for running the model service, and storage resources are billed for data stored in OBS. For details, see Model Inference Billing Items.

Scaling Instances

  1. Log in to the ModelArts Studio console and select the target region on the top navigation bar.
  2. In the navigation pane on the left, choose Real-Time Inference.
  3. On the Real-Time Inference page, click the My Services tab. Choose More > Scale in the Operation column of the target service.
  4. Increase or decrease the number of model service instances based on service requirements. Then, click OK.
  5. In the displayed dialog box, click OK.

    On the My Services tab, click the service name to open its details page and check whether the change has taken effect.