Updated on 2025-09-29 GMT+08:00

Deploying a Model Service in ModelArts Studio (MaaS)

In MaaS, you can deploy built-in models from Model Square as services so that they can be called in service environments.

Operation Scenarios

Choose a model from Model Square or My Models for deployment. Once deployed, the model appears in the My Services list.

Billing

Model inference in MaaS uses compute and storage resources, both of which are billed. Compute resources are billed for running model services, and storage resources are billed for storing data in OBS. If you enable event notification, you are also billed for using SMN. For details, see Model Inference Billing Items.

Constraints

MaaS predefines the maximum input and output lengths for inference. The defaults depend on the model, as listed in Table 1.

Table 1 Default maximum input and output lengths

Model | Default Maximum Input and Output Lengths (Tokens)
DeepSeek-R1-8K, DeepSeek-V3-8K, DeepSeek-R1-Distill-Qwen-14B-8K, DeepSeek-R1-Distill-Qwen-32B-8K | 8,192
DeepSeek-R1-16K, DeepSeek-V3-16K, QwQ-32B-16K | 16,384
DeepSeek-R1-32K, DeepSeek-R1-Distill-Qwen-32B-32K, DeepSeek-V3-32K, Deepseek-Coder-33B, QwQ-32B-32K, Qwen2.5-VL-7B-32K, Qwen3-8B-32K, Qwen3-32B-32K | 32,768
DeepSeek-V3-64K, Qwen2.5-32B-64K, Qwen3-235B-A22B-64K, Kimi-K2 | 65,536
DeepSeek-V3.1 | 131,072
Other models | 4,096
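
If you want to check request sizes on the client side before calling a service, the limits in Table 1 can be kept in a small lookup table. The following Python sketch is illustrative only: the names DEFAULT_MAX_TOKENS, OTHER_MODELS_MAX_TOKENS, and default_max_tokens are hypothetical and are not part of any MaaS SDK, and the values are simply transcribed from Table 1. It assumes the model name string matches the table entry exactly.

```python
# Default maximum input and output lengths (tokens), transcribed from Table 1.
# All names in this sketch are hypothetical helpers, not MaaS APIs.
DEFAULT_MAX_TOKENS = {
    "DeepSeek-R1-8K": 8192,
    "DeepSeek-V3-8K": 8192,
    "DeepSeek-R1-Distill-Qwen-14B-8K": 8192,
    "DeepSeek-R1-Distill-Qwen-32B-8K": 8192,
    "DeepSeek-R1-16K": 16384,
    "DeepSeek-V3-16K": 16384,
    "QwQ-32B-16K": 16384,
    "DeepSeek-R1-32K": 32768,
    "DeepSeek-R1-Distill-Qwen-32B-32K": 32768,
    "DeepSeek-V3-32K": 32768,
    "Deepseek-Coder-33B": 32768,
    "QwQ-32B-32K": 32768,
    "Qwen2.5-VL-7B-32K": 32768,
    "Qwen3-8B-32K": 32768,
    "Qwen3-32B-32K": 32768,
    "DeepSeek-V3-64K": 65536,
    "Qwen2.5-32B-64K": 65536,
    "Qwen3-235B-A22B-64K": 65536,
    "Kimi-K2": 65536,
    "DeepSeek-V3.1": 131072,
}

# Table 1 lists 4,096 tokens for every model not named explicitly.
OTHER_MODELS_MAX_TOKENS = 4096


def default_max_tokens(model_name: str) -> int:
    """Return the Table 1 default for model_name, or the 'Other models' value."""
    return DEFAULT_MAX_TOKENS.get(model_name, OTHER_MODELS_MAX_TOKENS)


if __name__ == "__main__":
    for name in ("DeepSeek-V3.1", "QwQ-32B-16K", "My-Custom-Model"):
        print(name, default_max_tokens(name))
```

Table 1 does not say whether the limit applies to input and output separately or combined, so the sketch only returns the listed number and leaves that interpretation to the caller.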

Prerequisites

Procedure

  1. Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
  2. In the navigation pane on the left, choose Real-Time Inference.
  3. On the Real-Time Inference page, click the My Services tab. In the upper right corner, click Deploy Model and configure the parameters described in Table 2.
    Table 2 Parameters for deploying a model service

    Parameter

    Description

    Service Settings

    Name

    Enter a service name.

    The name can contain 1 to 64 characters, including only letters, digits, hyphens (-), and underscores (_). It must start with a letter.

    Description

    Enter a description of up to 256 characters.

    Model Settings

    Model

    Click Select Model and select a model from Model Square or My Models.

    Resource Settings

    Resource Pool Type

    Only dedicated resource pools are supported. Dedicated resource pools are created separately and used exclusively.

    Instance Specifications

    Select required instance specifications, which include the server type and model. Only resource specifications supported by the model are displayed.

    Instances

    Set the number of servers.

    Resource Settings

    Queries Per Second (QPS)

    Set the maximum number of queries per second (QPS) allowed for the model service.

    Unit: queries/s

    NOTE:

    If error code ModelArts.4206 is displayed during deployment, the allowed QPS has been exceeded. Wait until traffic limiting ends, and then restart the service.

    More Settings

    Event Notification

    Choose whether to enable event notification.

    • This function is disabled by default, which means that no SMN notifications are sent.
    • After this function is enabled, you will be notified of specific events, such as task status changes or suspected suspensions. In this case, you must configure the topic name and events.
      • Topic: Topic of event notifications. Click Create Topic to create a topic on the SMN console.
      • Event: Events you want to subscribe to, for example, Running, Terminated, or Failed.
    NOTE:
    • After you create a topic on the SMN console, add a subscription to the topic, and confirm the subscription. Then, you will be notified of events. For details about how to subscribe to a topic, see Adding a Subscription.
    • SMN charges you for the number of notification messages. For details, see Billing.

    Auto Stop

    When using paid resources, choose whether to enable auto stop.

    • If this function is enabled, set the auto stop time to 1 hour, 2 hours, 4 hours, 6 hours, or Customize. The service stops automatically when the time limit is reached. The time limit does not count down while the service is paused.
    • This function is disabled by default. If it is disabled, the service keeps running.
  4. Click Submit.

    The model is deployed when the service status in the My Services list changes to Running.

    Using a dedicated resource pool for service deployments incurs no additional costs since its fees were already covered during purchase.

  5. After the model is deployed, call its API. For details, see Calling a Model Service in ModelArts Studio (MaaS).
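
The naming rules in Table 2 can be checked before you open the console, for example when service names are generated by a script. The following Python sketch is a minimal illustration under stated assumptions: the function check_service_settings and the constants are hypothetical and not part of any MaaS API, and "letters" is assumed to mean ASCII letters (A to Z, a to z), which the source does not state explicitly.

```python
import re

# Table 2: 1 to 64 characters; only letters, digits, hyphens (-), and
# underscores (_); must start with a letter.
# Assumption: "letters" means ASCII letters only.
SERVICE_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")

# Table 2: the description can contain up to 256 characters.
MAX_DESCRIPTION_LENGTH = 256


def check_service_settings(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means both values look valid."""
    problems = []
    if not SERVICE_NAME_PATTERN.fullmatch(name):
        problems.append(
            "Name must be 1 to 64 characters, start with a letter, and "
            "contain only letters, digits, hyphens, and underscores."
        )
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("Description must not exceed 256 characters.")
    return problems


if __name__ == "__main__":
    print(check_service_settings("qwen3-chat_prod"))   # valid name
    print(check_service_settings("1st-service"))       # starts with a digit
    print(check_service_settings("svc", "x" * 300))    # description too long
```

This only mirrors the documented rules. The console remains the authority on what it accepts.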

Viewing Service Information

  1. Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
  2. In the navigation pane on the left, choose Real-Time Inference. On the displayed page, click the My Services tab.
  3. Click the target service to access its details page.
    • Details: You can view the basic information about the service, including the service, model, and resource settings.
    • Resources: You can view information about service resource monitoring metrics.
      Table 3 Resource monitoring parameters

      Parameter | Description
      Time range | You can collect statistics on the service resource usage in the last 1 hour, last 3 hours, last 12 hours, last 24 hours, last 7 days, or a custom period. Custom ranges allow viewing up to 30 days of data.
      CPU Usage (%) | The CPU usage of the service.
      Memory Usage (%) | The memory usage of the service.
      NPU Compute Usage (%) | The NPU compute usage of the service.
      NPU Memory Usage (%) | The NPU memory usage of the service.
      Disk Read Rate (Bits/Min) | The disk read rate of the service.
      Disk Write Rate (Bits/Min) | The disk write rate of the service.
      Uplink Rate (Bits/Min) | The outbound traffic rate of the service.
      Downlink Rate (Bits/Min) | The inbound traffic rate of the service.

    • Events: You can view the event information of the service. Events are retained for one month and then automatically cleared.
    • Logs: You can search for and view service logs.
  4. In the upper part of the service details page, perform the following operations as required:

Related Operations