Updated on 2025-08-04 GMT+08:00

Deploying a Model Service in ModelArts Studio (MaaS)

In MaaS, you can deploy the built-in models in Model Square as model services and call them in your service environment.

Description

Choose a model from Model Square for deployment. Once deployed, the model appears in the My Services list.

Billing

Model inference in MaaS uses compute and storage resources, which are billed. Compute resources are billed for running the model service. Storage resources are billed for storing data in OBS. You will also be billed for using SMN. For details, see Model Inference Billing Items.

Constraints

ModelArts Studio has predefined the maximum input and output lengths for inference.

Table 1 Default maximum input and output lengths

Model                                    Default Maximum Input and Output Length
DeepSeek-R1-Distill-Llama-70B-8K         8192
DeepSeek-R1-Distill-Qwen-14B-8K          8192
DeepSeek-R1-Distill-Qwen-32B-8K          8192
DeepSeek-R1-Distill-Qwen-32B-32K         32768
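The limits in Table 1 cap the combined input and output length of a request. A rough pre-flight check can be sketched as follows; the limits are copied from Table 1, while the token counts are assumed to come from your own tokenizer:

```python
# Sketch: check that a request fits within a model's combined input + output
# token budget. The limits below are from Table 1; how you count prompt
# tokens (e.g., with the model's tokenizer) is up to your client code.

MAX_TOKENS = {
    "DeepSeek-R1-Distill-Llama-70B-8K": 8192,
    "DeepSeek-R1-Distill-Qwen-14B-8K": 8192,
    "DeepSeek-R1-Distill-Qwen-32B-8K": 8192,
    "DeepSeek-R1-Distill-Qwen-32B-32K": 32768,
}

def fits_budget(model: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the requested output stays within the cap."""
    return prompt_tokens + max_new_tokens <= MAX_TOKENS[model]

print(fits_budget("DeepSeek-R1-Distill-Qwen-32B-8K", 6000, 2048))  # 8048 <= 8192
print(fits_budget("DeepSeek-R1-Distill-Qwen-32B-8K", 7000, 2048))  # 9048 > 8192
```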

Prerequisites

You have prepared a dedicated resource pool. For details, see Preparing ModelArts Studio (MaaS) Resources.

Procedure

  1. Log in to the ModelArts Studio (MaaS) console and select the target region from the top navigation bar.
  2. In the navigation pane on the left, choose Real-Time Inference.
  3. On the My Services tab of the Real-Time Inference page, click Deploy Model in the upper right corner.
    Table 2 Parameters for deploying a model service

    Service Settings

    • Name: Enter a service name. The name can contain 1 to 64 characters, including only letters, digits, hyphens (-), and underscores (_), and must start with a letter.
    • Description: Enter a description of up to 256 characters.

    Model Settings

    • Model: Click Select Model and select a model from Model Square.

    Resource Settings

    • Resource Pool Type: Only dedicated resource pools are supported. Dedicated resource pools are created separately and used exclusively.
    • Instance Specifications: Select the required instance specifications, including the server type and model. Only the resource specifications supported by the selected model are displayed.
    • Instances: Set the number of service instances.
    • Queries Per Second (QPS): Set the QPS of the model service. Unit: queries/s.
      NOTE: If error code ModelArts.4206 is displayed during deployment, the allowed QPS has been exceeded. Restart the service after the traffic limiting period ends.

    More Settings

    • Event Notification: Choose whether to enable event notification.
      • This function is disabled by default, which means SMN is not used.
      • After it is enabled, you are notified of the selected events, such as task status changes or suspected stalls. You must also configure the topic name and the events to subscribe to.
        • Topic: topic for event notifications. Click Create Topic to create a topic on the SMN console.
        • Event: events to subscribe to, for example, Running, Terminated, or Failed.
      NOTE:
      • After you create a topic on the SMN console, add a subscription to it and confirm the subscription. Only then will you be notified of events. For details about how to subscribe to a topic, see Adding a Subscription.
      • SMN bills you based on the number of notification messages. For details, see Billing.
    • Auto Stop: When using paid resources, choose whether to enable auto stop.
      • If this function is enabled, configure the auto stop time. The value can be 1 hour, 2 hours, 4 hours, 6 hours, or Customize. The service stops automatically when the time limit is reached; the countdown pauses while the service is paused.
      • This function is disabled by default, in which case the service keeps running until you stop it.
  4. Click Submit.

    In the My Services list, when the service status changes to Running, the model has been deployed.

    Deploying services in a dedicated resource pool incurs no additional cost, because the pool is paid for when it is purchased.

  5. After the model is deployed, call its API. For details, see Calling a Model Service in ModelArts Studio (MaaS).
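Before submitting, the Name rule from Table 2 (1 to 64 characters; letters, digits, hyphens, and underscores; must start with a letter) can be checked locally. The regular expression below is my rendering of that rule, not the console's own validator:

```python
# Sketch: local check of the service Name rules listed in Table 2.
# This is a convenience check only; the console performs its own validation.
import re

# One leading letter, then up to 63 letters/digits/hyphens/underscores (64 total).
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_-]{0,63}$")

def is_valid_service_name(name: str) -> bool:
    return bool(NAME_RE.fullmatch(name))

print(is_valid_service_name("deepseek-r1-demo_01"))  # True: starts with a letter
print(is_valid_service_name("1st-service"))          # False: starts with a digit
```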
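As a rough illustration of calling a deployed service, a minimal client is sketched below. The endpoint URL, API key, model name, and OpenAI-style request body are assumptions for illustration only; take the real endpoint, authentication method, and request format from the service details page and from Calling a Model Service in ModelArts Studio (MaaS).

```python
# Sketch: calling a deployed MaaS model service over HTTPS.
# ASSUMPTIONS: the placeholder URL/key below and the OpenAI-style payload
# shape are illustrative; use the values and format from your service docs.
import json
import urllib.request

API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "your-api-key"                             # placeholder credential

def build_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble an OpenAI-style chat request body (assumed format)."""
    return {
        "model": "DeepSeek-R1-Distill-Qwen-32B-8K",  # a model from Table 1
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # keep within the limits in Table 1
    }

def ask(prompt: str, max_tokens: int = 1024) -> str:
    """Send one request to the deployed service and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, max_tokens)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Keeping the request assembly in `build_payload` separates the (assumed) wire format from transport, so it is easy to adjust once you have the real API reference.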

Viewing Service Information

  1. Log in to the ModelArts Studio (MaaS) console and select the target region from the top navigation bar.
  2. In the navigation pane on the left, choose Real-Time Inference.
  3. Click the target service to access its details page.
    • Details: You can view the basic information about the service, including the service, model, and resource settings.
    • Resources: You can view the compute usage, video RAM usage, and resource monitoring information of the service.
      Table 3 Resource monitoring parameters

      • Compute Usage: Compute usage of the service. When the request rate is low, the usage is displayed as 0.
      • Video Memory Usage: Video RAM usage of the service.

    • Events: You can view the event information of the service. Events are retained for one month and then automatically cleared.
    • Logs: You can search for and view service logs.

Related Operations