
Deploying a Model as a Real-Time Service

After a model is prepared, you can deploy it as a real-time service and then call the service to make predictions.

A maximum of 20 real-time services can be deployed by a user.

Prerequisites

Data has been prepared. Specifically, you have created a model in the Normal state in ModelArts.
Ensure that the account is not in arrears. Resources are consumed when services are running.

Procedure

Log in to the ModelArts management console. In the left navigation pane, choose Service Deployment > Real-Time Services. By default, the system switches to the Real-Time Services page.
In the real-time service list, click Deploy in the upper left corner. The Deploy page is displayed.

Set parameters for a real-time service.

Enter basic information about model deployment. For details about the parameters, see Table 1.

**Table 1** Basic parameters of model deployment
Parameter	Description
Billing Mode	Currently, only pay-per-use billing is supported.
Name	Name of the real-time service. Set this parameter as prompted.
Auto Stop	When this parameter is enabled and an auto stop time is set, the service stops automatically at the specified time, which helps you avoid unnecessary charges. If this parameter is disabled, the real-time service keeps running and you continue to be billed. Auto stop is enabled by default, and the default value is 1 hour later. Currently, the options are 1 hour later, 2 hours later, 4 hours later, 6 hours later, and Custom. If you select Custom, you can enter any integer from 1 to 24 (hours) in the text box on the right.
Description	Brief description of the real-time service.

Figure 1 Basic information about deploying a model as a real-time service
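
The Auto Stop rule in Table 1 can be illustrated with a minimal Python sketch. It is not part of ModelArts: the function name `auto_stop_time` and the constant `PRESET_HOURS` are hypothetical, and the sketch only assumes the limits stated in the table (preset values of 1, 2, 4, and 6 hours, or a custom integer from 1 to 24 hours).

```python
from datetime import datetime, timedelta

# Preset options listed in Table 1 (in hours). Hypothetical constant name.
PRESET_HOURS = (1, 2, 4, 6)


def auto_stop_time(start: datetime, hours: int) -> datetime:
    """Return the time at which a service started at `start` would stop.

    `hours` may be a preset or a custom value; either way it must be an
    integer from 1 to 24, as described for the Custom option.
    """
    if isinstance(hours, bool) or not isinstance(hours, int):
        raise TypeError("auto stop time must be an integer number of hours")
    if not 1 <= hours <= 24:
        raise ValueError("auto stop time must be from 1 to 24 hours")
    return start + timedelta(hours=hours)


if __name__ == "__main__":
    started = datetime(2024, 1, 1, 9, 0)
    # A service started at 09:00 with the 4-hour preset stops at 13:00.
    print(auto_stop_time(started, 4))
```

A value outside the range, such as 25, raises `ValueError` in this sketch; how the console itself reports an invalid value is not covered here.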

Enter key information including the resource pool and model configurations. For details, see Table 2.

**Table 2** Parameter description
Parameter	Sub-Parameter	Description
Resource Pool	Public resource pools	Instances in the public resource pool can be of the CPU or GPU type. Pricing standards for resource pools with different instance flavors are different. For details, see Product Pricing Details. Currently, the public resource pool only supports the pay-per-use billing mode.
Resource Pool	Dedicated resource pools	For details about how to create a dedicated resource pool, see Creating a Dedicated Resource Pool. You can select a specification from the resource pool specifications.
Model and Configuration	Model Source	You can select My Models or My Subscriptions based on site requirements. The models that match the model sources are displayed.
	Model	The system automatically lists the available models. Select a model in the Normal state and its version.
	Traffic Ratio (%)	Set the traffic proportion of the current instance node. Service calling requests are allocated to the current version based on this proportion. If you deploy only one version of a model, set this parameter to 100%. If you select multiple versions for gated launch, ensure that the sum of the traffic ratios of multiple versions is 100%.
	Specifications	If you select Public resource pools, you can select the CPU or GPU resources based on site requirements. For details, see Table 3.
	Compute Nodes	Set the number of instances for the current model version. If you set this parameter to 1, the standalone computing mode is used. If you set it to a value greater than 1, the distributed computing mode is used. Select a computing mode based on the actual requirements.
	Environment Variable	Set environment variables and inject them to the container instance.
	Add Model and Configuration	ModelArts supports multiple model versions and flexible traffic policies. You can use gated launch to smoothly upgrade the model version. NOTE: If the selected model has only one version, the system does not display Add Model Version and Configuration.
Data Collection	-	Disabled by default. To enable this function, see Collecting Data for details and set the parameters as required.
Hard Example Filtering	-	Disabled by default. To enable this function, see Collecting Data for details and set the parameters as required.
Application Authentication	Application	Disabled by default. To enable this function, see Accessing a Real-Time Service (Application Authentication) for details and set the parameters as required.
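
The Traffic Ratio and Compute Nodes rules in Table 2 can be checked before you fill in the form. The following Python sketch is illustrative only and does not call any ModelArts API: the `VersionConfig` class and both function names are hypothetical. It assumes that each ratio is a whole percentage from 0 to 100 (the table only states that the ratios must sum to 100%).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VersionConfig:
    """One model version in a deployment (hypothetical structure)."""
    version: str
    traffic_ratio: int  # percentage of requests routed to this version
    instances: int      # value of the Compute Nodes parameter


def validate_traffic(configs: List[VersionConfig]) -> None:
    """Raise ValueError unless the traffic ratios sum to exactly 100."""
    if not configs:
        raise ValueError("at least one model version is required")
    for cfg in configs:
        # Assumption: each ratio is an integer percentage from 0 to 100.
        if not 0 <= cfg.traffic_ratio <= 100:
            raise ValueError(f"{cfg.version}: ratio must be from 0 to 100")
        if cfg.instances < 1:
            raise ValueError(f"{cfg.version}: at least one instance is required")
    total = sum(cfg.traffic_ratio for cfg in configs)
    if total != 100:
        raise ValueError(f"traffic ratios sum to {total}, expected 100")


def computing_mode(instances: int) -> str:
    """Map the instance count to the computing mode described in Table 2."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    return "standalone" if instances == 1 else "distributed"


if __name__ == "__main__":
    # Gated launch: 80% of requests to the old version, 20% to the new one.
    plan = [VersionConfig("0.0.1", 80, 1), VersionConfig("0.0.2", 20, 2)]
    validate_traffic(plan)
    for cfg in plan:
        print(cfg.version, computing_mode(cfg.instances))
```

With a single version, the only valid plan in this sketch is one entry with a traffic ratio of 100.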

**Table 3** Supported specifications
Specifications	Description
[Free] CPU: 1 vCPU \| 4 GiB	Free specifications for beginners to try out the service, limited in quantity and running time. These specifications can be used by models imported from ExeML or by other methods.
ExeML (CPU); ExeML (GPU)	Can be used only by models trained in ExeML projects.
CPU: 2 vCPUs \| 8 GiB	Suitable for models with only CPU loads.
CPU: 2 vCPUs \| 8 GiB GPU: 1 x P4; CPU: 8 vCPUs \| 32 GiB GPU: 1 x P4	Suitable for models requiring CPU and GPU (NVIDIA P4) resources.
ARM: 3 vCPUs \| 6 GiB Ascend: 1 x Ascend 310	Provides one Ascend 310 chip and is suitable for models that require Ascend 310 resources. These specifications are available only in the CN North-Beijing4 region.

Figure 2 Setting model information

After confirming the entered information, complete service deployment as prompted. Deployment generally takes some time, from several minutes to tens of minutes, depending on the amount of data and the resources you selected.

After a real-time service is deployed, it starts immediately. While the service is running, you are charged based on the resources you selected.

You can view basic information about the service in the real-time service list. When the status of the newly deployed service changes from Deploying to Running, the service has been deployed successfully.
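
If you monitor the deployment from a script rather than from the console, the status change from Deploying to Running can be awaited with a simple polling loop. The sketch below is a generic pattern, not ModelArts code: `get_status` is a hypothetical callable that you would supply to return the current status string, and the timeout and interval defaults are arbitrary assumptions. Only the Deploying and Running statuses come from this page; the sketch treats any other value as an error.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
) -> bool:
    """Poll until the service is Running.

    Returns True when the status becomes "Running" and False if the
    timeout expires while the status is still "Deploying".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Running":
            return True
        if status != "Deploying":
            # Assumption: any status other than Deploying means failure.
            raise RuntimeError(f"unexpected service status: {status}")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status query.
    fake = iter(["Deploying", "Deploying", "Running"])
    print(wait_until_running(lambda: next(fake), interval_s=0.0))
```

Because deployment may take tens of minutes, a generous timeout is a reasonable default in this sketch.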
