Deploying as a Real-Time Service

Updated on 2025-01-06 GMT+08:00

After an AI application is prepared, you can deploy it as a real-time service and call the service for prediction.
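Once the service is running, prediction requests are sent to it over HTTPS. The sketch below shows what such a call can look like with token-based authentication; the endpoint URL and payload shape are placeholders (take the real API URL from the service's details page, and note that the request body format depends on your model):

```python
import requests

# Placeholders: substitute the real inference URL of your deployed service
# and a valid IAM token; the payload shape is model-specific.
url = "https://<inference-endpoint>/v1/infers/<service-id>"
headers = {
    "X-Auth-Token": "<IAM-token>",
    "Content-Type": "application/json",
}
payload = {"data": {"req_data": [{"feature_1": 1.0}]}}  # example input only

resp = requests.post(url, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```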

Constraints

A maximum of 20 real-time services can be deployed by a user.

Prerequisites

  • Data has been prepared. Specifically, you have created an AI application in the Normal state in ModelArts.
  • Your account is not in arrears, ensuring that resources are available for the service to run.

Procedure

  1. Log in to the ModelArts management console. In the left navigation pane, choose Service Deployment > Real-Time Services. The real-time service list is displayed by default.
  2. In the real-time service list, click Deploy in the upper left corner. The Deploy page is displayed.
  3. Set parameters for a real-time service.
    1. Set basic information about model deployment. For details about the parameters, see Table 1.
      Table 1 Basic parameters

      • Name: Name of the real-time service. Set this parameter as prompted.
      • Auto Stop: After this function is enabled and an auto stop time is set, the service automatically stops at the specified time. If it is disabled, the real-time service keeps running and incurring fees; auto stop helps you avoid unnecessary charges. Auto stop is enabled by default, with a default value of 1 hour later.
        The options are 1 hour later, 2 hours later, 4 hours later, 6 hours later, and Custom. If you select Custom, enter an integer from 1 to 24 (hours) in the text box on the right.
      • Description: Brief description of the real-time service.

    2. Enter key information including the resource pool and AI application configurations. For details, see Table 2.
      Table 2 Parameters

      • Resource Pool
        • Public Resource Pool: CPU/GPU computing resources are available for you to select. Pricing varies with the flavor; for details, see Product Pricing Details. Public resource pools support only the pay-per-use billing mode.
        • Dedicated Resource Pool: Select a flavor from the dedicated resource pool flavors. Physical pools in which logical subpools have been created are currently not supported.
          NOTE:
          • The data of old-version dedicated resource pools will be gradually migrated to the new-version dedicated resource pools.
          • For new users, and for existing users who have migrated their data from old-version dedicated resource pools to new ones, the ModelArts management console provides a single entry, to the new-version dedicated resource pools.
          • For existing users who have not migrated their data, the console provides two entries to dedicated resource pools; the entry marked New leads to the new version.
          For details about the new version of dedicated resource pools, see Comprehensive Upgrades to ModelArts Resource Pool Management Functions.
      • AI Application and Configuration
        • AI Application Source: Select My AI Applications or My Subscriptions based on your requirements.
        • AI Application and Version: Select an AI application and version in the Normal state.
        • Traffic Ratio (%): Traffic proportion of the current version. Service calling requests are allocated to this version based on the proportion.
          If you deploy only one version of an AI application, set this parameter to 100%. If you select multiple versions for gray release, ensure that their traffic ratios add up to 100% (the validation sketch after Figure 1 illustrates this check).
        • Specifications: Select an available flavor from the list displayed on the console. Flavors shown in gray cannot be used in the current environment.
          If no flavor in the public resource pools is available, no public resource pool exists in the current environment. In that case, use a dedicated resource pool, or contact the administrator to create a public resource pool.
          NOTE: Deploying the service with the selected flavor incurs some necessary system overhead, so the service actually occupies slightly more resources than the flavor specifies.
        • Compute Nodes: Number of instances for the current AI application version. Setting this parameter to 1 uses the standalone computing mode; a value greater than 1 uses the distributed computing mode. Select a computing mode based on your actual requirements.
        • Environment Variable: Environment variables to inject into the pod. To ensure data security, do not enter sensitive information, such as plaintext passwords, in environment variables.
        • Timeout: Timeout of a single model, covering both deployment and startup. The default value is 20 minutes; the value must range from 3 to 120 minutes.
        • Add AI Application Version and Configuration: If the selected AI application has multiple versions, you can add several versions and configure a traffic ratio for each. Gray release lets you upgrade the AI application version smoothly.
          NOTE: Free compute specifications do not support gray release of multiple versions.
        • Mount Storage: Displayed only when a dedicated resource pool is selected. This function mounts a storage volume to the compute nodes (compute instances) as a local directory while the service is running. It is recommended when the model or input data is large. Only OBS parallel file systems are supported.
          • Source Path: Select the storage path in the parallel file system. A cross-region OBS parallel file system cannot be selected.
          • Mount Path: Enter the mount path inside the container, for example, /obs-mount/.
            • Select a new directory. If you select an existing directory, its files will be overwritten. OBS mounting lets you add, view, and modify files in the mount directory, but not delete them; to delete files, remove them in the OBS parallel file system directly.
            • It is a good practice to mount an empty directory. If the directory is not empty, ensure that it contains no files that affect container startup; otherwise, such files will be replaced, the container will fail to start, and the workload will fail to be created.
            • The mount path must start with a slash (/) and can contain a maximum of 1,024 characters, including letters, digits, and the following special characters: \ _ - .
          NOTE: Storage mounting can be used only by services deployed in a dedicated resource pool.
      • Traffic Limit: Maximum number of times the service can be accessed per second. Set this parameter as needed.
      • WebSocket: Whether to deploy the real-time service as a WebSocket service. For details about WebSocket real-time services, see Full-Process Development of WebSocket Real-Time Services.
        NOTE:
        • This function is supported only if the AI application is WebSocket-compliant and comes from a container image.
        • After this function is enabled, Traffic Limit and Data Collection cannot be set.
        • This parameter cannot be changed after the service is deployed.
      • Runtime Log Output: Disabled by default. In that case, the runtime logs of the real-time service are stored only in the ModelArts log system and can be queried on the Logs tab of the service details page.
        If this function is enabled, the runtime logs are exported to Log Tank Service (LTS). LTS automatically creates log groups and log streams and, by default, caches runtime logs generated within the last seven days. For details about LTS log management, see Log Tank Service.
        NOTE:
        • This function cannot be disabled once it is enabled.
        • You will be billed for the log query and log storage functions provided by LTS. For details, see LTS Pricing Details.
        • Do not print unnecessary audio log files; otherwise, system logs may fail to be displayed, and the error message "Failed to load audio" may appear.
      • Application Authentication
        • Application: Disabled by default. To enable this function, see Access Authenticated Using an Application and set the parameters as required.

      Figure 1 Setting AI application information
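      The constraints above (traffic ratios summing to 100%, the mount path format, the timeout range) are easy to get wrong when preparing a deployment programmatically. The following is a small illustrative check in Python; the validate_deploy_config helper and the config dictionary layout are hypothetical and not part of any ModelArts API:

      ```python
      import re

      # Hypothetical config layout for illustration only, not a ModelArts schema.
      config = {
          "versions": [
              {"version": "0.0.1", "traffic_ratio": 80},
              {"version": "0.0.2", "traffic_ratio": 20},  # gray release candidate
          ],
          "mount_path": "/obs-mount/",
          "timeout_minutes": 20,
      }

      def validate_deploy_config(cfg):
          """Check the console constraints described in Table 2."""
          # Traffic ratios of all selected versions must add up to 100%.
          total = sum(v["traffic_ratio"] for v in cfg["versions"])
          if total != 100:
              raise ValueError(f"traffic ratios sum to {total}, expected 100")

          # Mount path: starts with '/', at most 1,024 characters; letters,
          # digits, and the special characters \ _ - . (plus '/') are assumed.
          path = cfg["mount_path"]
          if not re.fullmatch(r"/[A-Za-z0-9\\_\-./]{0,1023}", path):
              raise ValueError(f"invalid mount path: {path!r}")

          # Timeout must be between 3 and 120 minutes (default 20).
          if not 3 <= cfg["timeout_minutes"] <= 120:
              raise ValueError("timeout must be 3-120 minutes")

      validate_deploy_config(config)  # raises on violation
      ```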
    3. (Optional) Configure advanced settings.
      Table 3 Advanced settings

      • Tags: ModelArts can work with Tag Management Service (TMS). When creating resource-consuming tasks in ModelArts, for example, training jobs, configure tags for these tasks so that ModelArts can use tags to manage resources by group.
        For details about how to use tags, see How Does ModelArts Use Tags to Manage Resources by Group?
        NOTE: You can select a predefined TMS tag from the tag drop-down list or customize a tag. Predefined tags are available to all service resources that support tags. Custom tags are available only to the service resources of the user who created them.

  4. After confirming the entered information, complete the service deployment as prompted. Deployment jobs generally take several minutes to tens of minutes, depending on the amount of data and the resources you selected.
    NOTE:

    After a real-time service is deployed, it is started immediately.

    You can go to the real-time service list to check whether the deployment is complete. When the status of the newly deployed service changes from Deploying to Running, the service has been deployed successfully.
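The same deployment can also be scripted. The sketch below uses the ModelArts Python SDK; the class and parameter names (Session, Model, ServiceConfig, deploy_predictor) follow the SDK's documented pattern but should be verified against the SDK reference for your region, and the model ID, flavor, and credentials are placeholders:

```python
# A minimal sketch, assuming the ModelArts Python SDK is installed and the
# AI application ("model") is already in the Normal state.
from modelarts.session import Session
from modelarts.model import Model
from modelarts.config.model_config import ServiceConfig

session = Session(access_key="<AK>", secret_key="<SK>",
                  project_id="<project-id>", region_name="<region>")

model = Model(session, model_id="<model-id>")

# One version taking 100% of the traffic, on a single compute node
# (standalone computing mode). The flavor name is a placeholder.
config = ServiceConfig(
    model_id="<model-id>",
    weight="100",                          # traffic ratio (%)
    instance_count=1,                      # 1 = standalone mode
    specification="modelarts.vm.cpu.2u",   # placeholder flavor
)

# Deploy as a real-time service. The service starts immediately after
# deployment; wait until its status changes from Deploying to Running
# before sending prediction requests.
predictor = model.deploy_predictor(service_name="my-realtime-service",
                                   infer_type="real-time",
                                   configs=[config])
print(predictor.get_service_info())
```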
