Updated on 2024-06-12 GMT+08:00

Deploying as a Real-Time Service

After an AI application is prepared, you can deploy it as a real-time service and call the service for prediction.

Constraints

A maximum of 20 real-time services can be deployed by a user.

Prerequisites

  • Data has been prepared. Specifically, you have created an AI application in the Normal state in ModelArts.

Procedure

  1. Log in to the ModelArts management console. In the left navigation pane, choose Service Deployment > Real-Time Services. The real-time service list is displayed by default.
  2. In the real-time service list, click Deploy in the upper left corner. The Deploy page is displayed.
  3. Set parameters for a real-time service.
    1. Set basic information about model deployment. For details about the parameters, see Table 1.
Table 1 Basic parameters

      • Name: Enter a name for the real-time service.
      • Description: Enter a brief description of the real-time service.

    2. Enter key information including the resource pool and AI application configurations. For details, see Table 2.
Table 2 Parameters

      Resource Pool
      • Public Resource Pool: CPU and GPU computing resources are available for you to select.
      • Dedicated Resource Pool: Select a specification from the dedicated resource pool specifications. Physical pools in which logical subpools have been created are not supported currently.

        NOTE:
        • The data of old-version dedicated resource pools will be gradually migrated to the new-version dedicated resource pools.
        • For new users, and for existing users who have migrated their data from old-version dedicated resource pools to new ones, the ModelArts management console provides only one entry, which leads to the new-version dedicated resource pools.
        • For existing users who have not yet migrated their data, the console provides two entries to dedicated resource pools; the entry marked with New leads to the new version.

        For more details about the new-version dedicated resource pools, see Comprehensive Upgrades to ModelArts Resource Pool Management Functions.

AI Application and Configuration
      • AI Application Source: Select My AI Applications based on your requirements.
      • AI Application and Version: Select an AI application and version that are in the Normal state.
      • Streams: Number of video streams that can be processed concurrently. This parameter is available only for models that handle asynchronous requests.
      • Specifications: Select an available specification from the list displayed on the console. Specifications shown in gray cannot be used in the current environment.

        If no specification in the public resource pools is available, there is no public resource pool in the current environment. In this case, use a dedicated resource pool, or contact the administrator to create a public resource pool.

        NOTE: Deploying the service with the selected flavor incurs some unavoidable system overhead, so the resources actually occupied by the service are slightly greater than the selected flavor.
      • Compute Nodes: Set the number of instances for the current AI application version. If you set the number of nodes to 1, the standalone computing mode is used; if you set it to a value greater than 1, the distributed computing mode is used. Select a computing mode based on your actual requirements.
      • Timeout: Timeout of a single model, including both the deployment and startup time. The default value is 20 minutes. The value must range from 3 to 120 minutes.
      • Add AI Application Version and Configuration: If the selected AI application has multiple versions, you can add several of them and configure a traffic ratio for each. This lets you use gray launch to smoothly upgrade the AI application version.

        NOTE: Free compute specifications do not support the gray launch of multiple versions.

• Mount Storage: Mounts a storage volume to the compute nodes (compute instances) as a local directory while the service is running. This is recommended when the model or input data is large. Two volume types are defined, OBS parallel file system and SFS file system, but currently only OBS parallel file systems are supported.

        • OBS parallel file system
          • Source Path: Select the storage path of the OBS parallel file system. A cross-region OBS parallel file system cannot be selected.
          • Mount Path: Enter the mount path inside the container, for example, /tmp.
            • To avoid container exceptions, do not mount the storage to a system directory such as / or /var/run.
            • It is a good practice to mount the storage to an empty directory. If the directory is not empty, ensure that it contains no files that affect container startup. Otherwise, such files will be replaced, the container will fail to start, and the workload cannot be created.
            • The mount path must start with a slash (/) and can contain a maximum of 1,024 characters, including letters, digits, and the following special characters: \ _ - .
        • SFS file system (not supported)

        NOTE: Storage mounting can be used only by services deployed in a dedicated resource pool.
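The version traffic ratios and mount-path rules above can be sketched as a deployment configuration. The field names below (service_name, config, weight, specification, volumes, and so on) are illustrative assumptions, not the exact ModelArts API schema; the sketch only encodes the constraints the console enforces, namely that the traffic ratios of all versions add up to 100 and that mount paths start with a slash.

```python
# Illustrative sketch only: all field names are assumptions, not the exact
# ModelArts API schema. It encodes the console rules described above.
deploy_config = {
    "service_name": "my-realtime-service",   # Name (hypothetical)
    "description": "demo real-time service", # Description
    "infer_type": "real-time",
    "config": [
        {   # current version keeps most of the traffic
            "model_id": "model-v1",          # placeholder ID
            "weight": 90,                    # traffic ratio, in percent
            "specification": "cpu-2u",       # flavor name (hypothetical)
            "instance_count": 1,             # Compute Nodes
        },
        {   # gray-launch version receives the remaining traffic
            "model_id": "model-v2",
            "weight": 10,
            "specification": "cpu-2u",
            "instance_count": 1,
        },
    ],
    # Mount Storage: only OBS parallel file systems are supported.
    "volumes": [
        {"source": "obs://my-bucket/data/", "mount_path": "/tmp/data"},
    ],
}

# Console-enforced constraints, expressed as checks:
assert sum(v["weight"] for v in deploy_config["config"]) == 100
assert all(v["mount_path"].startswith("/") for v in deploy_config["volumes"])
```

If the checks fail, the console would reject the equivalent form input, so validating the configuration locally before submitting it can save a failed deployment attempt.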

    3. (Optional) Configure advanced settings.
Table 3 Advanced settings

      • Tags: ModelArts can work with Tag Management Service (TMS). When creating resource-consuming tasks in ModelArts, such as training jobs, configure tags for these tasks so that ModelArts can use tags to manage resources by group.

        For details about how to use tags, see "How Does ModelArts Use Tags to Manage Resources by Group?" in ModelArts FAQs.

        NOTE: You can select a predefined TMS tag from the tag drop-down list or customize a tag. Predefined tags are available to all service resources that support tags. Custom tags are available only to the service resources of the user who created them.

  4. After confirming the entered information, complete the service deployment as prompted. Deployment jobs generally run for a few minutes to tens of minutes, depending on the amount of data and the resources you selected.

    After a real-time service is deployed, it is started immediately.

    You can go to the real-time service list to check whether the deployment of the real-time service is complete. In the real-time service list, after the status of the newly deployed service changes from Deploying to Running, the service is deployed successfully.
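The wait for the status to change from Deploying to Running can also be automated by polling. The helper below is a generic sketch: get_status stands in for whatever call you use to read the service status (console, SDK, or REST API, not shown here), and the Failed state name is an assumption, since only Deploying and Running appear in this document.

```python
import time

def wait_until_running(get_status, timeout_s=1800, interval_s=30):
    """Poll get_status() until the service reports 'Running'.

    get_status is any callable returning the current status string,
    e.g. a wrapper around the ModelArts SDK or REST API (not shown).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "Running":
            return True
        if status == "Failed":  # assumed failure state name
            raise RuntimeError("service deployment failed")
        time.sleep(interval_s)
    raise TimeoutError("service did not reach 'Running' in time")

# Usage sketch with a stub that reports 'Deploying' twice, then 'Running':
statuses = iter(["Deploying", "Deploying", "Running"])
print(wait_until_running(lambda: next(statuses), interval_s=0))  # → True
```

Passing the status reader as a callable keeps the polling logic independent of how the status is actually fetched, so the same helper works whether you query the API directly or through an SDK.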