Updated on 2025-06-16 GMT+08:00

Creating an Inference Endpoint

Before creating an inference service, you must create an inference endpoint. When you create an inference endpoint, you configure its maximum number of resources. You can then create inference services on that endpoint; the total resources used by all inference services on an endpoint cannot exceed the endpoint's maximum. This lets you control the resource usage of the inference endpoint.
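The capacity rule above can be sketched as a small model. This is a hypothetical illustration of the constraint only — the class and method names are illustrative and are not part of any Huawei Cloud SDK:

```python
# Hypothetical sketch of the endpoint capacity rule: the total resources of
# all inference services on an endpoint must not exceed the endpoint's
# configured maximum. Not a Huawei Cloud API.
from dataclasses import dataclass, field


@dataclass
class InferenceEndpoint:
    name: str
    max_resources: int                                 # set at creation (1 to 1,000)
    services: dict = field(default_factory=dict)       # service name -> resource count

    def used_resources(self) -> int:
        return sum(self.services.values())

    def can_add_service(self, resources: int) -> bool:
        # A new service fits only if the endpoint's total stays within its cap.
        return self.used_resources() + resources <= self.max_resources

    def add_service(self, name: str, resources: int) -> None:
        if not self.can_add_service(resources):
            raise ValueError(
                f"endpoint {self.name!r}: adding {resources} resources would "
                f"exceed the maximum of {self.max_resources}"
            )
        self.services[name] = resources


endpoint = InferenceEndpoint(name="demo-endpoint", max_resources=4)
endpoint.add_service("svc-a", 3)
print(endpoint.can_add_service(1))  # True: 3 + 1 <= 4
print(endpoint.can_add_service(2))  # False: 3 + 2 > 4
```

In this sketch, the check happens at service-creation time, which mirrors the console behavior described here: a service request that would push the endpoint past its maximum is rejected.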

Prerequisites

  • You have a valid Huawei Cloud account.
  • You have at least one workspace available.

Procedure

  1. Log in to the Workspace Management Console.
  2. Select a created workspace, click Access Workspace, and choose Resources and Assets > Inference Endpoint.
  3. Click Create Inference Endpoint in the upper right corner. Enter the endpoint name, description, resource specifications, and quantity by referring to Table 1, and click Create Now.

    Table 1 Basic information about creating an inference endpoint

    | Parameter | Description |
    | --- | --- |
    | Endpoint Name | Mandatory. The name of the inference endpoint. The name contains 1 to 64 characters and must be unique. Only letters, digits, underscores (_), hyphens (-), periods (.), and spaces are allowed. |
    | Description | Optional. The description of the inference endpoint. The value contains 0 to 1,024 characters. Special characters such as ^!<>=&"' are not supported. |
    | Compute Unit Type | Filters the available resource specifications. |
    | Resource Specifications | Mandatory. The resource specifications. Different resource specifications support different models. |
    | Pre-warmed Resources | The number of pre-warmed resources of the inference endpoint. Currently, only 0 is supported. |
    | Maximum Number of Resources | Mandatory. The maximum number of resources of the inference endpoint. The value ranges from 1 to 1,000 and cannot be less than the number of pre-warmed resources. |
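The constraints in Table 1 can be checked client-side before submitting the form. The sketch below is a hypothetical validator derived only from the rules stated in the table; the function name and error messages are illustrative, not part of any Huawei Cloud SDK:

```python
import re

# Hypothetical client-side validation of the Table 1 constraints.
# Rules taken from the table; names are illustrative only.
NAME_RE = re.compile(r"[A-Za-z0-9_\-. ]{1,64}")       # 1-64 allowed characters
FORBIDDEN_DESC_CHARS = set('^!<>=&"\'')               # disallowed in descriptions


def validate_endpoint_request(name, description="", pre_warmed=0, max_resources=1):
    if not NAME_RE.fullmatch(name):
        raise ValueError("name: 1 to 64 letters, digits, _ - . or spaces")
    if len(description) > 1024 or FORBIDDEN_DESC_CHARS & set(description):
        raise ValueError("description: up to 1,024 chars, no ^!<>=&\"'")
    if pre_warmed != 0:
        raise ValueError("pre-warmed resources: only 0 is currently supported")
    if not 1 <= max_resources <= 1000:
        raise ValueError("maximum resources: must be between 1 and 1,000")
    if max_resources < pre_warmed:
        raise ValueError("maximum resources cannot be less than pre-warmed resources")
    return True


print(validate_endpoint_request("demo-endpoint 1", "test endpoint", 0, 10))  # True
```

Note that name uniqueness cannot be checked locally; the console enforces it against the workspace's existing endpoints.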

  4. Choose Resources and Assets > Inference Endpoint > My Endpoint to view the created endpoint.