Updated on 2025-06-16 GMT+08:00

Creating an Inference Endpoint

Before creating an inference service, you must create an inference endpoint. When you create an inference endpoint, you configure its maximum number of resources. You can then create inference services on that endpoint; the total resources used by all inference services on an endpoint cannot exceed the endpoint's maximum. This lets you control the resource usage of the inference endpoint.
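The capacity rule above can be sketched as a small model. This is a hypothetical illustration of the constraint only — the class and method names are illustrative and are not part of any Huawei Cloud SDK:

```python
# Hypothetical sketch of the endpoint capacity rule: the total resources of
# all inference services on an endpoint must not exceed the endpoint's
# configured maximum. Not a Huawei Cloud API.
from dataclasses import dataclass, field


@dataclass
class InferenceEndpoint:
    name: str
    max_resources: int                                 # set at creation (1 to 1,000)
    services: dict = field(default_factory=dict)       # service name -> resource count

    def used_resources(self) -> int:
        return sum(self.services.values())

    def can_add_service(self, resources: int) -> bool:
        # A new service fits only if the endpoint's total stays within its cap.
        return self.used_resources() + resources <= self.max_resources

    def add_service(self, name: str, resources: int) -> None:
        if not self.can_add_service(resources):
            raise ValueError(
                f"endpoint {self.name!r}: adding {resources} resources would "
                f"exceed the maximum of {self.max_resources}"
            )
        self.services[name] = resources


endpoint = InferenceEndpoint(name="demo-endpoint", max_resources=4)
endpoint.add_service("svc-a", 3)
print(endpoint.can_add_service(1))  # True: 3 + 1 <= 4
print(endpoint.can_add_service(2))  # False: 3 + 2 > 4
```

In this sketch, the check happens at service-creation time, which mirrors the console behavior described here: a service request that would push the endpoint past its maximum is rejected.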

Prerequisites

  • You have a valid Huawei Cloud account.
  • You have at least one workspace available.

Procedure

  1. Log in to the Workspace Management Console.
  2. Select a created workspace, click Access Workspace, and choose Resources and Assets > Inference Endpoint.
  3. Click Create Inference Endpoint in the upper right corner. Enter the endpoint name, description, resource specifications, and quantity by referring to Table 1, and click Create Now.

    Table 1 Basic information about creating an inference endpoint

    | Parameter | Description |
    | --- | --- |
    | Endpoint Name | Mandatory. The name of the inference endpoint. The name contains 1 to 64 characters and must be unique. Only letters, digits, underscores (_), hyphens (-), periods (.), and spaces are allowed. |
    | Description | Optional. The description of the inference endpoint. The value contains 0 to 1,024 characters. Special characters such as ^!<>=&"' are not supported. |
    | Compute Unit Type | Filters the available resource specifications. |
    | Resource Specifications | Mandatory. The resource specifications. Different resource specifications support different models. |
    | Pre-warmed Resources | The number of pre-warmed resources of the inference endpoint. Currently, only 0 is supported. |
    | Maximum Number of Resources | Mandatory. The maximum number of resources of the inference endpoint. The value ranges from 1 to 1,000 and cannot be less than the number of pre-warmed resources. |
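The constraints in Table 1 can be checked client-side before submitting the form. The sketch below is a hypothetical validator derived only from the rules stated in the table; the function name and error messages are illustrative, not part of any Huawei Cloud SDK:

```python
import re

# Hypothetical client-side validation of the Table 1 constraints.
# Rules taken from the table; names are illustrative only.
NAME_RE = re.compile(r"[A-Za-z0-9_\-. ]{1,64}")       # 1-64 allowed characters
FORBIDDEN_DESC_CHARS = set('^!<>=&"\'')               # disallowed in descriptions


def validate_endpoint_request(name, description="", pre_warmed=0, max_resources=1):
    if not NAME_RE.fullmatch(name):
        raise ValueError("name: 1 to 64 letters, digits, _ - . or spaces")
    if len(description) > 1024 or FORBIDDEN_DESC_CHARS & set(description):
        raise ValueError("description: up to 1,024 chars, no ^!<>=&\"'")
    if pre_warmed != 0:
        raise ValueError("pre-warmed resources: only 0 is currently supported")
    if not 1 <= max_resources <= 1000:
        raise ValueError("maximum resources: must be between 1 and 1,000")
    if max_resources < pre_warmed:
        raise ValueError("maximum resources cannot be less than pre-warmed resources")
    return True


print(validate_endpoint_request("demo-endpoint 1", "test endpoint", 0, 10))  # True
```

Note that name uniqueness cannot be checked locally; the console enforces it against the workspace's existing endpoints.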

  4. Choose Resources and Assets > Inference Endpoint > My Endpoint to view the created endpoint.