Creating an Inference Service
When performing inference on DataArtsFabric, you can either use an existing public inference service or deploy your own.
Before deploying an inference service on DataArtsFabric, you need a model. You can use a model you created earlier, or one of the open-source public models that DataArtsFabric provides by default. The following table lists these public models.
Table 1 Public models

| Model Name | Overview | Base Model Type | Compute (MU) | Maximum Context Length | Prompt Template Length | Maximum Output Tokens |
|---|---|---|---|---|---|---|
| Qwen 2 72B Instruct | With 72 billion parameters, Qwen2 outperforms most previous open-weight models on benchmarks covering language understanding, generation, multilingual capability, coding, mathematics, and reasoning, and is competitive with proprietary models. | QWEN_2_72B | 8 | 16,000 | 23 | 16,360 |
| Glm 4 9B Chat | GLM-4-9B is the open-source version of the latest-generation GLM-4 pre-trained model series launched by Zhipu AI. With 9 billion parameters, it performs well on evaluation datasets covering semantics, mathematics, reasoning, code, and knowledge. | GLM_4_9B | 2 | 32,000 | 16 | 32,751 |
The prompt template length is the length of the system prompt; the system adds this template to the input regardless of what you enter. The maximum context length is the sum of the prompt template length, the maximum input token length, and the maximum output token length.
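For example, the input token budget of a single request can be estimated by subtracting the prompt template length and the requested output length from the maximum context length. The following sketch is illustrative only: the model values come from the table above, and the requested output length is a hypothetical per-request value.

```python
# Illustrative sketch: estimating the input token budget for one request.
# max_context_length and prompt_template_length come from the public model
# table above; requested_output_tokens is a hypothetical per-request value.

def input_token_budget(max_context_length: int,
                       prompt_template_length: int,
                       requested_output_tokens: int) -> int:
    """Return the number of tokens left for user input in one request."""
    budget = max_context_length - prompt_template_length - requested_output_tokens
    if budget <= 0:
        raise ValueError("Requested output length leaves no room for input.")
    return budget

# Qwen 2 72B Instruct: 16,000-token context, 23-token prompt template.
# Reserving 1,024 output tokens (a hypothetical choice) leaves 14,953 tokens for input.
print(input_token_budget(16_000, 23, 1_024))  # 14953
```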
You can view public model information in the model navigation pane. You can use public models to deploy inference services, but you cannot delete them.
Notes and Constraints
The following constraints apply when deploying an inference service:
- The number of instances (resources) of an inference service ranges from 1 to 100.
- The total maximum number of resources of all inference services deployed on an inference endpoint cannot exceed the maximum number of resources of that endpoint (see the sketch after this list).
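The second constraint can be checked before you create a new service. The following is a minimal sketch under stated assumptions: endpoint_max_resources and existing_service_maximums are hypothetical values taken from your own endpoint and service configurations; this is not part of any DataArtsFabric SDK.

```python
# Minimal sketch of the endpoint capacity constraint described above.
# endpoint_max_resources and existing_service_maximums are hypothetical
# values read from your own endpoint and service configurations.

def fits_on_endpoint(endpoint_max_resources: int,
                     existing_service_maximums: list[int],
                     new_service_maximum: int) -> bool:
    """True if the new service's maximum keeps the endpoint within capacity."""
    if not 1 <= new_service_maximum <= 100:
        return False  # a single service supports 1 to 100 resources
    total = sum(existing_service_maximums) + new_service_maximum
    return total <= endpoint_max_resources

# Example: an endpoint with 10 resources that already hosts services with
# maximum values of 4 and 3 can accept a new service with a maximum of 3, but not 4.
print(fits_on_endpoint(10, [4, 3], 3))  # True
print(fits_on_endpoint(10, [4, 3], 4))  # False
```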
Prerequisites
- You have a valid Huawei Cloud account.
- You have at least one workspace available. For details, see Creating a Workspace.
- You have created an inference endpoint. For details, see Creating an Inference Endpoint.
- You have created a model for inference. For details, see Creating a Model.
Procedure
- Log in to the Workspace Management Console.
- Select the created workspace and click Access Workspace. In the navigation pane on the left, choose Development and Production > Inference Services.
- In the upper right corner of the My Inference Services tab page on the Inference Services page, click Create Inference Service.
- Enter the basic information, such as the name and description, of the inference service to be created, and select the inference endpoint and model. You can click Public models or My models to select a model. Then, configure the minimum and maximum numbers of instances. For details, see the following table. A sketch that checks these values against the documented limits follows this procedure.
Table 2 Parameters for creating an inference service

| Parameter | Mandatory (Yes/No) | Description |
|---|---|---|
| Basic Settings | | |
| Name | Yes | Name of the inference service. The name contains 1 to 64 characters and must be unique. Only letters, digits, underscores (_), hyphens (-), periods (.), and spaces are allowed. |
| Description | No | Description of the inference service. The value contains 0 to 1,024 characters. Special characters such as ^!<>=&"' are not supported. |
| Model Type | Yes | Select My models or Public models. |
| Models | Yes | If Model Type is set to My models, select a model you have created from the drop-down list. For details about how to create a model, see Creating a Model. If Model Type is set to Public models, select a public model from the drop-down list. |
| Model Version | Yes | If Model Type is set to My models, select a version of the model you have created from the drop-down list. |
| Endpoint | Yes | Select an inference endpoint you have created from the drop-down list. For details about how to create an inference endpoint, see Creating an Inference Endpoint. |
| Instance Running Settings | | |
| Resource Specifications | Yes | Resource specifications of the inference service. They must be the same as those of the selected inference endpoint; other specifications are not supported. |
| Minimum Value | Yes | Minimum number of instances of the inference service. These instances are kept running even when there are no requests. The value ranges from 1 to 100. The service automatically scales in or out between the minimum and maximum numbers of instances based on the request load. |
| Maximum Value | Yes | Maximum number of instances of the inference service. The value ranges from 1 to 100, cannot be less than the minimum value, and must be less than or equal to the maximum number of resources of the selected inference endpoint. In addition, the total of the maximum values of all inference services under the same endpoint must not exceed the endpoint's maximum number of resources. The service automatically scales in or out between the minimum and maximum numbers of instances based on the request load and never runs more instances than the maximum value. |
- After the configuration is complete, click Create Now.
- On the Inference Services page, you can view the created inference service.
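Before clicking Create Now, you can sanity-check the configured values against the limits in Table 2. The following is an illustrative sketch only; it mirrors the documented limits but is not part of the console or of any DataArtsFabric SDK, and the function and variable names are hypothetical.

```python
import re

# Illustrative sketch: checking the basic settings against the limits in
# Table 2 before submitting them in the console. Not a DataArtsFabric API.

NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.\- ]{1,64}$")  # 1-64 allowed characters
FORBIDDEN_DESC_CHARS = set('^!<>=&"\'')                 # listed as unsupported (non-exhaustive)

def validate_service_settings(name: str, description: str,
                              min_instances: int, max_instances: int) -> list[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("Name must be 1-64 letters, digits, _, -, ., or spaces (and must be unique).")
    if len(description) > 1024 or FORBIDDEN_DESC_CHARS & set(description):
        problems.append("Description exceeds 1,024 characters or contains unsupported characters.")
    if not (1 <= min_instances <= 100 and 1 <= max_instances <= 100):
        problems.append("Minimum and Maximum Value must each be between 1 and 100.")
    if max_instances < min_instances:
        problems.append("Maximum Value cannot be less than Minimum Value.")
    return problems

print(validate_service_settings("demo-service", "Qwen2 chat service", 1, 2))  # []
```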