Creating an Inference Service
When performing inference on DataArtsFabric, you can either use an existing public inference service or deploy your own.
Before deploying an inference service on DataArtsFabric, you need a model. You can use a model you created earlier, or one of the open-source public models that DataArtsFabric provides by default. The following table lists these public models.
Table 1 Public models

| Model Name | Overview | Base Model Type | Compute (MU) | Maximum Context Length | Prompt Template Length | Maximum Output Tokens |
|---|---|---|---|---|---|---|
| Qwen 2 72B Instruct | With 72 billion parameters, Qwen2 outperforms most previous open-weight models on benchmarks covering language understanding, generation, multilingual capability, coding, mathematics, and reasoning, and is competitive with proprietary models. | QWEN_2_72B | 8 | 16,000 | 23 | 16,360 |
| Glm 4 9B Chat | GLM-4-9B is the open-source version of the latest-generation GLM-4 pre-trained model series launched by Zhipu AI. With 9 billion parameters, it performs well on evaluation datasets covering semantics, mathematics, reasoning, code, and knowledge. | GLM_4_9B | 2 | 32,000 | 16 | 32,751 |
The prompt template length is the length of the system prompt; the system adds this template to the input regardless of what you enter. The maximum context length is the sum of the prompt template length, the maximum input token length, and the maximum output token length.
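For example, the input token budget of a single request can be estimated by subtracting the prompt template length and the requested output length from the maximum context length. The following sketch is illustrative only: the model values come from the table above, and the requested output length is a hypothetical per-request value.

```python
# Illustrative sketch: estimating the input token budget for one request.
# max_context_length and prompt_template_length come from the public model
# table above; requested_output_tokens is a hypothetical per-request value.

def input_token_budget(max_context_length: int,
                       prompt_template_length: int,
                       requested_output_tokens: int) -> int:
    """Return the number of tokens left for user input in one request."""
    budget = max_context_length - prompt_template_length - requested_output_tokens
    if budget <= 0:
        raise ValueError("Requested output length leaves no room for input.")
    return budget

# Qwen 2 72B Instruct: 16,000-token context, 23-token prompt template.
# Reserving 1,024 output tokens (a hypothetical choice) leaves 14,953 tokens for input.
print(input_token_budget(16_000, 23, 1_024))  # 14953
```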
You can view public model information in the model navigation pane. You can use public models to deploy inference services, but you cannot delete them.
Notes and Constraints
The following constraints apply when deploying an inference service:
- The number of instances (resources) of an inference service ranges from 1 to 100.
- The total maximum number of resources of all inference services deployed on an inference endpoint cannot exceed the maximum number of resources of that endpoint (see the sketch after this list).
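The second constraint can be checked before you create a new service. The following is a minimal sketch under stated assumptions: endpoint_max_resources and existing_service_maximums are hypothetical values taken from your own endpoint and service configurations; this is not part of any DataArtsFabric SDK.

```python
# Minimal sketch of the endpoint capacity constraint described above.
# endpoint_max_resources and existing_service_maximums are hypothetical
# values read from your own endpoint and service configurations.

def fits_on_endpoint(endpoint_max_resources: int,
                     existing_service_maximums: list[int],
                     new_service_maximum: int) -> bool:
    """True if the new service's maximum keeps the endpoint within capacity."""
    if not 1 <= new_service_maximum <= 100:
        return False  # a single service supports 1 to 100 resources
    total = sum(existing_service_maximums) + new_service_maximum
    return total <= endpoint_max_resources

# Example: an endpoint with 10 resources that already hosts services with
# maximum values of 4 and 3 can accept a new service with a maximum of 3, but not 4.
print(fits_on_endpoint(10, [4, 3], 3))  # True
print(fits_on_endpoint(10, [4, 3], 4))  # False
```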
Prerequisites
- You have a valid Huawei Cloud account.
- You have at least one workspace available. For details, see Creating a Workspace.
- You have created an inference endpoint. For details, see Creating an Inference Endpoint.
- You have created a model for inference. For details, see Creating a Model.
Procedure
- Log in to the Workspace Management Console.
- Select the created workspace and click Access Workspace. In the navigation pane on the left, choose Development and Production > Inference Services.
- In the upper right corner of the My Inference Services tab page on the Inference Services page, click Create Inference Service.
- Enter the basic information, such as the name and description, of the inference service to be created, and select the inference endpoint and model. You can click Public models or My models to select a model. Then, configure the minimum and maximum numbers of instances. For details, see the following table. A sketch that checks these values against the documented limits follows this procedure.
Table 2 Parameters for creating an inference service

| Parameter | Mandatory (Yes/No) | Description |
|---|---|---|
| Basic Settings | | |
| Name | Yes | Name of the inference service. The name contains 1 to 64 characters and must be unique. Only letters, digits, underscores (_), hyphens (-), periods (.), and spaces are allowed. |
| Description | No | Description of the inference service. The value contains 0 to 1,024 characters. Special characters such as ^!<>=&"' are not supported. |
| Model Type | Yes | Select My models or Public models. |
| Models | Yes | If Model Type is set to My models, select a model you have created from the drop-down list. For details about how to create a model, see Creating a Model. If Model Type is set to Public models, select a public model from the drop-down list. |
| Model Version | Yes | If Model Type is set to My models, select a version of the model you have created from the drop-down list. |
| Endpoint | Yes | Select an inference endpoint you have created from the drop-down list. For details about how to create an inference endpoint, see Creating an Inference Endpoint. |
| Instance Running Settings | | |
| Resource Specifications | Yes | Resource specifications of the inference service. They must be the same as those of the selected inference endpoint; other specifications are not supported. |
| Minimum Value | Yes | Minimum number of instances of the inference service. These instances are kept running even when there are no requests. The value ranges from 1 to 100. The service automatically scales in or out between the minimum and maximum numbers of instances based on the request load. |
| Maximum Value | Yes | Maximum number of instances of the inference service. The value ranges from 1 to 100, cannot be less than the minimum value, and must be less than or equal to the maximum number of resources of the selected inference endpoint. In addition, the total of the maximum values of all inference services under the same endpoint must not exceed the endpoint's maximum number of resources. The service automatically scales in or out between the minimum and maximum numbers of instances based on the request load and never runs more instances than the maximum value. |
- After the configuration is complete, click Create Now.
- On the Inference Services page, you can view the created inference service.
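Before clicking Create Now, you can sanity-check the configured values against the limits in Table 2. The following is an illustrative sketch only; it mirrors the documented limits but is not part of the console or of any DataArtsFabric SDK, and the function and variable names are hypothetical.

```python
import re

# Illustrative sketch: checking the basic settings against the limits in
# Table 2 before submitting them in the console. Not a DataArtsFabric API.

NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.\- ]{1,64}$")  # 1-64 allowed characters
FORBIDDEN_DESC_CHARS = set('^!<>=&"\'')                 # listed as unsupported (non-exhaustive)

def validate_service_settings(name: str, description: str,
                              min_instances: int, max_instances: int) -> list[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("Name must be 1-64 letters, digits, _, -, ., or spaces (and must be unique).")
    if len(description) > 1024 or FORBIDDEN_DESC_CHARS & set(description):
        problems.append("Description exceeds 1,024 characters or contains unsupported characters.")
    if not (1 <= min_instances <= 100 and 1 <= max_instances <= 100):
        problems.append("Minimum and Maximum Value must each be between 1 and 100.")
    if max_instances < min_instances:
        problems.append("Maximum Value cannot be less than Minimum Value.")
    return problems

print(validate_service_settings("demo-service", "Qwen2 chat service", 1, 2))  # []
```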