
Creating an Inference Service

When performing inference on DataArtsFabric, you can select an existing public inference service or deploy your own inference service.

Before deploying an inference service on DataArtsFabric, you need a model. You can use a model you created earlier. For convenience, DataArtsFabric also provides several open-source public models by default, which are listed in the following table.

Table 1 Public models

| Model Name | Overview | Base Model Type | Compute (MU) | Maximum Context Length | Prompt Template Length | Maximum Output Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 2 72B Instruct | With 72 billion parameters, Qwen2 outperforms most previous open-weight models in benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, and is competitive with proprietary models. | QWEN_2_72B | 8 | 16,000 | 23 | 16,360 |
| Glm 4 9B Chat | GLM-4-9B is the open-source version of the latest-generation GLM-4 series of pre-trained models launched by Zhipu AI. With 9 billion parameters, it performs well in dataset evaluations of semantics, mathematics, reasoning, code, and knowledge. | GLM_4_9B | 2 | 32,000 | 16 | 32,751 |

The prompt template length is the length of the system prompt. The system adds the prompt template to the input regardless of what you enter. The maximum context length is the sum of the prompt template length, the maximum input token length, and the maximum output token length.
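As a rough illustration of that relationship, the following sketch computes how many input tokens remain once the prompt template and a requested number of output tokens are reserved. The function name and the example figures (taken from the Qwen 2 72B Instruct row in Table 1) are used here only for illustration.

```python
def max_input_tokens(max_context_length: int,
                     prompt_template_length: int,
                     requested_output_tokens: int) -> int:
    """Illustrative token budget: the maximum context length is split among
    the prompt template, the input tokens, and the output tokens."""
    remaining = max_context_length - prompt_template_length - requested_output_tokens
    if remaining < 0:
        raise ValueError("The requested output does not fit into the context window.")
    return remaining

# With a 16,000-token context and a 23-token prompt template, requesting
# 1,000 output tokens leaves room for at most 14,977 input tokens.
print(max_input_tokens(16_000, 23, 1_000))  # 14977
```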

You can view public model information in the model navigation pane. You can use public models to deploy inference services, but cannot delete public models.

Notes and Constraints

The common constraints on deploying inference services are as follows:

  • The number of resources (instances) of an inference service ranges from 1 to 100.
  • The total maximum number of resources of all inference services deployed on an inference endpoint cannot exceed the maximum number of resources of that endpoint.
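A minimal sketch of these checks follows. It only mirrors the rules stated above, using hypothetical parameter names; it is not part of the DataArtsFabric API.

```python
def validate_instance_range(min_instances: int,
                            max_instances: int,
                            endpoint_max_resources: int,
                            other_services_max_total: int = 0) -> None:
    """Illustrative validation of the constraints listed above.
    All names are hypothetical and do not correspond to a real API."""
    if not (1 <= min_instances <= 100) or not (1 <= max_instances <= 100):
        raise ValueError("Instance counts must be between 1 and 100.")
    if max_instances < min_instances:
        raise ValueError("The maximum value cannot be less than the minimum value.")
    if other_services_max_total + max_instances > endpoint_max_resources:
        raise ValueError("The total maximum resources of all inference services "
                         "exceed the endpoint's maximum resources.")

# Example: an endpoint with 10 resources, 6 of which are already reserved
# as the maximum of other inference services on the same endpoint.
validate_instance_range(min_instances=1, max_instances=4,
                        endpoint_max_resources=10, other_services_max_total=6)
```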

Prerequisites

  • You have created a workspace.
  • You have created an inference endpoint. For details, see Creating an Inference Endpoint.
  • If you plan to deploy your own model, you have created the model. For details, see Creating a Model.

Procedure

  1. Log in to Workspace Management Console.
  2. Select the created workspace and click Access Workspace. In the navigation pane on the left, choose Development and Production > Inference Services.
  3. In the upper right corner of the My Inference Services tab page on the Inference Services page, click Create Inference Service.
  4. Enter the basic information of the inference service, such as its name and description, select the model and inference endpoint, and configure the minimum and maximum number of resources. You can click Public models or My models to choose where the model comes from. For details, see the following table; the same parameters are also summarized in the illustrative sketch after this procedure.

    Table 2 Parameters for creating an inference service

    | Parameter | Mandatory (Yes/No) | Description |
    | --- | --- | --- |
    | Basic Settings | | |
    | Name | Yes | Indicates the inference service name. The name contains 1 to 64 characters and must be unique. Only letters, digits, underscores (_), hyphens (-), periods (.), and spaces are allowed. |
    | Description | No | Indicates the description of the inference service. The value contains 0 to 1,024 characters. Special characters such as ^!<>=&"' are not supported. |
    | Model Type | Yes | You can select My models or Public models. |
    | Models | Yes | If Model Type is set to My models, select a model you have created from the drop-down list (for details about how to create a model, see Creating a Model). If Model Type is set to Public models, select a public model from the drop-down list. |
    | Model Version | Yes | If Model Type is set to My models, select a version of the model you have created from the drop-down list. |
    | Endpoint | Yes | Select an inference endpoint you have created from the drop-down list. For details about how to create an inference endpoint, see Creating an Inference Endpoint. |
    | Instance Running Settings | | |
    | Resource Specifications | Yes | Indicates the resource specifications, which must be the same as those of the selected inference endpoint; other specifications are not supported. |
    | Minimum Value | Yes | Indicates the minimum number of instances of the inference service; these instances are kept running even when there are no requests. The value ranges from 1 to 100. The inference service automatically scales in or out between the minimum and maximum number of instances based on the request load. |
    | Maximum Value | Yes | Indicates the maximum number of instances of the inference service. The value ranges from 1 to 100, cannot be less than the minimum value, and must be less than or equal to the maximum number of resources of the selected inference endpoint. The total maximum number of resources of all inference services under the same inference endpoint must also not exceed that endpoint's maximum number of resources. The inference service automatically scales in or out between the minimum and maximum number of instances based on the request load, and the number of running instances never exceeds the maximum value. |

  5. After the configuration is complete, click Create Now.
  6. On the Inference Services page, you can view the created inference service.
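For reference, the fields configured in step 4 can be summarized as a plain configuration record. This is only a sketch of the console parameters from Table 2; the dictionary keys and the checks below are hypothetical and do not represent the DataArtsFabric API.

```python
# Hypothetical summary of the console parameters from Table 2; the keys are
# illustrative only and are not part of the DataArtsFabric API.
inference_service_config = {
    "name": "demo-inference-service",      # 1 to 64 characters, unique
    "description": "Example service",      # 0 to 1,024 characters
    "model_type": "Public models",         # or "My models"
    "model": "Qwen 2 72B Instruct",        # public model, or a model you created
    "model_version": None,                 # required only when using "My models"
    "endpoint": "my-inference-endpoint",   # an inference endpoint you created
    "resource_specifications": "same-as-endpoint",
    "min_instances": 1,                    # 1 to 100
    "max_instances": 4,                    # 1 to 100, >= min, <= endpoint resources
}

# Basic sanity checks mirroring the rules in Table 2 (illustrative only).
assert 1 <= inference_service_config["min_instances"] <= 100
assert inference_service_config["min_instances"] <= inference_service_config["max_instances"] <= 100
```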