
Performing Inference in the Playground

DataArtsFabric provides a playground where you can select an inference service on the page and run inference against it. The playground supports streaming inference, lets you configure inference parameters such as max_tokens, and supports side-by-side comparison of different inference services.
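
The playground itself is a UI feature, but the same streaming behavior can also be exercised through an API call. The following is a minimal sketch that assumes the service exposes an OpenAI-compatible chat-completions endpoint; the base URL, API key, and model name are placeholders, not documented values.

    # Minimal streaming sketch. Assumes an OpenAI-compatible endpoint;
    # the base URL, API key, and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://<your-inference-endpoint>/v1",  # placeholder
        api_key="<your-api-key>",                         # placeholder
    )

    stream = client.chat.completions.create(
        model="<public-inference-service-model>",         # placeholder
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=512,
        stream=True,  # streaming inference: tokens arrive incrementally
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)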

Notes and Constraints

The common constraints on using public inference services are as follows:

  • Token quota: Each public inference service comes with a free token quota. After the quota is used up, the service becomes unavailable, and additional tokens cannot be purchased. The quota of each public inference service is shared across all of your workspaces at the current site.
  • Time: The validity period is 90 days from the date when the service is enabled; after it expires, the service becomes invalid. If the same inference service is enabled in different workspaces, the validity period is counted from the time the service was first enabled.
  • Context length: Different models have different context length constraints. For details, see the introduction to public inference services.
  • SLA: The service level agreement (SLA) is not guaranteed. For assured inference performance, create your own inference services.

Prerequisites

  • You have a valid Huawei Cloud account.
  • You have at least one workspace available.
  • You have enabled the public inference service. For details, see Enabling an Inference Service.

Procedure

  1. Log in to the Workspace Management Console.
  2. Select the created workspace and click Access Workspace.
  3. In the navigation pane on the left, choose Inference Services > Public Inference Services.
  4. Click Playgrounds. The Playgrounds page is displayed, where you can perform inference.
  5. (Optional) Adjust inference parameters.

    You can click Advanced Configuration to adjust inference parameters such as max_tokens. The following table lists the parameters; a request-body sketch showing all of them together follows the table.

    Table 1 Inference parameters

      • max_tokens: The maximum number of tokens to generate during a chat. The upper limit varies by public inference service. For details, see the introduction to public inference services.
      • temperature: Controls the randomness of sampling. The value ranges from 0 to 2. A larger value (for example, 0.8) produces more random output, while a smaller value (for example, 0.2) produces more focused and deterministic output.
      • top_p: The nucleus sampling threshold. The model considers only the smallest set of candidate tokens whose cumulative probability reaches top_p.
      • frequency_penalty: Controls word repetition in the generated text, preventing some words or phrases from appearing too often. The value ranges from -2.0 to 2.0. Positive values penalize new tokens based on how often they have already appeared in the text, reducing the likelihood that the model repeats the same line verbatim.
      • presence_penalty: Controls topic repetition in the generated text, preventing the dialogue from dwelling on the same topic or viewpoint. The value ranges from -2.0 to 2.0. Positive values penalize tokens that have appeared in the text so far, increasing the model's likelihood of introducing new topics.
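
    To make the parameter semantics concrete, the following is a sketch of a request body that carries all five parameters from Table 1. The field names follow the common OpenAI-style chat-completions convention; the endpoint URL, API key, and model name are placeholders, not documented values.

      # Sketch: one request body carrying every Table 1 parameter.
      # The endpoint URL, API key, and model name are placeholders.
      import requests

      payload = {
          "model": "<public-inference-service-model>",  # placeholder
          "messages": [{"role": "user", "content": "Summarize nucleus sampling."}],
          "max_tokens": 256,         # upper bound on generated tokens
          "temperature": 0.2,        # low value: more deterministic output
          "top_p": 0.9,              # keep tokens covering 90% of probability mass
          "frequency_penalty": 0.5,  # discourage repeating the same wording
          "presence_penalty": 0.5,   # encourage moving to new topics
      }

      resp = requests.post(
          "https://<your-inference-endpoint>/v1/chat/completions",  # placeholder
          headers={"Authorization": "Bearer <your-api-key>"},
          json=payload,
          timeout=60,
      )
      print(resp.json()["choices"][0]["message"]["content"])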

  6. (Optional) Compare multiple inference services.

    DataArtsFabric also lets you compare inference services side by side. Click Compare in the upper right corner to compare up to three inference services at once. A programmatic sketch of the same idea follows.
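
    Outside the playground, a similar comparison can be approximated by sending the same prompt to several services and reading the answers side by side. The following is a minimal sketch under the same OpenAI-compatibility assumption as above; every endpoint and model name is a placeholder.

      # Sketch: send one prompt to several services and print the answers.
      # All endpoints and model names are placeholders.
      from openai import OpenAI

      SERVICES = {
          "service-a": ("https://<endpoint-a>/v1", "<model-a>"),
          "service-b": ("https://<endpoint-b>/v1", "<model-b>"),
      }

      prompt = "Explain the difference between temperature and top_p."

      for name, (base_url, model) in SERVICES.items():
          client = OpenAI(base_url=base_url, api_key="<your-api-key>")
          reply = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": prompt}],
              max_tokens=200,
          )
          print(f"=== {name} ===")
          print(reply.choices[0].message.content)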