Using an Inference Service for Inference

After an inference service is deployed, you can use it for inference either by selecting it in the playground or by calling its APIs. For details about the APIs, see the API reference.
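
As a rough illustration of the API route, the following is a minimal sketch in Python. It assumes an OpenAI-compatible chat-completions endpoint; the base URL, model name, and credentials are placeholders, and the authoritative request format is in the API reference.

import requests

# Placeholder values: take the real endpoint, model name, and credentials
# from your inference service's details page and the API reference.
BASE_URL = "https://your-endpoint.example.com/v1"
API_KEY = "your-api-key"

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])

The rest of this section describes how to use the playground for inference.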

Prerequisites

  • You have a valid Huawei Cloud account.
  • You have at least one workspace available.
  • You have created an inference service.

Procedure

  1. Log in to the Workspace Management Console.
  2. Select the created workspace and click Access Workspace. In the navigation pane on the left, choose Development and Production > Playgrounds. The Playgrounds page is displayed.
  3. Select the inference service to be used and enter your question to start inference.
  4. (Optional) Adjust parameters.

    To adjust inference parameters such as max_tokens, click Advanced Configuration. Table 1 describes the parameters.
    Table 1 Inference parameters

    max_tokens
        The maximum number of tokens to generate during the chat. The upper limit varies by public inference service; for details, see the introduction to public inference services.

    temperature
        Controls the randomness of the output. The value ranges from 0 to 2. A larger value (for example, 0.8) produces more random output, while a smaller value (for example, 0.2) produces more focused and deterministic output.

    top_p
        The nucleus sampling threshold, which controls the range of tokens the model considers based on cumulative probability. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p.

    frequency_penalty
        Controls word-level repetition to keep certain words or phrases from appearing too often in the generated text. The value ranges from -2.0 to 2.0. Positive values penalize new tokens based on how often they have already appeared in the text, reducing the likelihood that the model repeats the same line verbatim.

    presence_penalty
        Controls topic-level repetition to avoid repeated discussion of the same topic or viewpoint. The value ranges from -2.0 to 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics.
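
    If you later call the service's APIs instead of using the playground, these parameters are typically sent as request-body fields. The following is a minimal sketch of such a body in Python, assuming an OpenAI-compatible schema; the model name is a placeholder, and the value ranges follow Table 1.

    payload = {
        "model": "your-model-name",  # placeholder
        "messages": [{"role": "user", "content": "Summarize this document."}],
        "max_tokens": 512,           # upper limit depends on the inference service
        "temperature": 0.2,          # 0 to 2; lower values are more deterministic
        "top_p": 0.9,                # sample from tokens within 90% cumulative probability
        "frequency_penalty": 0.5,    # -2.0 to 2.0; positive values reduce verbatim repetition
        "presence_penalty": 0.5,     # -2.0 to 2.0; positive values encourage new topics
    }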

  5. (Optional) Compare multiple inference services.

    Click Compare in the upper right corner to compare the outputs of multiple inference services side by side. A maximum of three inference services can be compared.
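
    To reproduce such a comparison outside the playground, you can send the same prompt to each service's API and read the answers side by side. The following is a minimal sketch, again assuming OpenAI-compatible endpoints; every URL, key, and model name is a placeholder.

    import requests

    # Placeholder endpoints, keys, and model names for the services to compare.
    SERVICES = {
        "service-a": ("https://endpoint-a.example.com/v1", "key-a", "model-a"),
        "service-b": ("https://endpoint-b.example.com/v1", "key-b", "model-b"),
    }
    PROMPT = "Explain nucleus sampling in one sentence."

    for name, (base_url, api_key, model) in SERVICES.items():
        response = requests.post(
            f"{base_url}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
            timeout=60,
        )
        response.raise_for_status()
        print(f"--- {name} ---")
        print(response.json()["choices"][0]["message"]["content"])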