Updated on 2025-05-08 GMT+08:00

Pangu Large Models

The Pangu Large Models connector is used to connect to Huawei Cloud PanguLM.

PanguLM is an all-in-one platform for large model development and deployment, integrating data management, model training, and deployment. It supports custom model development and offers a full lifecycle toolchain to help developers build and deploy models efficiently. Enterprises can easily choose the right services and products to develop models and applications flexibly.

Creating a PanguLM Connection

  1. Log in to the new ROMA Connect console.
  2. In the navigation pane on the left, choose Connector. On the page displayed, click New Connection.
  3. Select the PanguLM connector.
  4. In the dialog box displayed, configure the connector and click OK.

    Parameter descriptions:

      • Name: Enter the connector instance name.
      • App Key: Access key ID (AK) of the current account. Obtain the AK by referring to Access Keys. If an AK/SK pair has been generated, find the downloaded AK/SK file (such as credentials.csv); a short sketch for reading this file follows this list.
      • App Secret: Secret access key (SK) of the current account. Obtain the SK by referring to Access Keys. If an AK/SK pair has been generated, find the downloaded AK/SK file (such as credentials.csv).
      • Description: Enter a description of the connector to identify it.
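
    For reference, the following minimal sketch shows one way to read the AK/SK pair from the downloaded credentials.csv file so that the values can be pasted into App Key and App Secret. The column names used here ("Access Key Id" and "Secret Access Key") are assumptions based on the typical layout of the exported file; adjust them if your file differs.

      import csv

      # Minimal sketch (not part of the connector itself): read the first AK/SK
      # pair from the downloaded credentials.csv file.
      # Assumption: the CSV uses "Access Key Id" and "Secret Access Key" column
      # headers; adjust the keys below if your export uses different names.
      with open("credentials.csv", newline="", encoding="utf-8-sig") as f:
          row = next(csv.DictReader(f))

      app_key = row["Access Key Id"]         # value for App Key
      app_secret = row["Secret Access Key"]  # value for App Secret
      print("App Key:", app_key)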

Action

  • Q&A
  • General text

Configuration Parameters

Table 1 Q&A

  • Content-Type: Request body MIME type.
  • project_id: Project ID.
  • deployment_id: Deployment ID of a model.
  • region_id: Region ID.
  • messages: Multi-turn dialogue.
      • role: Role.
      • content: Question-answer pair content.
  • user: Unique identifier of a customer. The value contains 1 to 64 characters.
  • stream: Whether to enable streaming mode.
      • true: enabled (streaming SDK required)
      • false (default): disabled
  • temperature: Diversity and creativity of the generated text. The value ranges from 0 to 1, and 0 indicates the lowest diversity. Generally, a lower temperature suits deterministic tasks, while a higher temperature, such as 0.9, suits creative tasks. temperature is one of the key parameters that affect the output quality and diversity of an LLM. Other parameters, such as top_p, can also adjust the model's behavior and preferences, but do not set both parameters at the same time.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the probability mass determined by top_p. For example, 0.1 means that only the tokens comprising the top 10% probability mass are considered. Use either top_p or temperature to adjust the tendency of the generated text, but do not modify both at the same time.
  • max_tokens: Maximum length of a reply, in tokens, which affects the length and quality of chat replies. A large value can produce a long, complete reply but may increase the risk of irrelevant or duplicate content; a small value produces short, concise replies but may cut the content off or make it discontinuous. Select a value based on your scenario and requirements. The value ranges from 1 to 2048, and the default value is 16.
  • n: Number of answers generated for each question. The value can be 1 (default) or 2. With the default value 1, only one answer is generated; if you set it to 2, the API returns an array containing two answers.
  • presence_penalty: Penalty applied to repetition in the generated text. Positive values penalize new tokens based on whether they have already appeared in the text so far, increasing the model's likelihood of talking about new topics and making the output more creative and diverse. The value ranges from -2 to 2.
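
The sketch below shows how the Q&A parameters above might be assembled into a single request, for illustration only. The endpoint URL pattern, the authentication header, and the example values are assumptions, not the connector's actual implementation; in practice ROMA Connect builds the request from the action parameters you configure.

  import requests

  # Illustrative only: a hypothetical Q&A request assembled from the Table 1
  # parameters. The URL pattern and auth header are placeholders, not the
  # documented PanguLM endpoint.
  project_id = "<project_id>"
  deployment_id = "<deployment_id>"
  url = (f"https://pangu.example.com/v1/{project_id}"
         f"/deployments/{deployment_id}/chat/completions")

  body = {
      "messages": [                  # multi-turn dialogue: role + content pairs
          {"role": "user", "content": "What does ROMA Connect do?"}
      ],
      "user": "demo-user-001",       # 1 to 64 characters
      "stream": False,               # default: non-streaming
      "temperature": 0.3,            # 0 to 1; low value for a deterministic task
      "max_tokens": 512,             # 1 to 2048, default 16
      "n": 1,                        # 1 (default) or 2
      "presence_penalty": 0.5,       # -2 to 2
  }
  # Note: top_p is omitted because temperature is set; do not use both.

  headers = {
      "Content-Type": "application/json",           # request body MIME type
      "X-Auth-Token": "<token or signed headers>",  # placeholder authentication
  }

  response = requests.post(url, json=body, headers=headers, timeout=60)
  print(response.json())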

Table 2 General text

  • Content-Type: Request body MIME type.
  • project_id: Project ID.
  • deployment_id: Deployment ID of a model.
  • region_id: Region ID.
  • prompt: Input text, which can contain 1 to 4,096 characters.
  • user: Unique identifier of a customer. The value contains 1 to 64 characters.
  • stream: Whether to enable streaming mode.
      • true: enabled (streaming SDK required)
      • false (default): disabled
  • temperature: Diversity and creativity of the generated text. The value ranges from 0 to 1, and 0 indicates the lowest diversity. Generally, a lower temperature suits deterministic tasks, while a higher temperature, such as 0.9, suits creative tasks. temperature is one of the key parameters that affect the output quality and diversity of an LLM. Other parameters, such as top_p, can also adjust the model's behavior and preferences, but do not set both parameters at the same time.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the probability mass determined by top_p. For example, 0.1 means that only the tokens comprising the top 10% probability mass are considered. Use either top_p or temperature to adjust the tendency of the generated text, but do not modify both at the same time.
  • max_tokens: Maximum length of a reply, in tokens, which affects the length and quality of chat replies. A large value can produce a long, complete reply but may increase the risk of irrelevant or duplicate content; a small value produces short, concise replies but may cut the content off or make it discontinuous. Select a value based on your scenario and requirements. The value ranges from 1 to 2048, and the default value is 16.
  • n: Number of answers generated for each question. The value can be 1 (default) or 2. With the default value 1, only one answer is generated; if you set it to 2, the API returns an array containing two answers.
  • presence_penalty: Penalty applied to repetition in the generated text. Positive values penalize new tokens based on whether they have already appeared in the text so far, increasing the model's likelihood of talking about new topics and making the output more creative and diverse. The value ranges from -2 to 2.
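
For the General text action, the request body carries a single prompt string instead of the messages array. The following short sketch is illustrative only and reuses the placeholder request pattern from the Q&A example above; the field names match Table 2.

  # Illustrative General text body; send it with the same placeholder request
  # pattern as the Q&A sketch (endpoint and authentication are assumptions).
  general_text_body = {
      "prompt": "Summarize the benefits of API-based integration.",  # 1 to 4,096 characters
      "user": "demo-user-001",   # 1 to 64 characters
      "stream": False,
      "top_p": 0.9,              # use either top_p or temperature, not both
      "max_tokens": 256,         # 1 to 2048, default 16
      "n": 1,
      "presence_penalty": 0.0,
  }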