Updated on 2025-05-08 GMT+08:00

Pangu Large Models

The Pangu Large Models connector is used to connect to Huawei Cloud PanguLM.

PanguLM is an all-in-one platform for large model development and deployment, integrating data management, model training, and deployment. It supports custom model development and offers a full lifecycle toolchain to help developers build and deploy models efficiently. Enterprises can easily choose the right services and products to develop models and applications flexibly.

Creating a PanguLM Connection

  1. Log in to the new ROMA Connect console.
  2. In the navigation pane on the left, choose Connector. On the page displayed, click New Connection.
  3. Select the PanguLM connector.
  4. In the dialog box displayed, configure the connector and click OK.

    Parameter descriptions:

      • Name: Enter the connector instance name.
      • App Key: Access key ID (AK) of the current account. Obtain the AK by referring to Access Keys. If an AK/SK pair has been generated, find the downloaded AK/SK file (such as credentials.csv); a short sketch for reading this file follows this list.
      • App Secret: Secret access key (SK) of the current account. Obtain the SK by referring to Access Keys. If an AK/SK pair has been generated, find the downloaded AK/SK file (such as credentials.csv).
      • Description: Enter a description of the connector to identify it.
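
    For reference, the following minimal sketch shows one way to read the AK/SK pair from the downloaded credentials.csv file so that the values can be pasted into App Key and App Secret. The column names used here ("Access Key Id" and "Secret Access Key") are assumptions based on the typical layout of the exported file; adjust them if your file differs.

      import csv

      # Minimal sketch (not part of the connector itself): read the first AK/SK
      # pair from the downloaded credentials.csv file.
      # Assumption: the CSV uses "Access Key Id" and "Secret Access Key" column
      # headers; adjust the keys below if your export uses different names.
      with open("credentials.csv", newline="", encoding="utf-8-sig") as f:
          row = next(csv.DictReader(f))

      app_key = row["Access Key Id"]         # value for App Key
      app_secret = row["Secret Access Key"]  # value for App Secret
      print("App Key:", app_key)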

Action

  • Q&A
  • General text

Configuration Parameters

Table 1 Q&A

  • Content-Type: Request body MIME type.
  • project_id: Project ID.
  • deployment_id: Deployment ID of a model.
  • region_id: Region ID.
  • messages: Multi-turn dialogue.
      • role: Role.
      • content: Question-answer pair content.
  • user: Unique identifier of a customer. The value contains 1 to 64 characters.
  • stream: Whether to enable streaming mode.
      • true: enabled (streaming SDK required)
      • false (default): disabled
  • temperature: Diversity and creativity of the generated text. The value ranges from 0 to 1, and 0 indicates the lowest diversity. Generally, a lower temperature suits deterministic tasks, while a higher temperature, such as 0.9, suits creative tasks. temperature is one of the key parameters that affect the output quality and diversity of an LLM. Other parameters, such as top_p, can also adjust the model's behavior and preferences, but do not set both parameters at the same time.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the probability mass determined by top_p. For example, 0.1 means that only the tokens comprising the top 10% probability mass are considered. Use either top_p or temperature to adjust the tendency of the generated text, but do not modify both at the same time.
  • max_tokens: Maximum length of a reply, in tokens, which affects the length and quality of chat replies. A large value can produce a long, complete reply but may increase the risk of irrelevant or duplicate content; a small value produces short, concise replies but may cut the content off or make it discontinuous. Select a value based on your scenario and requirements. The value ranges from 1 to 2048, and the default value is 16.
  • n: Number of answers generated for each question. The value can be 1 (default) or 2. With the default value 1, only one answer is generated; if you set it to 2, the API returns an array containing two answers.
  • presence_penalty: Penalty applied to repetition in the generated text. Positive values penalize new tokens based on whether they have already appeared in the text so far, increasing the model's likelihood of talking about new topics and making the output more creative and diverse. The value ranges from -2 to 2.
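
The sketch below shows how the Q&A parameters above might be assembled into a single request, for illustration only. The endpoint URL pattern, the authentication header, and the example values are assumptions, not the connector's actual implementation; in practice ROMA Connect builds the request from the action parameters you configure.

  import requests

  # Illustrative only: a hypothetical Q&A request assembled from the Table 1
  # parameters. The URL pattern and auth header are placeholders, not the
  # documented PanguLM endpoint.
  project_id = "<project_id>"
  deployment_id = "<deployment_id>"
  url = (f"https://pangu.example.com/v1/{project_id}"
         f"/deployments/{deployment_id}/chat/completions")

  body = {
      "messages": [                  # multi-turn dialogue: role + content pairs
          {"role": "user", "content": "What does ROMA Connect do?"}
      ],
      "user": "demo-user-001",       # 1 to 64 characters
      "stream": False,               # default: non-streaming
      "temperature": 0.3,            # 0 to 1; low value for a deterministic task
      "max_tokens": 512,             # 1 to 2048, default 16
      "n": 1,                        # 1 (default) or 2
      "presence_penalty": 0.5,       # -2 to 2
  }
  # Note: top_p is omitted because temperature is set; do not use both.

  headers = {
      "Content-Type": "application/json",           # request body MIME type
      "X-Auth-Token": "<token or signed headers>",  # placeholder authentication
  }

  response = requests.post(url, json=body, headers=headers, timeout=60)
  print(response.json())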

Table 2 General text

  • Content-Type: Request body MIME type.
  • project_id: Project ID.
  • deployment_id: Deployment ID of a model.
  • region_id: Region ID.
  • prompt: Input text, which can contain 1 to 4,096 characters.
  • user: Unique identifier of a customer. The value contains 1 to 64 characters.
  • stream: Whether to enable streaming mode.
      • true: enabled (streaming SDK required)
      • false (default): disabled
  • temperature: Diversity and creativity of the generated text. The value ranges from 0 to 1, and 0 indicates the lowest diversity. Generally, a lower temperature suits deterministic tasks, while a higher temperature, such as 0.9, suits creative tasks. temperature is one of the key parameters that affect the output quality and diversity of an LLM. Other parameters, such as top_p, can also adjust the model's behavior and preferences, but do not set both parameters at the same time.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the probability mass determined by top_p. For example, 0.1 means that only the tokens comprising the top 10% probability mass are considered. Use either top_p or temperature to adjust the tendency of the generated text, but do not modify both at the same time.
  • max_tokens: Maximum length of a reply, in tokens, which affects the length and quality of chat replies. A large value can produce a long, complete reply but may increase the risk of irrelevant or duplicate content; a small value produces short, concise replies but may cut the content off or make it discontinuous. Select a value based on your scenario and requirements. The value ranges from 1 to 2048, and the default value is 16.
  • n: Number of answers generated for each question. The value can be 1 (default) or 2. With the default value 1, only one answer is generated; if you set it to 2, the API returns an array containing two answers.
  • presence_penalty: Penalty applied to repetition in the generated text. Positive values penalize new tokens based on whether they have already appeared in the text so far, increasing the model's likelihood of talking about new topics and making the output more creative and diverse. The value ranges from -2 to 2.
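
For the General text action, the request body carries a single prompt string instead of the messages array. The following short sketch is illustrative only and reuses the placeholder request pattern from the Q&A example above; the field names match Table 2.

  # Illustrative General text body; send it with the same placeholder request
  # pattern as the Q&A sketch (endpoint and authentication are assumptions).
  general_text_body = {
      "prompt": "Summarize the benefits of API-based integration.",  # 1 to 4,096 characters
      "user": "demo-user-001",   # 1 to 64 characters
      "stream": False,
      "top_p": 0.9,              # use either top_p or temperature, not both
      "max_tokens": 256,         # 1 to 2048, default 16
      "n": 1,
      "presence_penalty": 0.0,
  }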