How Do I Adjust Inference Parameters to Maximize the Pangu Model Performance?
The inference parameters (decoding parameters) control the style of the model's output, for example, its length, randomness, creativity, diversity, accuracy, and richness.
Currently, the platform supports the following inference parameters: temperature, top_p, and presence_penalty. The following table lists their recommended values and descriptions.
| Inference Parameter | Value Range | Recommended Value | Description |
|---|---|---|---|
| temperature | 0 to 1 | 0.3 | Controls the randomness and creativity of the model's output. A higher temperature produces more random, more creative output; a lower temperature makes the output more predictable and deterministic. Adjust it based on the task type: use a higher temperature when the task requires more creative content, and a lower temperature when it requires more deterministic content. Note that temperature and top_p serve similar functions. To isolate the tuning effect of either parameter, you are advised not to adjust both at the same time. If you lack tuning experience, start with the recommended value and then adjust it based on the inference results. |
| top_p | 0 to 1 | 1 | Controls the diversity of the model's output. A larger top_p value produces more diverse output; a lower value makes the output more predictable and deterministic. Adjust top_p based on the task type: use a higher value when the task requires more diverse content, and a lower value when it requires more deterministic content. Note that temperature and top_p serve similar functions. To isolate the tuning effect of either parameter, you are advised not to adjust both at the same time. If you lack tuning experience, start with the recommended value and then adjust it based on the inference results. |
| presence_penalty | -2 to 2 | 0 | Controls the model's tendency to repeat the same phrases or words in the generated output. A positive value makes the model favor new text that has not yet appeared in the output; a negative value makes it stay focused on particular content. If you lack tuning experience, start with the recommended value and then adjust it based on the inference results. |
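To make this concrete, the sketch below shows how these parameters might be passed in a request to an OpenAI-style chat-completions endpoint. The endpoint URL, model name, API key, and response shape are placeholder assumptions for illustration, not the actual Pangu API surface; consult your platform's API reference for the real names.

```python
import requests

# Placeholder endpoint and credentials; substitute the actual inference
# endpoint and token for your deployment.
ENDPOINT = "https://example.com/v1/chat/completions"
API_KEY = "<your-api-key>"

payload = {
    "model": "pangu",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Write a short product tagline."}
    ],
    # Recommended starting values from the table above.
    "temperature": 0.3,     # randomness/creativity (0 to 1)
    "top_p": 1,             # output diversity (0 to 1)
    "presence_penalty": 0,  # repetition tendency (-2 to 2)
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
# Response shape assumed to follow the OpenAI chat-completions convention.
print(response.json()["choices"][0]["message"]["content"])
```

Keeping all three parameters together in the request payload makes it easy to swap in scenario-specific settings, such as the presets discussed below.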
The following common scenarios and adjustment guidelines can help you better understand how these parameters work:
- Text generation: The generated text (such as promotional copy, letters, and literary writing) is expected to be diverse. You are advised to increase temperature or top_p while keeping the text from becoming too random. If the generated text is too divergent, reduce the presence_penalty value to keep the content consistent. Conversely, if the generated text is too monotonous or even repetitive, increase the presence_penalty value.
- Knowledge base question answering (KBQA): In scenarios such as open-book QA and retrieval-based QA, the answers should be deterministic and unique. You are advised to reduce temperature or top_p. To make the model generate the same answer for the same input every time, set temperature to 0. Illustrative presets for both scenarios are sketched after this list.
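As a rough illustration of the two scenarios above, the snippet below collects the suggested adjustment directions into two parameter presets. The exact numbers are assumptions chosen for demonstration, not official recommendations; tune them against your own task.

```python
# Illustrative presets for the two scenarios above. The exact values
# are assumptions for demonstration purposes, not official recommendations.
CREATIVE_TEXT = {
    "temperature": 0.8,       # more randomness for varied, creative copy
    "top_p": 0.95,
    "presence_penalty": 0.5,  # increase if the output turns repetitive
}

DETERMINISTIC_QA = {
    "temperature": 0,         # same input always yields the same answer
    "top_p": 1,
    "presence_penalty": 0,
}

# Merge a preset into the request payload from the earlier sketch, e.g.:
# payload.update(DETERMINISTIC_QA)
```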
There is no single correct parameter setting; adjust the parameters according to your actual task.