Updated on 2025-11-03 GMT+08:00

Basic Concepts

Concepts Related to Large Models

LLM

Large language models (LLMs) are a category of foundation models pre-trained on immense amounts of data. Pre-training means the model is first trained on a general task and then fine-tuned on downstream tasks, which improves its accuracy on those tasks. A large-scale pre-trained model is a pre-trained model whose parameter count reaches the hundreds of billions or even trillions. Such models have stronger generalization capabilities and can accumulate industry experience and obtain information more efficiently and accurately.

Token

A token is the smallest unit of text a model can work with. A token can be a whole word, part of a word, or a single character. An LLM converts input and output text into tokens, generates a probability distribution over the possible next tokens, and then samples a token from that distribution.

Some compound words are split based on semantics. For example, "overweight" is split into two tokens: "over" and "weight".

Table 1 Word-to-token ratios

Model Specifications    English Word-to-Token Ratio    Chinese Character-to-Token Ratio
N1 series models        0.75                           1.5
N2 series models        0.88                           1.24
N4 series models        0.75                           1.5
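As a rough illustration, the ratios in Table 1 can be used to estimate token counts from word or character counts. The helper below is a hypothetical sketch (not an official API) that interprets each ratio as words, or characters, per token, using the N1 series values as defaults:

```python
def estimate_tokens(english_words=0, chinese_chars=0,
                    word_ratio=0.75, char_ratio=1.5):
    # Rough token estimate from Table 1 ratios (N1 series defaults;
    # swap in 0.88 / 1.24 for N2 series models).
    # Each ratio is read as words (or characters) per token,
    # so tokens = count / ratio.
    return english_words / word_ratio + chinese_chars / char_ratio

# 75 English words at 0.75 words/token -> about 100 tokens
print(estimate_tokens(english_words=75))
```

Actual token counts depend on the model's tokenizer; these ratios are only averages.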

Inference

Table 2 Inference-related concepts

Temperature

The temperature parameter controls the randomness and creativity of text generated by a generative language model. It scales the logits before the softmax output layer, adjusting the probabilities of the predicted tokens. A higher temperature flattens the distribution so that the probabilities of candidate tokens become more uniform and more tokens have a realistic chance of being selected, which increases the diversity of the generated text. A lower temperature sharpens the distribution, making the output more focused and deterministic.
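As an illustration of how temperature rescales the softmax distribution, here is a minimal sketch in plain Python (not any specific model's implementation):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide each logit by the temperature before applying softmax.
    # temperature > 1 flattens the distribution (more diverse sampling);
    # temperature < 1 sharpens it (more deterministic output).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, temperature=0.5)   # sharper
high = softmax_with_temperature(logits, temperature=2.0)  # flatter
```

With the lower temperature, the top token's probability dominates; with the higher temperature, probability mass spreads more evenly across all candidates.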

Diversity and consistency

Diversity and consistency are two important dimensions for evaluating text generated by LLMs; they reflect the model's generalization capability and stability, respectively. Diversity refers to the variation among different outputs generated by a model. It can be enhanced by collecting large amounts of data, applying data augmentation, and training on multiple languages. Consistency refers to the alignment across different outputs generated from the same input. It can be improved through regularization and parameter tuning.

Repetition penalty

Repetition penalty is a technique used in model training or text generation. It discourages the repetition of tokens that have already appeared in the generated text. During training, this can be done by adding a penalty term for repetitive output to the loss function (the loss function is essential for model optimization): if the model generates repetitive tokens, its loss increases, which encourages it to produce more diverse tokens. At decoding time, a similar effect can be achieved by reducing the scores of tokens that have already been generated.
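A decoding-time variant of this idea can be sketched as follows. It follows the common approach of scaling down the scores of already-generated tokens before sampling; the function name and penalty value are illustrative, not a specific library's API:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Reduce the score of every token that has already appeared in the
    # generated sequence, making it less likely to be sampled again.
    out = list(logits)
    for tid in set(generated_ids):
        if out[tid] > 0:
            out[tid] /= penalty   # positive scores shrink toward 0
        else:
            out[tid] *= penalty   # negative scores move further down
    return out

# Token 0 and token 1 were already generated, so their scores drop;
# token 2 is untouched.
penalized = apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1])
```

A penalty of 1.0 disables the effect; larger values suppress repetition more aggressively.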

Prompt Engineering

Table 3 Concepts related to prompt engineering

Prompt

A prompt is the text used to interact with an AI model, telling the model what content to generate. Prompts enable the use of LLMs across diverse applications and research fields. A prompt can contain the instructions or question you are passing to the model and can include other details such as additional context, inputs, or examples.
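As a minimal illustration, a prompt that combines an instruction, additional context, and the actual input might look like the string below. The layout is purely illustrative; models do not require any particular format:

```python
# A hypothetical prompt bundling instruction, context, and input.
prompt = (
    "Instruction: Summarize the following text in one sentence.\n"
    "Context: The text is taken from a product FAQ page.\n"
    "Input: Tokens are the smallest units of text a model can process. "
    "An LLM converts input and output text into tokens before sampling."
)
print(prompt)
```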

CoT

Chain-of-thought (CoT) is a method that simulates human problem-solving. It uses a series of natural language inference steps to reason gradually from the input to a final conclusion. This process can be seen as a chain, where each link represents a stage in the model's processing and reasoning.
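A CoT prompt typically includes a worked example whose answer spells out the intermediate reasoning steps, encouraging the model to do the same for the new question. A minimal sketch (the questions and numbers are illustrative):

```python
# A hypothetical few-shot chain-of-thought prompt: the example answer
# walks through intermediate steps before stating the result.
cot_prompt = (
    "Q: A cafeteria had 23 apples. It used 20 and bought 6 more. "
    "How many apples are there now?\n"
    "A: The cafeteria started with 23 apples. It used 20, leaving "
    "23 - 20 = 3. It bought 6 more, so 3 + 6 = 9. The answer is 9.\n"
    "Q: There were 5 buses and each carried 40 passengers. Two buses "
    "left. How many passengers remain?\n"
    "A:"
)
print(cot_prompt)
```

The prompt ends at "A:" so the model continues by generating its own step-by-step answer.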

Self-Instruct

Self-Instruct is a method for aligning pre-trained language models with instructions. With Self-Instruct, language models are able to generate instruction data themselves without relying on extensive manual annotation. This method reduces reliance on manual instructions and improves the model's adaptability.