N-Gram Speculation
What Is N-Gram Speculation?
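N-Gram speculation is an inference optimization that accelerates text generation in large language models (LLMs). Instead of producing every token in a separate decoding step, it proposes the next several tokens by N-Gram matching against text that is already available. When the proposals agree with what the model would have generated, several tokens are accepted at once, which reduces the number of decoding steps and improves generation speed.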
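The matching step can be illustrated with a small sketch. This is not the implementation of any particular inference engine; the function names `propose_ngram_draft` and `accept_draft`, the parameters `n` and `k`, and the `next_token` callable standing in for the model are all hypothetical. A real engine would typically verify the whole draft in one batched forward pass, whereas the loop here only shows the acceptance rule.

```python
from typing import Callable, List, Sequence


def propose_ngram_draft(tokens: Sequence[int], n: int = 3, k: int = 4) -> List[int]:
    """Propose up to k draft tokens by matching the last n tokens
    against an earlier occurrence in the same sequence (hypothetical helper)."""
    if len(tokens) <= n:
        return []
    suffix = list(tokens[-n:])
    # Scan backwards so the most recent earlier match wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == suffix:
            # Tokens that followed the earlier occurrence become the draft.
            return list(tokens[start + n:start + n + k])
    return []


def accept_draft(
    draft: Sequence[int],
    context: Sequence[int],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Keep draft tokens while they equal the model's own choice.
    On the first mismatch, emit the model's token instead and stop.
    `next_token` is an assumed stand-in for a greedy model step."""
    ctx = list(context)
    accepted: List[int] = []
    for tok in draft:
        expected = next_token(ctx)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```

For example, with the context `[1, 2, 3, 4, 5, 1, 2, 3]` and `n=3`, the trailing N-Gram `1, 2, 3` also appears at the start of the sequence, so the tokens that followed it there are proposed as the draft. If the model agrees with them, they are all emitted without a separate step for each.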
Scenarios
Applicable Scenarios
- Long text generation, such as stories and code completion.
- Highly repetitive tasks, such as batch question answering and translation.
Inapplicable Scenarios
- Short text generation, where speculation yields little benefit.
- Highly random tasks, such as creative writing, where the N-Gram matching rate is low.
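One rough way to judge which category a workload falls into is to measure how often an N-Gram seen earlier in a token sequence is followed by the same token again. The sketch below is only an illustrative heuristic, not a metric defined by this documentation; the function name `ngram_hit_rate` and its parameter `n` are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def ngram_hit_rate(tokens: Sequence[int], n: int = 2) -> float:
    """Fraction of positions where the preceding n tokens were seen before
    and were followed by the same token as last time (hypothetical heuristic)."""
    last_follow: Dict[Tuple[int, ...], int] = {}
    hits = 0
    total = 0
    for i in range(n, len(tokens)):
        key = tuple(tokens[i - n:i])
        total += 1
        # A hit means an N-Gram lookup would have predicted this token.
        if last_follow.get(key) == tokens[i]:
            hits += 1
        last_follow[key] = tokens[i]
    return hits / total if total else 0.0
```

A repetitive sequence such as `[1, 2, 3] * 4` scores high with `n=2`, while a sequence with no repeated N-Grams scores 0.0, which mirrors the applicable and inapplicable scenarios listed above.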