Updated on 2025-11-04 GMT+08:00

N-Gram Speculation

What Is N-Gram Speculation?

N-Gram speculation is an inference optimization technique that accelerates token generation in large language models (LLMs). It drafts the next several tokens by matching the most recent N-Gram against text that already appears in the context, and the model then verifies the drafted tokens together instead of generating them one at a time. This reduces the number of sequential decoding steps and thus improves generation speed.
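The matching step can be illustrated with a minimal sketch. It assumes tokens are plain integer IDs and that drafts come from the most recent earlier occurrence of the last n tokens. The names propose_draft and accept_prefix and the parameters n and max_draft are hypothetical and do not correspond to any product API. A real engine obtains the verification tokens from the model itself.

```python
from typing import List, Sequence


def propose_draft(tokens: Sequence[int], n: int = 3, max_draft: int = 4) -> List[int]:
    """Draft tokens by finding an earlier occurrence of the last n tokens.

    Hypothetical helper: returns up to max_draft tokens that followed the
    most recent earlier match, or an empty list when nothing matches.
    """
    if len(tokens) <= n:
        return []
    tail = list(tokens[-n:])
    # Scan backwards so the most recent earlier occurrence wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == tail:
            follow = start + n
            return list(tokens[follow:follow + max_draft])
    return []


def accept_prefix(draft: Sequence[int], verified: Sequence[int]) -> List[int]:
    """Keep the longest draft prefix that agrees with the model's own tokens.

    `verified` stands in for the tokens the model would have produced at the
    same positions (an assumption for illustration).
    """
    accepted: List[int] = []
    for drafted, expected in zip(draft, verified):
        if drafted != expected:
            break
        accepted.append(drafted)
    return accepted


history = [1, 2, 3, 4, 5, 1, 2, 3]
draft = propose_draft(history)                # tokens that followed the earlier [1, 2, 3]
kept = accept_prefix(draft, [4, 5, 9, 9])     # the model disagrees at the third token
```

In this sketch, every accepted draft token is a decoding step saved. When no N-Gram matches, the draft is empty and generation falls back to ordinary one-token-at-a-time decoding.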

Scenarios

Applicable Scenarios

  • Long text generation, such as stories and code completion.
  • Highly repetitive tasks, such as batch question answering and translation.

Inapplicable Scenarios

  • Short text generation, where so few tokens are produced that speculation brings little benefit.
  • Highly random tasks, such as creative writing, where the N-Gram matching rate is low.
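One rough way to judge which category a workload falls into is to estimate the N-Gram matching rate on a sample of its token sequences. The sketch below is an assumption-laden illustration, not a product feature: ngram_hit_rate is a hypothetical helper that counts how often the token following an N-Gram equals the token that followed the same N-Gram the last time it appeared.

```python
from typing import Dict, Sequence, Tuple


def ngram_hit_rate(tokens: Sequence[int], n: int = 2) -> float:
    """Fraction of positions where the previous n tokens were seen before
    and were last followed by the same next token (hypothetical metric)."""
    last_next: Dict[Tuple[int, ...], int] = {}
    hits = 0
    total = 0
    for i in range(n, len(tokens)):
        key = tuple(tokens[i - n:i])
        total += 1
        # A hit means an N-Gram lookup would have drafted this token correctly.
        if last_next.get(key) == tokens[i]:
            hits += 1
        last_next[key] = tokens[i]
    return hits / total if total else 0.0


repetitive = [1, 2, 3, 1, 2, 3, 1, 2, 3]
unique = list(range(10))
repetitive_rate = ngram_hit_rate(repetitive)
unique_rate = ngram_hit_rate(unique)
```

A sample with a rate close to zero, such as the sequence of unique tokens here, suggests that speculation is unlikely to help, while a repetitive sample scores noticeably higher.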