Concepts
RAG
Retrieval-augmented generation (RAG) is a technique that enhances large language models (LLMs) by integrating them with external knowledge bases. RAG leverages data outside the model's training set (such as up-to-date information and internal documents) to improve the relevance and accuracy of AI-generated responses. At its core, RAG supplies LLMs with reliable knowledge retrieved through vector search, mitigating hallucinations and enabling the models to generate more reliable, knowledge-backed outputs.
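The following is a minimal sketch of the retrieve-then-generate flow. The embed() function here is a toy hashing-based stand-in for a real embedding model, and the final prompt would in practice be sent to an LLM; the document list and function names are illustrative assumptions, not part of any specific product API.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy bag-of-words hashing embedding; a real system would call an
    # embedding model instead.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "RAG supplies LLMs with retrieved external knowledge.",
    "Embedding models map text into dense vectors.",
    "Reranking models rescore an initial candidate set.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Vector search: cosine similarity (vectors are unit-normalized).
    scores = doc_vectors @ embed(query)
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

def answer(query: str) -> str:
    # Ground the generation step in the retrieved context.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in practice, this prompt is sent to the LLM

print(answer("What does RAG do?"))
```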
Embedding model
An embedding model transforms text (such as words, phrases, or sentences) into dense vector representations (N-dimensional arrays) in a continuous vector space, where semantic similarity can be measured as spatial distance. These representations support downstream tasks such as similarity search and semantic reasoning.
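As a quick sketch of measuring semantic similarity in that vector space, the example below computes cosine similarity between embeddings; the 3-dimensional vectors are made up for illustration, as real embeddings typically have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.90, 0.10, 0.20])  # hypothetical embedding of "cat"
kitten = np.array([0.85, 0.15, 0.25])  # hypothetical embedding of "kitten"
car    = np.array([0.10, 0.90, 0.30])  # hypothetical embedding of "car"

print(cosine_similarity(cat, kitten))  # high score: semantically close
print(cosine_similarity(cat, car))     # lower score: semantically distant
```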
Reranking model
A reranking model reorders an initial result set by performing deep semantic matching, thus improving search result relevance. It rescores the top-k results retrieved during the recall phase (for example, through an embedding-based vector search) and returns a refined subset of the most semantically relevant results.
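The sketch below shows this two-stage recall-then-rerank pattern. The rerank_score() function uses simple token overlap as a stand-in for a real reranking (cross-encoder) model that scores each query-document pair jointly; all names here are illustrative assumptions.

```python
def rerank_score(query: str, doc: str) -> float:
    # Placeholder for a reranking model: fraction of query tokens in the doc.
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    # Rescore the top-k recall results and keep the most relevant subset.
    scored = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return scored[:top_n]

recall_results = [  # e.g., top-k hits from a vector search
    "Reranking models rescore candidate passages.",
    "Vector search retrieves an initial candidate set.",
    "Unrelated passage about billing.",
]
print(rerank("how do reranking models rescore candidates", recall_results))
```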
Search planning
Search planning consists of two parts: multi-turn query rewriting and intent classification.
- With multi-turn query rewriting, an LLM rewrites user queries based on the chat history and generates new queries with clearer intent. It can also break down complex queries into multiple simpler questions.
- Intent classification refers to an LLM's ability to accurately identify the intent behind a user's query. Both steps are sketched in the example after this list.
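The sketch below expresses both search-planning steps as LLM prompts. The call_llm() function is a placeholder for whatever chat-completion API the system actually uses, and the prompt wording and intent labels are illustrative assumptions.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: send the prompt to your LLM of choice and return its reply.
    raise NotImplementedError("wire this to a chat-completion API")

def rewrite_query(chat_history: list[str], query: str) -> str:
    # Multi-turn rewriting: resolve references like "it" using the history,
    # and split a complex question into simpler sub-questions.
    history = "\n".join(chat_history)
    prompt = (
        "Given this conversation:\n"
        f"{history}\n"
        "Rewrite the last user question as a standalone query with a clear "
        "intent, splitting it into simpler sub-questions if it is complex:\n"
        f"{query}"
    )
    return call_llm(prompt)

def classify_intent(query: str, intents: list[str]) -> str:
    # Intent classification: pick the best-matching intent label.
    prompt = (
        f"Classify the user query into exactly one of {intents}.\n"
        f"Query: {query}\nIntent:"
    )
    return call_llm(prompt)
```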