About CSS Vector Search
In the age of AI, unstructured data (such as images, video, audio, and text) is growing rapidly. Traditional keyword-based search handles such data poorly because it cannot capture deep semantic or visual features. To address this, CSS provides a vector search solution that supports high-performance, high-accuracy nearest neighbor and approximate nearest neighbor search. Typical use cases include image search, video search, similar product recommendations, semantic text search, and cross-modal search (for example, searching for images using text).
Advantages
- Efficient and reliable: The built-in CSS vector search engine provides high search throughput and low latency. Built on the native distributed architecture of OpenSearch, CSS delivers enterprise-grade reliability features, including multiple replicas, snapshots, and permission control.
- Flexible adaptation: CSS supports multiple indexing algorithms, including brute-force search, graph-based algorithms (such as HNSW), product quantization (PQ), and IVF-HNSW. It also supports multiple similarity measures, including Euclidean distance, inner product, cosine similarity, and Hamming distance.
- OpenSearch ecosystem compatibility: CSS vector search is compatible with the open-source OpenSearch query language and APIs. It integrates with ecosystem tools such as Cerebro, Kibana, and Logstash, and it supports mainstream client languages such as Python, Java, Go, and C++, simplifying development and integration.
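The four similarity measures named in the list above can be illustrated with a minimal Python sketch. This is only an illustration of the definitions, not the CSS engine's implementation; the function names are hypothetical, and the vectors are assumed to be equal-length (and non-zero for cosine).

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance: a smaller value means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner (dot) product: a larger value means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: the inner product of the length-normalized vectors.
    Assumes neither vector is all zeros."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def hamming(a, b):
    """Hamming distance: the number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean([0, 0], [3, 4]))            # 5.0
print(inner_product([1, 2], [3, 4]))        # 11
print(cosine([1, 0], [0, 1]))               # 0.0
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```

Which measure fits best typically depends on how the embedding model was trained; for example, models trained with a cosine objective are usually searched with cosine similarity.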
How It Works
CSS vector search uses approximate nearest neighbor (ANN) search to reduce the heavy computational load of exact k-nearest neighbor (k-NN) search, balancing search efficiency against accuracy. Key points include:
- Reducing the candidate set: Traditional text search filters out irrelevant documents through inverted indexes, while vector search quickly retrieves potentially relevant vectors using index structures like HNSW graphs or IVF-PQ, avoiding scanning the full dataset. For example, HNSW uses a multi-layer structure to quickly find nearest neighbors to the query vector.
- Reducing computational complexity: The funnel model first applies a coarse, quantization-based stage (such as IVF-PQ) to quickly obtain a candidate set. It then performs fine retrieval or reranking (for example, by exact cosine similarity) on that candidate set only. The quantization step uses the product quantization (PQ) algorithm to encode high-dimensional vectors into compact codes, reducing storage and computational overhead.
- Balancing performance and accuracy: Indexing parameters (such as the number of HNSW layers and the number of IVF clusters) can be dynamically tuned to trade off between recall and query latency.
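The funnel idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CSS engine's algorithm: the coarse stage is a plain IVF-style cluster lookup with hand-picked centroids (a real index would learn them, typically with k-means), PQ encoding is omitted, and all function names and the `nprobe` parameter name are hypothetical.

```python
import math

def l2(a, b):
    """Exact Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector ID to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ann_search(query, vectors, centroids, lists, k, nprobe):
    """Two-stage search: coarse cluster filter, then exact reranking."""
    # Stage 1 (coarse): visit only the nprobe clusters closest to the query.
    probed = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probed for vid in lists[c]]
    # Stage 2 (fine): compute exact distances on the candidate set only.
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]

def exact_search(query, vectors, k):
    """Brute-force k-NN over the full dataset, used here as the reference."""
    return sorted(range(len(vectors)), key=lambda vid: l2(query, vectors[vid]))[:k]

vectors = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 1.1], [5.0, 5.0], [5.2, 4.9]]
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # assumed, not learned
lists = build_ivf(vectors, centroids)

query = [0.4, 0.4]
print(exact_search(query, vectors, k=3))                            # [1, 0, 2]
print(ann_search(query, vectors, centroids, lists, k=3, nprobe=1))  # [1, 0]
print(ann_search(query, vectors, centroids, lists, k=3, nprobe=2))  # [1, 0, 2]
```

With `nprobe=1` the search scans only two of the six vectors but misses the third true neighbor; with `nprobe=2` it scans four vectors and matches the exact result. This is the recall-versus-latency trade-off that index parameters control.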
Procedure
- Data preparation: Use an AI model (such as a CNN or a Transformer) to process your unstructured data (such as images, videos, and text) and extract feature vectors.
- Index creation: Create vector indexes in your OpenSearch cluster and define vector field mappings, including specifying vector dimensions, indexing algorithms, and similarity measurement methods.
- Data write: Store feature vectors (typically along with the original data or metadata) into these indexes.
- Vector search: Use the standard OpenSearch query DSL (such as a k-NN query) to submit the query vector (generated by the same model that produced the indexed vectors), and specify the number (k) of nearest neighbors to return.
- Result: The CSS vector search engine performs an efficient ANN search and returns the k most relevant results and their similarity scores. Your application can then process these results (for example, showing similar images or recommending relevant products).
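The index creation, data write, and vector search steps above might look roughly like the following request bodies, built here as Python dictionaries. This sketch assumes the open-source OpenSearch k-NN plugin syntax (`knn_vector` field type and `knn` query); the built-in CSS vector search engine may use different mapping and query parameters, so treat this as illustrative only. The index name, field names, dimension, and HNSW parameter values are hypothetical, and no request is actually sent.

```python
import json

INDEX_NAME = "my_index"      # hypothetical index name
VECTOR_FIELD = "my_vector"   # hypothetical vector field name
DIMENSION = 4                # must match the output dimension of your model

# Index creation: enable k-NN and declare the vector field, its dimension,
# the indexing algorithm (HNSW), and the similarity measure (l2 = Euclidean).
create_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            VECTOR_FIELD: {
                "type": "knn_vector",
                "dimension": DIMENSION,
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "nmslib",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            },
            "title": {"type": "text"},  # metadata stored next to the vector
        }
    },
}

# Data write: one document holding a feature vector plus its metadata.
document = {VECTOR_FIELD: [0.1, 0.2, 0.3, 0.4], "title": "example item"}

def knn_query(vector, k):
    """Build a k-NN query body that asks for the k nearest neighbors."""
    if len(vector) != DIMENSION:
        raise ValueError(f"expected {DIMENSION} dimensions, got {len(vector)}")
    return {"size": k, "query": {"knn": {VECTOR_FIELD: {"vector": vector, "k": k}}}}

print(f"PUT /{INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
print(f"POST /{INDEX_NAME}/_doc")
print(json.dumps(document))
print(f"POST /{INDEX_NAME}/_search")
print(json.dumps(knn_query([0.1, 0.2, 0.3, 0.5], k=2)))
```

The printed bodies can be sent with any HTTP client or OpenSearch client library. The dimension check in `knn_query` reflects the requirement that query vectors come from the same model as the indexed vectors.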
Constraints
The built-in CSS vector search engine is available for OpenSearch 1.3.6 clusters only.
Related Documents
- OpenSearch clusters in CSS also support the open-source OpenSearch vector search. For details, see k-NN query.
- To learn how to quickly get started with CSS's vector search service, see Using Elasticsearch for Vector Search.
- To learn more about the CSS vector database performance, see Testing the Performance of CSS's Elasticsearch Vector Search.