Using the GRAPH Algorithm to Implement Vector Search
This topic describes how to use the GRAPH algorithm to implement high-recall, low-latency vector search with a CSS vector database.
Scenarios
In big data processing and search scenarios, applications often need to quickly retrieve the records most similar to a query vector from a large dataset. Traditional search methods, however, often fail to deliver high recall and low latency at the same time, or to strike an acceptable balance between the two. The GRAPH algorithm, a deeply optimized implementation of HNSW (Hierarchical Navigable Small World), runs on a CSS vector database backed by a memory-optimized cluster and delivers vector search with high recall and low latency. It also supports hybrid queries that combine vector similarity search with scalar field filters, making it well suited to workloads that demand high-precision similarity matching, such as image retrieval, recommendation systems, and semantic search.
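Recall here means the fraction of the true nearest neighbors that an approximate search actually returns. As a hedged illustration (the function name `recall_at_k` and the sample ids are hypothetical), recall@k can be computed by comparing approximate results against an exact, brute-force result:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / k


# Hypothetical example: the approximate search missed one of the true top-4 neighbors.
exact = ["d1", "d7", "d3", "d9"]
approx = ["d1", "d3", "d9", "d5"]
print(recall_at_k(approx, exact, 4))  # 0.75
```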
Solution Procedure
- Vector index creation: Vector indexes are built with the GRAPH algorithm, which supports optimizations such as edge cutting, connectivity enhancement, and SIMD acceleration.
- Scalar field association: Hybrid queries can combine vector similarity search with filters on scalar fields such as labels and classes.
- Query engine: Performs efficient vector similarity search.
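To illustrate how a graph index answers a query, the Python sketch below runs a best-first (beam) search over a single-layer k-nearest-neighbor graph, which is the core search routine of HNSW. It is a simplified model under stated assumptions: real HNSW adds a hierarchy of layers for choosing the entry point, and the CSS GRAPH implementation, with its edge cutting, connectivity enhancement, and SIMD optimizations, is not documented on this page. `build_knn_graph`, `beam_search`, and the `m`, `ef`, and `entry` parameters are illustrative names only.

```python
import heapq
import math


def build_knn_graph(points, m):
    """Connect each point to its m nearest neighbors (brute force; fine for a toy dataset)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:m]]
    return graph


def beam_search(points, graph, query, entry, ef, k):
    """Best-first search that keeps at most ef results; returns the ids of the k closest."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # the closest candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]]


# Toy dataset: a 5 x 5 grid of 2-D points; point (x, y) has id x * 5 + y.
points = [(x, y) for x in range(5) for y in range(5)]
graph = build_knn_graph(points, m=4)
print(beam_search(points, graph, (3.2, 3.1), entry=0, ef=8, k=3))  # [18, 23, 19]
```

A larger `ef` widens the beam, which typically raises recall at the cost of latency, the same trade-off described under Scenarios.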
Highlights
- Enhanced performance: Compared with open-source algorithms, optimizations such as edge cutting, connectivity enhancement, and SIMD acceleration improve both query performance and recall.
- Flexible filtering: Hybrid queries that combine vector and scalar fields improve search accuracy.
- Higher accuracy: The optimized HNSW implementation enables more accurate vector similarity matching.
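This page does not define edge cutting. One published technique with a similar goal is the neighbor-selection heuristic from the HNSW paper: a candidate edge is dropped when the candidate is closer to an already-selected neighbor than to the base node, which keeps the graph sparse while preserving neighbors in different directions. The Python sketch below implements that heuristic; treating it as comparable to CSS's edge cutting is an assumption, and `select_neighbors_heuristic` is a hypothetical name.

```python
import math


def select_neighbors_heuristic(base, candidates, m):
    """Keep a candidate only if it is closer to base than to every neighbor already kept."""
    selected = []
    for c in sorted(candidates, key=lambda p: math.dist(base, p)):
        if len(selected) >= m:
            break
        d_base = math.dist(base, c)
        # Prune c if some kept neighbor already "covers" it (is closer to c than base is).
        if all(d_base < math.dist(c, s) for s in selected):
            selected.append(c)
    return selected


base = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.1, 0.1), (0.0, 1.0)]
print(select_neighbors_heuristic(base, candidates, m=3))  # [(1.0, 0.0), (0.0, 1.0)]
```

A plain nearest-3 selection would keep (1.1, 0.1), which is nearly redundant with (1.0, 0.0); the heuristic spends its edge budget on other directions instead.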
Constraints
- Indexes created with the GRAPH algorithm must reside in memory, and optimal query performance is achieved only when sufficient memory is available. For details about how to estimate the required memory capacity, see Memory Planning.
- Applicable cluster versions: Elasticsearch 7.6.2, Elasticsearch 7.10.2, OpenSearch 1.3.6, and OpenSearch 2.19.0.
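Memory Planning describes how to size memory for CSS. As a rough, non-authoritative back-of-envelope check, a common approximation for HNSW-style graphs counts the raw float32 vectors plus up to 2*M 4-byte neighbor ids per vector on the base layer. The sketch below assumes M = 16, a parameter this page does not expose for GRAPH, and ignores upper graph layers, JVM heap, and other overhead, so treat the result as a lower bound. `estimate_graph_index_bytes` is a hypothetical helper.

```python
def estimate_graph_index_bytes(num_vectors, dimension, m=16, replicas=0,
                               bytes_per_value=4, bytes_per_link=4):
    """Rough lower-bound memory estimate for an HNSW-style graph index (hypothetical)."""
    vector_bytes = dimension * bytes_per_value  # float32 vector components
    link_bytes = 2 * m * bytes_per_link         # up to 2*m neighbor ids on the base layer
    per_copy = num_vectors * (vector_bytes + link_bytes)
    return per_copy * (1 + replicas)            # each replica holds its own copy


est = estimate_graph_index_bytes(10_000_000, 128)
print(f"{est / 2**30:.1f} GiB")  # 6.0 GiB
```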
Prerequisites
You have created a CSS vector database. This topic uses an Elasticsearch 7.10.2 cluster with memory-optimized nodes as an example.
Procedure
- Log in to Kibana and open the command execution page. Elasticsearch clusters support multiple access methods; this topic uses the Kibana console integrated with CSS as an example.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the cluster list, find the target cluster, and click Kibana in the Operation column to log in to the Kibana console.
- In the left navigation pane, choose Dev Tools.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
- Create a GRAPH vector index.
Create an index named my_index that contains a vector field my_vector and a label field my_label.
PUT my_index
{
  "settings": {
    "index": { "vector": true }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "vector",
        "dimension": 2,
        "indexing": true,
        "algorithm": "GRAPH",
        "metric": "euclidean"
      },
      "my_label": { "type": "keyword" }
    }
  }
}
- Ingest vector data.
Run the following commands to write vector data to the new GRAPH index.
- Write a single record:
POST my_index/_doc
{
  "my_vector": [1.0, 2.0],
  "my_label": "red"
}
- Write multiple records at the same time:
POST my_index/_bulk
{"index": {}}
{"my_vector": [1.0, 2.0], "my_label": "red"}
{"index": {}}
{"my_vector": [2.0, 2.0], "my_label": "green"}
{"index": {}}
{"my_vector": [2.0, 3.0], "my_label": "red"}
- Query vector data.
Run the following command to perform a vector search:
- Pure vector similarity search:
POST my_index/_search
{
  "size": 3,
  "_source": { "excludes": ["my_vector"] },
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk": 3
      }
    }
  }
}
- Vector+scalar hybrid search (pre-filtering query):
POST my_index/_search
{
  "size": 3,
  "_source": { "excludes": ["my_vector"] },
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk": 3,
        "filter": {
          "term": { "my_label": "red" }
        }
      }
    }
  }
}
If search results are returned, the query is successful.
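When ingesting from an application instead of Kibana, the _bulk body must be newline-delimited JSON (NDJSON): each action line is followed by its document on its own line, and the body must end with a newline. The Python sketch below builds that body for the sample documents. `build_bulk_body` and its `dimension` check are hypothetical helpers; the resulting string would be sent to `POST my_index/_bulk` with the `Content-Type: application/x-ndjson` header using any HTTP client.

```python
import json


def build_bulk_body(docs, vector_field="my_vector", dimension=2):
    """Build an NDJSON _bulk body: an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        # Catch dimension mismatches client-side, before sending the request.
        if len(doc[vector_field]) != dimension:
            raise ValueError(f"expected {dimension} dimensions, got {len(doc[vector_field])}")
        lines.append(json.dumps({"index": {}}))  # action line: index with auto-generated _id
        lines.append(json.dumps(doc))            # source line: the document itself
    # Every line, including the last, must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"my_vector": [1.0, 2.0], "my_label": "red"},
    {"my_vector": [2.0, 2.0], "my_label": "green"},
    {"my_vector": [2.0, 3.0], "my_label": "red"},
]
body = build_bulk_body(docs)
print(body, end="")
```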
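The following Python sketch reproduces, with exact brute-force search, the ranking that the two queries above should approximate on the three documents from the bulk example (it ignores the single-record write, which would add a duplicate of the first document). The document ids are hypothetical, because the bulk request lets Elasticsearch generate `_id` values, and GRAPH itself is an approximate algorithm, so this is a reference for sanity-checking results rather than how CSS evaluates the query. The post-filtering function is included only as a contrast to the pre-filtering query.

```python
import math

# Hypothetical ids for the three documents written by the bulk request.
DOCS = [
    {"id": "doc1", "my_vector": [1.0, 2.0], "my_label": "red"},
    {"id": "doc2", "my_vector": [2.0, 2.0], "my_label": "green"},
    {"id": "doc3", "my_vector": [2.0, 3.0], "my_label": "red"},
]


def exact_topk(docs, query, topk, label=None):
    """Brute-force Euclidean top-k; if label is given, filter first (pre-filtering)."""
    pool = [d for d in docs if label is None or d["my_label"] == label]
    ranked = sorted(pool, key=lambda d: math.dist(query, d["my_vector"]))
    return [d["id"] for d in ranked[:topk]]


def post_filter_topk(docs, query, topk, label):
    """Contrast case: search first, then filter, which may return fewer than topk hits."""
    hits = exact_topk(docs, query, topk)
    by_id = {d["id"]: d for d in docs}
    return [h for h in hits if by_id[h]["my_label"] == label]


print(exact_topk(DOCS, [1, 1], 3))               # ['doc1', 'doc2', 'doc3']
print(exact_topk(DOCS, [1, 1], 2, label="red"))  # ['doc1', 'doc3']
print(post_filter_topk(DOCS, [1, 1], 2, "red"))  # ['doc1']
```

Pre-filtering restricts the candidate set before ranking, so it can still fill `topk` with matching documents; filtering after the search can leave fewer hits, as the last call shows.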
Related Documents
- For more information about the CSS vector database, see About Vector Search.
- To learn how to quickly get started with CSS's vector search service, see Using Elasticsearch for Vector Search.