Querying Vectors
Standard Query
Standard vector query syntax is provided for vector fields with vector indexes. The following command will return n (specified by size/topk) data records that are most close to the query vector.
POST my_index/_search { "size":2, "_source": false, "query": { "vector": { "my_vector": { "vector": [1, 1], "topk":2 } } } }
Parameter |
Description |
---|---|
vector (the first one) |
Indicates that the query type is VectorQuery. |
my_vector |
Indicates the name of the vector field you want to query. |
vector (the second one) |
Indicates the vector value you want to query, which can be an array or a Base64 string |
topk |
Same as the value of size generally. |
Indicates optional query parameters. You can adjust the vector index parameters to achieve higher query performance or precision. |
Type |
Parameter |
Description |
---|---|---|
Graph index configuration parameters |
ef |
Queue size of the neighboring node during the query. A larger value indicates a higher query precision and slower query speed. The default value is 200. Value range: (0, 100000] |
max_scan_num |
Maximum number of scanned nodes. A larger value indicates a higher query precision and slower query speed. The default value is 10000. Value range: (0, 1000000] |
|
IVF index configuration parameters |
nprobe |
Number of center points. A larger value indicates a higher query precision and slower query speed. The default value is 100. Value range: (0, 100000] |
Compound Query
Vector search can be used together with other Elasticsearch subqueries, such as Boolean query and post-filtering, for compound query.
In the following two examples, top 10 (topk) results closest to the query vector are queried first. filter retains only the results whose my_label field is red.
- Example of a Boolean query
POST my_index/_search { "size": 10, "query": { "bool": { "must": { "vector": { "my_vector": { "vector": [1, 2], "topk": 10 } } }, "filter": { "term": { "my_label": "red" } } } } }
- Example of post-filtering
GET my_index/_search { "size": 10, "query": { "vector": { "my_vector": { "vector": [1, 2], "topk": 10 } } }, "post_filter": { "term": { "my_label": "red" } } }
ScriptScore Query
You can use script_score to perform Nearest Neighbor Search (NSS) on vectors. The query syntax is provided below.
The pre-filtering condition can be any query. script_score traverses only the pre-filtered results, calculates the vector similarity, and sorts and returns the results. The performance of this query depends on the size of the intermediate result set after the pre-filtering. If the pre-filtering condition is set to match_all, brute-force search is performed on all data.
POST my_index/_search { "size":2, "query": { "script_score": { "query": { "match_all": {} }, "script": { "source": "vector_score", "lang": "vector", "params": { "field": "my_vector", "vector": [1.0, 2.0], "metric": "euclidean" } } } } }
Parameter |
Description |
---|---|
source |
Script description. Its value is vector_score if the vector similarity is used for scoring. |
lang |
Script syntax description. Its value is vector. |
field |
Vector field name |
vector |
Vector data to be queried |
metric |
Measurement method, which can be euclidean, inner_product, cosine, and hamming. Default value: euclidean |
Re-Score Query
If the GRAPH_PQ or IVF_GRAPH_PQ index is used, the query results are sorted based on the asymmetric distance calculated by PQ. CSS supports re-scoring and ranking of query results to improve the recall rate.
Assuming that my_index is a PQ index, an example of re-scoring the query results is as follows:
GET my_index/_search { "size": 10, "query": { "vector": { "my_vector": { "vector": [1.0, 2.0], "topk": 100 } } }, "rescore": { "window_size": 100, "vector_rescore": { "field": "my_vector", "vector": [1.0, 2.0], "metric": "euclidean" } } }
Parameter |
Description |
---|---|
window_size |
Vector retrieval returns topk search results and ranks the first window_size results. |
field |
Vector field name |
vector |
Vector data to be queried |
metric |
Measurement method, which can be euclidean, inner_product, cosine, and hamming. Default value: euclidean |
Painless Syntax Extension
CSS extension supports multiple vector distance calculation functions, which can be directly used in customized painless scripts to build flexible re-score formulas.
The following is an example:
POST my_index/_search { "size": 10, "query": { "script_score": { "query": { "match_all": {} }, "script": { "source": "1 / (1 + euclidean(params.vector, doc[params.field]))", "params": { "field": "my_vector", "vector": [1, 2] } } } } }
The following table lists the distance calculation functions supported by the CSS.
Function Signature |
Description |
---|---|
euclidean(Float[], DocValues) |
Euclidean distance function |
cosine(Float[], DocValues) |
Cosine similarity function |
innerproduct(Float[], DocValues) |
Inner product function |
hamming(String, DocValues) |
Hamming distance function Only vectors whose dim_type is binary are supported. The input query vector must be a Base64-encoded character string. |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot