Searching for Data Using a Vector Index
Standard Query
Standard vector query syntax is provided for vector fields that have a vector index. The following command returns the n data records (n is specified by size/topk) that are closest to the query vector.
POST my_index/_search
{
  "size": 2,
  "_source": false,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk": 2
      }
    }
  }
}
| Parameter | Description |
|---|---|
| vector (the first one) | Indicates that the query type is VectorQuery. |
| my_vector | Indicates the name of the vector field you want to query. |
| vector (the second one) | Indicates the vector value you want to query, which can be an array or a Base64 string. |
| topk | Generally the same as the value of size. |
| Other optional parameters | Indicates optional query parameters. You can adjust the vector index parameters to achieve higher query performance or precision. For more information, see Table 2. |
| Type | Parameter | Description |
|---|---|---|
| Graph index configuration parameters | ef | Queue size of the neighboring node during the query. A larger value indicates a higher query precision and slower query speed. The default value is 200. Value range: (0, 100000] |
| Graph index configuration parameters | max_scan_num | Maximum number of scanned nodes. A larger value indicates a higher query precision and slower query speed. The default value is 10000. Value range: (0, 1000000] |
| IVF index configuration parameters | nprobe | Number of center points. A larger value indicates a higher query precision and slower query speed. The default value is 100. Value range: (0, 100000] |
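The value ranges in the table above can be checked on the client before a request is sent. The following Python sketch builds the body of a standard query and validates any optional parameters against those ranges. The helper name build_vector_query is hypothetical, and placing the optional parameters next to topk is an assumption, because this section does not show where they go in the request.

```python
import json

# Value ranges copied from the optional-parameter table:
# (exclusive lower bound, inclusive upper bound).
OPTIONAL_PARAM_RANGES = {
    "ef": (0, 100000),
    "max_scan_num": (0, 1000000),
    "nprobe": (0, 100000),
}


def build_vector_query(field, vector, topk, **optional):
    """Build the body of a standard vector query as a plain dict (hypothetical helper)."""
    for name, value in optional.items():
        if name not in OPTIONAL_PARAM_RANGES:
            raise ValueError(f"unknown optional parameter: {name}")
        low, high = OPTIONAL_PARAM_RANGES[name]
        if not (low < value <= high):
            raise ValueError(f"{name}={value} is outside ({low}, {high}]")
    field_clause = {"vector": list(vector), "topk": topk}
    # ASSUMPTION: optional parameters sit next to "topk"; the section
    # does not show their placement in the request.
    field_clause.update(optional)
    return {
        "size": topk,  # topk is generally the same as size
        "_source": False,
        "query": {"vector": {field: field_clause}},
    }


body = build_vector_query("my_vector", [1, 1], topk=2, ef=300)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST my_index/_search. Rejecting out-of-range values locally is only a convenience; the server remains the authority on what it accepts.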
Compound Query
Vector search can be combined with other Elasticsearch subqueries, such as Boolean queries and post-filtering, to form compound queries.
In both of the following examples, the 10 (topk) results closest to the query vector are retrieved first, and the filter then retains only the results whose my_label field is red. Because filtering happens after the vector search, fewer than 10 results may be returned.
- Example of a Boolean query
POST my_index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "vector": {
          "my_vector": {
            "vector": [1, 2],
            "topk": 10
          }
        }
      },
      "filter": {
        "term": { "my_label": "red" }
      }
    }
  }
}
- Example of post-filtering
GET my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 2],
        "topk": 10
      }
    }
  },
  "post_filter": {
    "term": { "my_label": "red" }
  }
}
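The order of operations described above (vector search first, filter second) can be illustrated with a small in-memory Python sketch. It does not call the service; the documents and the helper top_k_then_filter are hypothetical, and plain Euclidean distance stands in for whatever metric the index uses.

```python
import math

# Hypothetical documents: (id, my_vector, my_label).
docs = [
    ("a", [1.0, 2.0], "red"),
    ("b", [1.5, 2.0], "blue"),
    ("c", [2.0, 2.0], "red"),
    ("d", [5.0, 5.0], "red"),
    ("e", [1.0, 2.5], "blue"),
]


def top_k_then_filter(docs, query, topk, label):
    """Keep the topk nearest documents, then drop those whose label differs."""
    ranked = sorted(docs, key=lambda d: math.dist(d[1], query))
    nearest = ranked[:topk]
    return [doc_id for doc_id, _, doc_label in nearest if doc_label == label]


hits = top_k_then_filter(docs, query=[1.0, 2.0], topk=3, label="red")
print(hits)  # ['a']
```

Three red documents exist, but only one of them is among the three nearest neighbors, so a single hit survives the filter. If more filtered results are needed, a larger topk is one option to consider.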
ScriptScore Query
You can use script_score to perform Nearest Neighbor Search (NNS) on vectors. The query syntax is provided below.
The pre-filtering condition can be any query. script_score traverses only the pre-filtered results, calculates the vector similarity, and sorts and returns the results. The performance of this query depends on the size of the intermediate result set after the pre-filtering. If the pre-filtering condition is set to match_all, brute-force search is performed on all data.
POST my_index/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "vector_score",
        "lang": "vector",
        "params": {
          "field": "my_vector",
          "vector": [1.0, 2.0],
          "metric": "euclidean"
        }
      }
    }
  }
}
| Parameter | Description |
|---|---|
| source | Script description. Its value is vector_score if the vector similarity is used for scoring. |
| lang | Script syntax description. Its value is vector. |
| field | Vector field name |
| vector | Vector data to be queried |
| metric | Measurement method, which can be euclidean, inner_product, cosine, or hamming. Default value: euclidean |
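Why the cost of a script_score query follows the size of the pre-filtered set can be seen in a small in-memory Python sketch. It is an illustration only: the documents and the helper prefilter_then_rank are hypothetical, and results are ranked by raw Euclidean distance because this section does not state how vector_score converts a distance into a score.

```python
import math

# Hypothetical documents with a vector field and a label field.
docs = [
    {"id": 1, "my_vector": [1.0, 2.0], "my_label": "red"},
    {"id": 2, "my_vector": [1.1, 2.1], "my_label": "blue"},
    {"id": 3, "my_vector": [3.0, 4.0], "my_label": "red"},
    {"id": 4, "my_vector": [0.0, 0.0], "my_label": "red"},
]


def prefilter_then_rank(docs, predicate, query, size):
    """Brute-force search restricted to documents that pass the pre-filter."""
    candidates = [d for d in docs if predicate(d)]
    # One distance computation per candidate: cost grows with len(candidates).
    candidates.sort(key=lambda d: math.dist(d["my_vector"], query))
    return [d["id"] for d in candidates[:size]]


query = [1.0, 2.0]
red_only = prefilter_then_rank(docs, lambda d: d["my_label"] == "red", query, size=2)
everything = prefilter_then_rank(docs, lambda d: True, query, size=2)  # like match_all
print(red_only)    # [1, 4]
print(everything)  # [1, 2]
```

With the red pre-filter, three distances are computed; with the match_all equivalent, every document is scored.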
Re-Score Query
If the GRAPH_PQ or IVF_GRAPH_PQ index is used, the query results are sorted based on the asymmetric distance calculated by PQ. CSS supports re-scoring and ranking of query results to improve the recall rate.
Assuming that my_index is a PQ index, an example of re-scoring the query results is as follows:
GET my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 2.0],
        "topk": 100
      }
    }
  },
  "rescore": {
    "window_size": 100,
    "vector_rescore": {
      "field": "my_vector",
      "vector": [1.0, 2.0],
      "metric": "euclidean"
    }
  }
}
| Parameter | Description |
|---|---|
| window_size | Vector search returns topk results, and only the first window_size of them are re-scored and re-ranked. |
| field | Vector field name |
| vector | Vector data to be queried |
| metric | Measurement method, which can be euclidean, inner_product, cosine, or hamming. Default value: euclidean |
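The two-stage flow (approximate ranking, then exact re-ranking of the first window_size hits) can be sketched in Python as follows. This is a toy model rather than the service's implementation: rounding each component to an integer is a stand-in for PQ compression, and the documents and the helper search_with_rescore are hypothetical.

```python
import math

# Hypothetical documents: (id, my_vector).
docs = [
    ("a", [1.4, 2.4]),
    ("b", [0.9, 2.1]),
    ("c", [1.2, 1.6]),
    ("d", [3.0, 3.0]),
]


def coarse(v):
    """Stand-in for PQ compression: round each component to the nearest integer."""
    return [float(round(x)) for x in v]


def search_with_rescore(docs, query, topk, window_size, size):
    # First stage: rank by the distance between the uncompressed query and
    # the compressed document vectors, and keep the topk nearest.
    first_stage = sorted(docs, key=lambda d: math.dist(coarse(d[1]), query))[:topk]
    # Second stage: re-rank only the first window_size hits by exact distance.
    window = sorted(first_stage[:window_size], key=lambda d: math.dist(d[1], query))
    return [doc_id for doc_id, _ in (window + first_stage[window_size:])[:size]]


query = [1.0, 2.0]
without_rescore = search_with_rescore(docs, query, topk=3, window_size=0, size=2)
with_rescore = search_with_rescore(docs, query, topk=3, window_size=3, size=2)
print(without_rescore)  # ['a', 'b']
print(with_rescore)     # ['b', 'c']
```

In this toy data the compressed vectors of a, b, and c are identical, so the first stage cannot tell them apart. Re-scoring with the original vectors places the truly nearest documents first, which is the recall improvement the section describes.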
Painless Syntax Extension
The CSS extension provides several vector distance calculation functions that can be used directly in custom Painless scripts to build flexible re-score formulas.
The following is an example:
POST my_index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "1 / (1 + euclidean(params.vector, doc[params.field]))",
        "params": {
          "field": "my_vector",
          "vector": [1, 2]
        }
      }
    }
  }
}
| Function Signature | Description |
|---|---|
| euclidean(Float[], DocValues) | Euclidean distance function |
| cosine(Float[], DocValues) | Cosine similarity function |
| innerproduct(Float[], DocValues) | Inner product function |
| hamming(String, DocValues) | Hamming distance function. Only vectors whose dim_type is binary are supported. The input query vector must be a Base64-encoded character string. |
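To get a feel for the scores produced by the example formula 1 / (1 + euclidean(...)), it can be evaluated outside the cluster. The Python sketch below mirrors the script source, assuming that euclidean returns the plain (not squared) Euclidean distance; the document vectors and the helper rescore_formula are hypothetical.

```python
import math


def rescore_formula(query, doc_vector):
    """Mirror of the script source: 1 / (1 + euclidean(query, doc_vector)).

    ASSUMPTION: euclidean() is the plain Euclidean distance, not its square.
    """
    return 1 / (1 + math.dist(query, doc_vector))


query = [1, 2]
scores = {
    "identical": rescore_formula(query, [1, 2]),  # distance 0
    "far": rescore_formula(query, [4, 6]),        # distance 5
}
print(scores)
```

An identical vector scores 1.0 and a vector at distance 5 scores 1/6, so the formula maps smaller distances to higher scores within (0, 1].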