Performing Vector Search
- Standard Query: retrieves documents that are most similar to the query vector.
- Hybrid Query: combines vector search with traditional Elasticsearch queries, such as pre-filtering and Boolean queries.
- Script Score Query: enables custom similarity calculations for vector searches by executing a custom script.
- Rescore Query: rescores and reranks the top results returned by an initial query to improve recall.
- Painless Syntax Extension: allows the use of vector distance or similarity calculation functions in custom scripts.
Standard Query
Standard query is used to retrieve documents that are most similar to the query vector.
The following command will return k (specified by size/topk) records that are the closest matches to the query vector.
POST my_index/_search
{
  "size": 2,
  "_source": false,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk": 2
      }
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
size | Yes | Integer | Number of search results to return. Default value: 10 |
_source | No | Boolean | Whether to return the source text of documents. To reduce data transmission and improve query performance, set this parameter to false. Value range: true or false |
query | Yes | Map | Specifies the query vector. Parameters: vector (mandatory): indicates a vector query (vector similarity-based search), including the vector field and query vector value. my_vector (mandatory): queried vector field (for example, my_vector). |
vector (sub-parameter) | Yes | Array/String | Query vector value. It is used to calculate the similarity between indexed vectors and the query vector. The value can be an array (for example, [1, 1]) or a Base64-encoded value (for example, AAABAAACAAAD). |
topk | Yes | Integer | Number of most similar results (top k) to return. Default value: same as size. |
ef | No | Integer | Number of nearest-neighbor candidates to explore in the graph during a query. A larger value indicates higher query accuracy but slower query speed. This parameter is available only when algorithm is set to GRAPH, GRAPH_PQ, GRAPH_SQ8, or GRAPH_SQ4. Value range: 0–100000. Default value: 200 |
max_scan_num | No | Integer | Maximum number of graph nodes to scan during a search. A larger value indicates higher query accuracy but slower query speed. This parameter is available only when algorithm is set to GRAPH, GRAPH_PQ, GRAPH_SQ8, or GRAPH_SQ4. Value range: 0–1000000. Default value: 10000 |
nprobe | No | Integer | Number of centroids to explore during an IVF index query. A larger value indicates higher query accuracy but slower query speed. This parameter is available only when algorithm is set to IVF_GRAPH or IVF_GRAPH_PQ. Value range: 0–100000. Default value: 100 |
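As a rough illustration of how the parameters in the table fit together, the following Python sketch assembles the body of a standard query. The helper name build_standard_query is hypothetical. Placing ef, max_scan_num, and nprobe next to topk inside the vector field map is an assumption that the table does not confirm, so check it against your cluster before relying on it.

```python
import json

# Upper bounds taken from the "Value range" entries in the table above.
TUNING_LIMITS = {"ef": 100000, "max_scan_num": 1000000, "nprobe": 100000}


def build_standard_query(field, vector, topk, size=None, **tuning):
    """Build the request body for a standard vector query (hypothetical helper)."""
    field_params = {"vector": list(vector), "topk": topk}
    for name, value in tuning.items():
        if name not in TUNING_LIMITS:
            raise ValueError(f"unknown tuning parameter: {name}")
        if not 0 <= value <= TUNING_LIMITS[name]:
            raise ValueError(f"{name} must be between 0 and {TUNING_LIMITS[name]}")
        # Assumption: tuning parameters sit beside topk in the field map.
        field_params[name] = value
    return {
        # Keep size and topk aligned unless the caller asks otherwise.
        "size": topk if size is None else size,
        "_source": False,
        "query": {"vector": {field: field_params}},
    }


if __name__ == "__main__":
    body = build_standard_query("my_vector", [1, 1], topk=2)
    # Send this JSON with: POST my_index/_search
    print(json.dumps(body, indent=2))
```

Validating the ranges on the client side is a design choice for the sketch only; the cluster remains the authority on what it accepts.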
Hybrid Query
Hybrid query combines vector search with traditional Elasticsearch queries, such as pre-filtering and Boolean queries.

Only Elasticsearch 7.10.2 clusters support pre-filtering queries.
In the following example, the top 10 records whose my_label value is red are returned.
- Pre-filtering query
First, filters are applied to retrieve matching results. Then, vector search is performed on these results to retrieve the most relevant vectors based on similarity.
The following is an example:
POST my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 2],
        "topk": 10,
        "filter": {
          "term": { "my_label": "red" }
        }
      }
    }
  }
}
Table 2 Parameters for pre-filtering query
Parameter | Mandatory | Type | Description |
---|---|---|---|
filter | Yes | Map | Vector query filters. Standard Elasticsearch query filters are supported, such as term and range. If filter is too restrictive and produces a small intermediate result set, you can set the index.vector.exact_search_threshold parameter. When the intermediate result set is smaller than this threshold, the pre-filtering query automatically switches to a brute-force query (FLAT algorithm), which ensures a high recall rate. For more information, see Creating a Vector Index. |
term | No | Map | Term query is a type of exact query. Documents that contain the exact term are returned. For example, {"term": {"my_label": "red"}} returns only documents whose my_label value is red. |
- Boolean query
A Boolean query is in fact a post-filtering query method. Filtering and vector similarity-based search are performed separately. Then, the results of the two are combined using Boolean logic defined by clauses like must, should, and filter.
The following is an example:
POST my_index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "vector": {
          "my_vector": {
            "vector": [1, 2],
            "topk": 10
          }
        }
      },
      "filter": {
        "term": { "my_label": "red" }
      }
    }
  }
}
Table 3 Boolean query parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
bool | Yes | Map | A compound query clause that combines subqueries using the configured Boolean logic. Parameter description: must: clauses that must match for documents to be included in the results. filter: similar to must, but does not contribute to the relevance score. should: clauses that should match but are not required. must_not: clauses that must not match for documents to be included in the results. |
bool.must | Yes | Map | Clauses that must match for documents to be included in the results. Parameter description: vector: query vector. my_vector: vector field. topk: number of results to return. |
bool.filter | Yes | Map | Clauses that must match but do not contribute to the relevance score. Standard Elasticsearch query filters are supported, such as term and range. |
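The difference between the two hybrid modes can be sketched with a small in-memory model. This is only a conceptual illustration in plain Python, assuming Euclidean distance and a brute-force scan; the function names and sample documents are hypothetical and do not reflect how the cluster executes these queries internally. It shows why post-filtering can return fewer than topk hits when nearby vectors fail the filter.

```python
import math


def top_k(docs, field, query_vector, k):
    """Return the k documents closest to query_vector (brute force)."""
    return sorted(docs, key=lambda doc: math.dist(query_vector, doc[field]))[:k]


def pre_filter_search(docs, field, query_vector, k, keep):
    """Apply the filter first, then run the vector search on what is left."""
    return top_k([doc for doc in docs if keep(doc)], field, query_vector, k)


def post_filter_search(docs, field, query_vector, k, keep):
    """Run the vector search first, then drop hits that fail the filter."""
    return [doc for doc in top_k(docs, field, query_vector, k) if keep(doc)]


def is_red(doc):
    return doc["my_label"] == "red"


# Hypothetical sample data: two blue vectors sit closer to the query than doc 4.
docs = [
    {"id": 1, "my_label": "red", "my_vector": [1.0, 2.0]},
    {"id": 2, "my_label": "blue", "my_vector": [1.0, 2.1]},
    {"id": 3, "my_label": "blue", "my_vector": [1.2, 2.0]},
    {"id": 4, "my_label": "red", "my_vector": [5.0, 5.0]},
]

pre_ids = [d["id"] for d in pre_filter_search(docs, "my_vector", [1.0, 2.0], 2, is_red)]
post_ids = [d["id"] for d in post_filter_search(docs, "my_vector", [1.0, 2.0], 2, is_red)]
print(pre_ids)   # both red documents
print(post_ids)  # only the red document that made the unfiltered top 2
```

In this model, raising topk in the Boolean query is the usual way to compensate for hits that the filter removes afterwards.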
Script Score Query
Script_score query enables custom similarity calculations for vector searches by executing a user-defined script. It works as follows:
Any query can be used for pre-filtering. script_score then calculates vector similarity on the pre-filtered results and ranks them. This query method does not use vector indexes, so its performance depends on the size of the intermediate result set left after pre-filtering. If the pre-filtering condition is set to match_all, a brute-force search is performed on all data.
The following is an example:
POST my_index/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "vector_score",
        "lang": "vector",
        "params": {
          "field": "my_vector",
          "vector": [1.0, 2.0],
          "metric": "euclidean"
        }
      }
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
script_score | Yes | Map | Root parameter of the script_score query. As in the example above, it contains query (the pre-filtering query) and script (the similarity script). |
source | Yes | String | Script name. The value is fixed to vector_score, indicating that a built-in script is used for calculating similarity. |
lang | Yes | String | Script language type. The value is fixed to vector. |
field | Yes | String | Queried vector field, for example, my_vector. |
vector | Yes | Array/String | Query vector value. It is used to calculate the similarity between indexed vectors and the query vector. The value can be an array (for example, [1, 1]) or a Base64-encoded value (for example, AAABAAACAAAD). |
metric | Yes | String | Vector distance metric, which measures the similarity or distance between vectors, for example, euclidean. |
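To make the cost model concrete, here is a minimal Python sketch of what a scan without a vector index amounts to: every document that survives pre-filtering is compared with the query vector. The function script_score_scan and the sample documents are hypothetical, and ranking by ascending Euclidean distance is an assumption, since this page does not state how vector_score converts a distance into a score.

```python
import math


def script_score_scan(docs, field, query_vector, prefilter, size):
    """Brute-force scan: score every pre-filtered document, return the best ones.

    Returns (ids of the closest documents, number of distance computations).
    """
    candidates = [doc for doc in docs if prefilter(doc)]
    # One distance computation per candidate; no index is consulted.
    ranked = sorted(candidates, key=lambda doc: math.dist(query_vector, doc[field]))
    return [doc["id"] for doc in ranked[:size]], len(candidates)


# Hypothetical sample data.
docs = [
    {"id": 1, "my_label": "red", "my_vector": [1.0, 2.0]},
    {"id": 2, "my_label": "blue", "my_vector": [1.0, 3.0]},
    {"id": 3, "my_label": "red", "my_vector": [4.0, 6.0]},
]

# match_all equivalent: every document is scanned.
ids_all, scanned_all = script_score_scan(docs, "my_vector", [1.0, 2.0], lambda d: True, 2)
# A selective pre-filter shrinks the scan.
ids_red, scanned_red = script_score_scan(
    docs, "my_vector", [1.0, 2.0], lambda d: d["my_label"] == "red", 2
)
print(ids_all, scanned_all)
print(ids_red, scanned_red)
```

The second return value is the quantity that drives query time in this model, which is why a selective pre-filter matters more here than in index-based queries.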
Rescore Query
Rescore query rescores and reranks the top results returned by an initial query to improve recall.
When the GRAPH_PQ or IVF_GRAPH_PQ indexing algorithm is used, query results are ranked based on the asymmetric distance calculated by PQ. Rescore query then rescores and reranks the initial search results to improve recall.
The following is an example of rescore query on a PQ index named my_index:
GET my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 2.0],
        "topk": 100
      }
    }
  },
  "rescore": {
    "window_size": 100,
    "vector_rescore": {
      "field": "my_vector",
      "vector": [1.0, 2.0],
      "metric": "euclidean"
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
rescore | Yes | Map | Defines rescoring parameters. As in the example above, the key parameters are window_size and vector_rescore. |
window_size | Yes | Integer | Rescoring/reranking window size. The vector search returns the top k results, but only the first window_size results are rescored and reranked. A larger value indicates a larger reranking scope and hence a higher recall rate, but it also leads to higher computational overhead. Default value: 100 |
field | Yes | String | Queried vector field, for example, my_vector. |
vector | Yes | Array/String | Query vector value. It is used to calculate the similarity between indexed vectors and the query vector. The value can be an array (for example, [1, 1]) or a Base64-encoded value (for example, AAABAAACAAAD). |
metric | Yes | String | Vector distance metric, which measures the similarity or distance between vectors, for example, euclidean. |
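The two-stage flow described above can be sketched in Python as follows. This is a toy model, not the PQ algorithm: rounding each component to the nearest integer stands in for the lossy quantized distance, and the function names and sample vectors are hypothetical. It shows how reranking the first window_size hits with exact distances can change which documents reach the final result.

```python
import math


def approximate_distance(query, vector):
    """Toy stand-in for a quantized (PQ-style) distance: round, then measure."""
    coarse = [round(x) for x in vector]
    return math.dist(query, coarse)


def search_with_rescore(vectors, query, topk, window_size, size):
    """First pass ranks by approximate distance; the window is reranked exactly."""
    first_pass = sorted(vectors, key=lambda item: approximate_distance(query, item[1]))[:topk]
    window, rest = first_pass[:window_size], first_pass[window_size:]
    reranked = sorted(window, key=lambda item: math.dist(query, item[1]))
    # Hits outside the window keep their first-pass order.
    return [doc_id for doc_id, _ in (reranked + rest)[:size]]


# Hypothetical sample data: a, b, and c all collapse to [1, 2] after rounding.
vectors = [
    ("a", [1.4, 2.4]),
    ("b", [1.1, 2.0]),
    ("c", [0.6, 2.0]),
    ("d", [3.0, 4.0]),
]

without_rescore = search_with_rescore(vectors, [1.0, 2.0], topk=3, window_size=0, size=2)
with_rescore = search_with_rescore(vectors, [1.0, 2.0], topk=3, window_size=3, size=2)
print(without_rescore)
print(with_rescore)
```

Because only the window is rescored, a window_size smaller than topk leaves the tail of the first-pass ranking untouched, which is the trade-off between recall and overhead noted in the table.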
Painless Syntax Extension
Painless syntax extension allows the use of vector distance or similarity calculation functions in custom scripts. The CSS extension provides several vector distance/similarity functions that you can call directly in custom Painless scripts to build flexible rescoring formulas.
The following is an example:
POST my_index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "1 / (1 + euclidean(params.vector, doc[params.field]))",
        "params": {
          "field": "my_vector",
          "vector": [1, 2]
        }
      }
    }
  }
}
Function Signature | Description |
---|---|
euclidean(Float[], DocValues) | Euclidean distance |
cosine(Float[], DocValues) | Cosine similarity |
innerproduct(Float[], DocValues) | Inner product |
hamming(String, DocValues) | Hamming distance. Only vectors whose dim_type is binary are supported. The input query vector must be a Base64-encoded character string. |
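For reference, the textbook definitions behind these four functions can be written in plain Python as below. This is a sketch for checking a scoring formula offline, not the extension's implementation. It assumes euclidean returns the plain (not squared) distance and that binary vectors are compared bit by bit after Base64 decoding; both points should be verified against the cluster. The helper example_score mirrors the formula used in the request above.

```python
import base64
import math


def euclidean(query, doc_vector):
    """Euclidean (L2) distance. Assumption: not the squared distance."""
    return math.dist(query, doc_vector)


def innerproduct(query, doc_vector):
    """Inner (dot) product of two vectors of equal length."""
    return sum(q * d for q, d in zip(query, doc_vector))


def cosine(query, doc_vector):
    """Cosine similarity: dot product divided by the product of the norms."""
    norms = math.hypot(*query) * math.hypot(*doc_vector)
    return innerproduct(query, doc_vector) / norms


def hamming(query_b64, doc_b64):
    """Number of differing bits between two Base64-encoded binary vectors."""
    query_bytes = base64.b64decode(query_b64)
    doc_bytes = base64.b64decode(doc_b64)
    return sum(bin(q ^ d).count("1") for q, d in zip(query_bytes, doc_bytes))


def example_score(query, doc_vector):
    """Same shape as the script source above: 1 / (1 + euclidean(...))."""
    return 1 / (1 + euclidean(query, doc_vector))


if __name__ == "__main__":
    print(example_score([1, 2], [1, 2]))  # identical vectors give the maximum score
    print(example_score([1, 2], [4, 6]))  # a more distant vector scores lower
```

The 1 / (1 + distance) form keeps scores positive and makes closer vectors rank higher, which fits how script_score orders results by descending score.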