Performing Vector Search
- Standard Query: retrieves documents that are most similar to the query vector.
- Hybrid Query: combines vector search with traditional OpenSearch queries, such as pre-filtering and Boolean queries.
- Script Score Query: enables custom similarity calculations for vector searches by executing a custom script.
- Rescore Query: rescores and reranks the top results returned by an initial query to improve recall.
- Painless Syntax Extension: allows the use of vector distance or similarity calculation functions in custom scripts.
Standard Query
A standard query retrieves the documents that are most similar to the query vector.
The following request returns the k records that are closest to the query vector, where k is set by size and topk.
POST my_index/_search
{
  "size": 2,
  "_source": false,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk": 2
      }
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
size | Yes | Integer | Number of search results to return. Default value: 10 |
_source | No | Boolean | Whether to return the source text in documents. To reduce data transmission and improve query performance, set this parameter to false. Value range: true or false |
query | Yes | Map | Specifies the query vector. Parameters: vector (mandatory): indicates a vector query (vector similarity-based search), including the vector field and query vector value. my_vector (mandatory): name of the queried vector field (my_vector in this example). |
vector (sub-parameter) | Yes | Array/String | Query vector value. It is used to calculate the similarity between indexed vectors and the query vector. The value can be an array (for example, [1, 1]) or a Base64-encoded value (for example, AAABAAACAAAD). |
topk | Yes | Integer | Number of most similar results to return. Default value: same as size. |
ef | No | Integer | Size of the candidate neighbor queue maintained while the graph is searched. A larger value gives higher query accuracy but slower queries. This parameter is available only when algorithm is GRAPH or GRAPH_PQ. Value range: 0–100000. Default value: 200 |
max_scan_num | No | Integer | Maximum number of graph nodes to scan during search. A larger value gives higher query accuracy but slower queries. This parameter is available only when algorithm is GRAPH or GRAPH_PQ. Value range: 0–1000000. Default value: 10000 |
nprobe | No | Integer | Number of centroids to explore during an IVF index query. A larger value gives higher query accuracy but slower queries. This parameter is available only when algorithm is IVF_GRAPH or IVF_GRAPH_PQ. Value range: 0–100000. Default value: 100 |
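The request body above can also be assembled programmatically. The following Python sketch builds it as a plain dictionary. The helper name build_vector_query is hypothetical, and placing ef, max_scan_num, and nprobe next to vector and topk inside the field clause is an assumption, not something the table confirms.

```python
import json


def build_vector_query(field, vector, topk, size=None, **tuning):
    """Build the body of a standard vector query as a plain dict (hypothetical helper)."""
    allowed = {"ef", "max_scan_num", "nprobe"}
    unknown = set(tuning) - allowed
    if unknown:
        raise ValueError(f"unsupported tuning parameters: {sorted(unknown)}")
    clause = {"vector": list(vector), "topk": topk}
    # Assumed placement: optional tuning parameters sit next to vector and topk.
    clause.update(tuning)
    return {
        "size": topk if size is None else size,
        "_source": False,
        "query": {"vector": {field: clause}},
    }


body = build_vector_query("my_vector", [1, 1], topk=2, ef=200)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST my_index/_search with any HTTP client.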
Hybrid Query
A hybrid query combines vector search with traditional OpenSearch queries, such as a Boolean query.
A Boolean query is in effect a post-filtering method: filtering and vector similarity-based search are performed separately, and their results are then combined using the Boolean logic defined by clauses such as must, should, and filter.
In the following example, the vector search retrieves the 10 nearest records (topk), and only those whose my_label value is red are returned. Because the filter is applied after the vector search, fewer than 10 records may be returned.
POST my_index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "vector": {
          "my_vector": {
            "vector": [1, 2],
            "topk": 10
          }
        }
      },
      "filter": {
        "term": { "my_label": "red" }
      }
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
bool | Yes | Map | A compound query clause that combines subqueries using the configured Boolean logic. This example uses the must and filter clauses. |
bool.must | Yes | Map | Clauses that must match for documents to be included in the results. In this example, it holds the vector query. |
bool.filter | Yes | Map | Clauses that must match, but do not contribute to the relevance score. Standard OpenSearch query filters are supported, such as term and range. |
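The post-filtering behavior described above has a practical consequence that a small simulation makes visible. The Python sketch below is not the service's implementation: it ranks a handful of in-memory records by Euclidean distance (an assumed metric), keeps the topk nearest, and only then applies the label filter. The function name post_filter_search and the sample records are hypothetical.

```python
import math

# Hypothetical sample records; "d" matches the filter but is far from the query.
docs = [
    {"id": "a", "my_vector": [1.0, 2.0], "my_label": "red"},
    {"id": "b", "my_vector": [1.1, 2.1], "my_label": "blue"},
    {"id": "c", "my_vector": [1.2, 1.9], "my_label": "blue"},
    {"id": "d", "my_vector": [5.0, 5.0], "my_label": "red"},
]


def post_filter_search(docs, query, topk, label):
    # Step 1: the vector search keeps the topk nearest records, ignoring labels.
    nearest = sorted(docs, key=lambda d: math.dist(d["my_vector"], query))[:topk]
    # Step 2: the filter clause is applied to that candidate set only.
    return [d["id"] for d in nearest if d["my_label"] == label]


hits = post_filter_search(docs, [1.0, 2.0], topk=3, label="red")
print(hits)
```

With topk set to 3, record d never reaches the filter step even though its label is red, so a single record is returned. Raising topk widens the candidate set at the cost of a more expensive vector search.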
Script Score Query
A script_score query enables custom similarity calculations for vector searches by executing a user-defined script. It works as follows:
Pre-filtering can use any query. script_score then calculates vector similarity on the pre-filtered results and ranks them. This query method does not use vector indexes, so its performance depends on the size of the intermediate result set left after pre-filtering. If the pre-filtering condition is match_all, a brute-force search is performed on all data.
The following is an example:
POST my_index/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "vector_score",
        "lang": "vector",
        "params": {
          "field": "my_vector",
          "vector": [1.0, 2.0],
          "metric": "euclidean"
        }
      }
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
script_score | Yes | Map | Root parameter of the script_score query. It contains query (the pre-filtering query) and script (the scoring script). |
source | Yes | String | Script name. The value is fixed to vector_score, indicating that a built-in script is used for calculating similarity. |
lang | Yes | String | Script language type. The value is fixed to vector. |
field | Yes | String | Queried vector field, for example, my_vector. |
vector | Yes | Array/String | Query vector value. It is used to calculate the similarity between indexed vectors and the query vector. The value can be an array (for example, [1, 1]) or a Base64-encoded value (for example, AAABAAACAAAD). |
metric | Yes | String | Vector distance metric, which measures the similarity or distance between vectors, for example, euclidean. |
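To illustrate why the cost of a script_score query tracks the size of the pre-filtered set, the following Python sketch mimics the two stages on in-memory records. It is a simplified model under stated assumptions: candidates are ranked by raw Euclidean distance, and how the built-in vector_score script converts a distance into a score is not modeled. The function name script_score_search and the sample data are hypothetical.

```python
import math

# Hypothetical sample records.
docs = [
    {"id": "a", "my_vector": [1.0, 2.0], "my_label": "red"},
    {"id": "b", "my_vector": [1.1, 2.1], "my_label": "blue"},
    {"id": "c", "my_vector": [3.0, 4.0], "my_label": "red"},
    {"id": "d", "my_vector": [0.0, 0.0], "my_label": "blue"},
]


def script_score_search(docs, prefilter, field, query, size):
    # Pre-filtering: any predicate works; `lambda d: True` plays the role of match_all.
    candidates = [d for d in docs if prefilter(d)]
    # Every remaining candidate is compared with the query; no vector index is used.
    scored = sorted((math.dist(d[field], query), d["id"]) for d in candidates)
    return [doc_id for _, doc_id in scored[:size]], len(candidates)


all_hits, all_evaluated = script_score_search(
    docs, lambda d: True, "my_vector", [1.0, 2.0], size=2
)
red_hits, red_evaluated = script_score_search(
    docs, lambda d: d["my_label"] == "red", "my_vector", [1.0, 2.0], size=2
)
print(all_hits, all_evaluated)
print(red_hits, red_evaluated)
```

The second value in each result is the number of distance computations performed, which is the quantity that grows with the pre-filtered set.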
Rescore Query
A rescore query rescores and reranks the top results returned by an initial query to improve recall.
When the GRAPH_PQ or IVF_GRAPH_PQ indexing algorithm is used, the initial query results are ranked by the asymmetric distance calculated by PQ, which is an approximation. A rescore query recomputes the distance for these initial results and reranks them accordingly.
The following is an example of a rescore query on a PQ index named my_index:
GET my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 2.0],
        "topk": 100
      }
    }
  },
  "rescore": {
    "window_size": 100,
    "vector_rescore": {
      "field": "my_vector",
      "vector": [1.0, 2.0],
      "metric": "euclidean"
    }
  }
}
Parameter | Mandatory | Type | Description |
---|---|---|---|
rescore | Yes | Map | Defines rescoring parameters. Key parameters: window_size and vector_rescore, which holds field, vector, and metric. |
window_size | Yes | Integer | Rescoring/reranking window size. The vector search returns the top k results, but only the first window_size results are rescored and reranked. A larger value widens the reranking scope and therefore raises recall, but it also increases computational overhead. Default value: 100 |
field | Yes | String | Queried vector field, for example, my_vector. |
vector | Yes | Array/String | Query vector value. It is used to calculate the similarity between indexed vectors and the query vector. The value can be an array (for example, [1, 1]) or a Base64-encoded value (for example, AAABAAACAAAD). |
metric | Yes | String | Vector distance metric, which measures the similarity or distance between vectors, for example, euclidean. |
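The interaction between topk and window_size can be sketched in a few lines of Python. This is an illustration only: rounding each vector component to a grid stands in for PQ compression (it is not PQ itself), Euclidean distance is assumed, and the names coarse and search_with_rescore are hypothetical.

```python
import math

# Hypothetical sample records; a, b, and c collapse to the same coarse vector.
docs = [
    {"id": "a", "vec": [1.4, 2.4]},
    {"id": "b", "vec": [1.1, 2.0]},
    {"id": "c", "vec": [0.6, 2.0]},
    {"id": "d", "vec": [3.0, 3.0]},
]


def coarse(vec, step=1.0):
    # Stand-in for PQ: snap each component to a grid, which loses precision.
    return [round(x / step) * step for x in vec]


def search_with_rescore(docs, query, topk, window_size, size):
    # Initial pass: rank by approximate distance computed on the coarse vectors.
    initial = sorted(docs, key=lambda d: math.dist(coarse(d["vec"]), query))[:topk]
    # Rescore pass: recompute exact distances for the first window_size hits only.
    window = sorted(initial[:window_size], key=lambda d: math.dist(d["vec"], query))
    return [d["id"] for d in (window + initial[window_size:])[:size]]


without_rescore = search_with_rescore(docs, [1.0, 2.0], topk=4, window_size=0, size=3)
with_rescore = search_with_rescore(docs, [1.0, 2.0], topk=4, window_size=3, size=3)
print(without_rescore)
print(with_rescore)
```

In the initial pass the three nearest records are indistinguishable, so their order is arbitrary. Rescoring the window with exact distances restores the true order, at the cost of one exact distance computation per record in the window.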
Painless Syntax Extension
The Painless syntax extension allows vector distance and similarity calculation functions to be used in custom scripts. The CSS extension provides several such functions, which can be called directly in custom Painless scripts to build flexible rescoring formulas.
The following is an example:
POST my_index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "1 / (1 + euclidean(params.vector, doc[params.field]))",
        "params": {
          "field": "my_vector",
          "vector": [1, 2]
        }
      }
    }
  }
}
Function Signature | Description |
---|---|
euclidean(Float[], DocValues) | Euclidean distance |
cosine(Float[], DocValues) | Cosine similarity |
innerproduct(Float[], DocValues) | Inner product |
hamming(String, DocValues) | Hamming distance. Only vectors whose dim_type is binary are supported. The input query vector must be a Base64-encoded string. |
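For readers who want to check a scoring formula offline, the four functions can be approximated in plain Python using their textbook definitions. This is a sketch under assumptions: the extension's exact return values (for instance, whether the Euclidean result is squared) and the bit layout of binary vectors are not specified in this section, and here both Hamming inputs are taken as Base64 strings of equal byte length.

```python
import base64
import math


def euclidean(q, v):
    # Textbook Euclidean (L2) distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))


def innerproduct(q, v):
    return sum(a * b for a, b in zip(q, v))


def cosine(q, v):
    # Cosine similarity: inner product divided by the product of the norms.
    norms = math.sqrt(innerproduct(q, q)) * math.sqrt(innerproduct(v, v))
    return innerproduct(q, v) / norms


def hamming(q_b64, v_b64):
    # Assumption: both vectors are Base64 strings that decode to equally long byte strings.
    q, v = base64.b64decode(q_b64), base64.b64decode(v_b64)
    return sum(bin(a ^ b).count("1") for a, b in zip(q, v))


# Same shape as the script in the example: 1 / (1 + euclidean(query, stored)).
score = 1 / (1 + euclidean([1, 2], [4, 6]))
print(score)
print(hamming("AAABAAACAAAD", "AAAAAAAAAAAA"))
```

The formula 1 / (1 + distance) maps a distance of 0 to a score of 1 and larger distances to scores approaching 0, so closer vectors rank higher.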