Updated on 2024-11-29 GMT+08:00

Querying a Vector

  • Basic query

    A basic query uses a dedicated vector query syntax and applies to vector fields that have a vector index. In the following example, the outer vector key sets the query type to VectorQuery, and my_vector is the name of the vector field to query. The inner vector key specifies the query vector, which can be an array or a Base64-encoded string. The value of topk is generally the same as that of size. The query returns the n records closest to the query vector, where n is set by size/topk (2 in this example).

    POST my_index/_search 
     { 
       "size": 2, 
       "query": { 
         "vector": { 
           "my_vector": { 
             "vector": [1.0, 2.0], 
             "topk": 2 
           } 
         } 
       } 
     } 
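    The request above can also be assembled programmatically. The following Python sketch is illustrative only: the helper names build_vector_query and build_search_request and the cluster address http://localhost:9200 are assumptions, not part of CSS. It builds the same body as the example and wraps it in an HTTP request object without sending it.

```python
import json
import urllib.request


def build_vector_query(field, vector, topk, size=None):
    """Build the body of a basic vector query (hypothetical helper)."""
    return {
        # size defaults to topk, since the two are generally the same.
        "size": topk if size is None else size,
        "query": {"vector": {field: {"vector": list(vector), "topk": topk}}},
    }


def build_search_request(base_url, index, body):
    """Wrap a query body in a POST <index>/_search request (not sent here)."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_vector_query("my_vector", [1.0, 2.0], topk=2)
# base_url is an assumed address; replace it with your cluster endpoint.
request = build_search_request("http://localhost:9200", "my_index", body)
print(json.dumps(body, indent=2))
```

    Passing the request object to urllib.request.urlopen would send it. Authentication and TLS settings depend on the cluster and are not covered here.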
  • Compound query

    Vector search can be combined with other Elasticsearch subqueries, such as Boolean queries and post-filtering, to form compound queries.

    • Example of Boolean query
      POST my_index/_search 
       { 
         "size": 10, 
         "query": { 
           "bool": { 
             "must": { 
               "vector": { 
                 "my_vector": { 
                   "vector": [1.0, 2.0], 
                   "topk": 10 
                 } 
               } 
             }, 
             "filter": { 
               "term" : { "tags" : "production" } 
             }, 
             "must_not" : { 
               "range" : { "age" : {"gte" : 10, "lte" : 20} } 
             } 
           } 
         } 
       } 

      In this example, the topk (10) results closest to the query vector are retrieved first. The filter clause then keeps only the results whose tags field is production. The range clause is wrapped in must_not, so results whose age is between 10 and 20 (inclusive) are removed from the final result. Because these conditions are applied to the top-k results, the final number of records may be less than the value of topk.
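      To see why fewer than topk records can come back, the behavior described above can be modeled locally. The following Python sketch is an in-memory illustration, not how CSS implements the query: the sample documents and the function bool_vector_search are hypothetical, and exact Euclidean distance stands in for the index lookup.

```python
import math


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bool_vector_search(docs, query, topk):
    """Mimic the described order: take the top-k by distance, then filter."""
    nearest = sorted(docs, key=lambda d: euclidean(d["my_vector"], query))[:topk]
    return [
        d for d in nearest
        if d["tags"] == "production"      # filter: term on tags
        and not (10 <= d["age"] <= 20)    # must_not: range on age
    ]


# Hypothetical sample documents.
docs = [
    {"id": 1, "my_vector": [1.0, 2.0], "tags": "production", "age": 5},
    {"id": 2, "my_vector": [1.1, 2.1], "tags": "test", "age": 30},
    {"id": 3, "my_vector": [1.2, 2.2], "tags": "production", "age": 15},
    {"id": 4, "my_vector": [5.0, 5.0], "tags": "production", "age": 40},
]

hits = bool_vector_search(docs, query=[1.0, 2.0], topk=3)
print([d["id"] for d in hits])  # prints [1]
```

      Document 4 satisfies both conditions but lies outside the three nearest neighbors, so it is never considered. Of the three candidates, only document 1 survives, which is fewer than topk.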

    • Example of post-filtering
      GET my_index/_search 
       { 
         "size": 10, 
         "query": { 
           "vector": { 
             "my_vector": { 
               "vector": [1.0, 2.0], 
               "topk": 10 
             } 
           } 
         }, 
         "post_filter": { 
           "term": { "tags": "production" } 
         }
       }
  • Query scoring

    When GRAPH_PQ is used, query results are sorted by the asymmetric distance calculated by PQ. CSS supports rescoring and re-sorting query results to improve the recall rate. Assuming that my_index is a PQ index, the following example rescores the query results:

    GET my_index/_search 
     { 
       "size": 10, 
       "query": { 
         "vector": { 
           "my_vector": { 
             "vector": [1.0, 2.0], 
             "topk": 100 
           } 
         } 
       }, 
       "rescore": { 
         "window_size": 100, 
         "vector_rescore": { 
           "field": "my_vector", 
           "vector": [1.0, 2.0], 
           "metric": "euclidean" 
         } 
       } 
     }
    Table 1 Rescore parameter description

    Parameter      Remarks
    window_size    The vector query returns topk results; only the first window_size of them are rescored and re-sorted.
    field          Name of the vector field.
    vector         Query vector.
    metric         Metric used to measure the distance between vectors. The default value is euclidean. Options: euclidean, inner_product, cosine, hamming.
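    The effect of window_size can be illustrated with a small local model. The following Python sketch is a simplification under stated assumptions: the candidate list stands in for results already ordered by the approximate PQ distance, the function rescore is hypothetical, and leaving candidates beyond the window in their original order is an assumption based on the parameter description rather than confirmed CSS behavior.

```python
import math


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rescore(candidates, query, window_size):
    """Re-sort the first window_size candidates by exact distance.

    candidates: list of (doc_id, vector) pairs already ordered by the
    approximate distance. Entries beyond the window keep their order.
    """
    window = candidates[:window_size]
    tail = candidates[window_size:]
    window = sorted(window, key=lambda c: euclidean(c[1], query))
    return window + tail


# Assumed approximate ranking returned by the vector query (topk = 4).
approx_ranked = [
    ("b", [1.5, 2.5]),
    ("a", [1.0, 2.1]),
    ("d", [3.0, 3.0]),
    ("c", [1.1, 2.0]),
]

reranked = rescore(approx_ranked, query=[1.0, 2.0], window_size=3)
print([doc_id for doc_id, _ in reranked])  # prints ['a', 'b', 'd', 'c']
```

    Candidate c is as close to the query as candidate a, but it falls outside the window and so is not promoted. In this model, setting window_size equal to topk, as in the request above, avoids that.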