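Updated: 2025-09-05 GMT+08:00
Implementing Vector Search with Nested Fields
A nested field lets a single document store multiple vectors. In a RAG scenario, for example, source documents are usually split by paragraph or by length, and each chunk is embedded separately, which yields several semantic vectors per source document. A nested field (Nested) lets you write all of these vectors into the same index document. When a document contains multiple vectors, the document is returned if any one of its vectors is similar to the query vector.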
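The chunk-then-embed flow described above can be sketched as follows. This is a minimal illustration, not the product's ingestion pipeline: `split_by_length`, `toy_embed`, and `build_nested_doc` are hypothetical helpers, and `toy_embed` is only a stand-in for a real embedding model. The field names (`id`, `embedding`, `chunk`, `emb`) follow the index used in this page.

```python
from typing import Callable, Dict, List

Vector = List[float]


def split_by_length(text: str, max_chars: int) -> List[str]:
    """Cut text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def toy_embed(chunk: str) -> Vector:
    """Hypothetical stand-in for an embedding model; returns a 2-D vector."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 7)]


def build_nested_doc(doc_id: str, text: str, max_chars: int,
                     embed: Callable[[str], Vector] = toy_embed) -> Dict:
    """Build one document that carries one nested entry per chunk."""
    chunks = split_by_length(text, max_chars)
    return {
        "id": doc_id,
        "embedding": [
            {"chunk": str(n), "emb": embed(piece)}
            for n, piece in enumerate(chunks, start=1)
        ],
    }


# A 10-character text cut into pieces of 4 yields three nested entries.
doc = build_nested_doc("1", "abcdefghij", max_chars=4)
```

In practice the vector length returned by the embedding function has to match the `dimension` configured for the vector field.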
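Constraints
Only clusters running OpenSearch 2.19.0 support vector indexes in nested fields.
Creating a Vector Index
Create a vector index with a nested field. The index contains an id field of type keyword and an embedding field of type nested. The embedding field has two subfields: chunk, of type keyword, and emb, of type vector.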
PUT my_index
{
  "settings": {
    "index.vector": true
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "embedding": {
        "type": "nested",
        "properties": {
          "chunk": {
            "type": "keyword"
          },
          "emb": {
            "type": "vector",
            "dimension": 2,
            "indexing": true,
            "algorithm": "GRAPH",
            "metric": "euclidean"
          }
        }
      }
    }
  }
}
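Importing Vector Data
Use the Bulk operation to write the data, supplying the nested entries as an array. Each document in this example contains two vectors.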
POST my_index/_bulk
{"index":{}}
{"id": 1, "embedding": [{"chunk":1,"emb": [1, 1]}, {"chunk":2,"emb": [2, 2]}]}
{"index":{}}
{"id": 2, "embedding": [{"chunk":1,"emb": [2, 2]}, {"chunk":2,"emb": [3, 3]}]}
{"index":{}}
{"id": 3, "embedding": [{"chunk":1,"emb": [3, 3]}, {"chunk":2,"emb": [4, 4]}]}
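When the bulk body is generated by a program rather than typed by hand, it helps to build the newline-delimited payload in one place and to check vector lengths before sending. The sketch below assumes hypothetical helper names (`check_dimension`, `to_bulk_body`) and only produces the request body as a string; sending it to the cluster is left out.

```python
import json
from typing import Dict, Iterable, List


def check_dimension(doc: Dict, dimension: int) -> None:
    """Raise if any nested vector does not have the expected length."""
    for entry in doc["embedding"]:
        if len(entry["emb"]) != dimension:
            raise ValueError(
                f"doc {doc['id']}: expected {dimension} values, "
                f"got {len(entry['emb'])}"
            )


def to_bulk_body(docs: Iterable[Dict], dimension: int = 2) -> str:
    """Serialize documents as action/source line pairs for the Bulk API."""
    lines: List[str] = []
    for doc in docs:
        check_dimension(doc, dimension)
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # source line
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]
body = to_bulk_body(docs)
```

Each document occupies exactly two lines of the body, so the three sample documents produce six lines.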
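Vector Search
A nested field must be searched with a nested query. Set the path parameter to the nested path to search, and set score_mode to max (this setting is required). With max, a document's score is the highest similarity between any of its vectors and the query vector.
- Standard query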
GET my_index/_search
{
  "_source": {"excludes": ["embedding"]},
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10
          }
        }
      }
    }
  }
}
Example response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hc4Vc5QBSxCnghau22AE",
        "_score" : 1.0,
        "_source" : {
          "id" : 1
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hs4Vc5QBSxCnghau22AE",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "H84Vc5QBSxCnghau22AE",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3
        }
      }
    ]
  }
}
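The scores in the sample response above are consistent with a per-vector score of 1 / (1 + d²), where d is the Euclidean distance to the query vector, combined per document with a maximum. That formula is inferred from the sample output and is an assumption, not something this page states. Under that assumption, the ranking can be reproduced offline:

```python
from typing import Dict, List, Sequence, Tuple


def assumed_score(vec: Sequence[float], query: Sequence[float]) -> float:
    """Assumed scoring: 1 / (1 + squared Euclidean distance)."""
    squared = sum((x - y) ** 2 for x, y in zip(vec, query))
    return 1.0 / (1.0 + squared)


def doc_score_max(doc: Dict, query: Sequence[float]) -> float:
    """score_mode=max: keep the best score among the document's vectors."""
    return max(assumed_score(e["emb"], query) for e in doc["embedding"])


docs: List[Dict] = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]
query = [1, 1]

# Highest score first, as in the search response.
ranked: List[Tuple[float, int]] = sorted(
    ((doc_score_max(d, query), d["id"]) for d in docs), reverse=True
)
```

Document 1 scores 1.0 because its first vector equals the query vector. Documents 2 and 3 are scored by their closest vectors, [2, 2] and [3, 3], which give 1/3 and 1/9.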
- Pre-filter query
First filter for documents whose id is one of ["2", "3"], then return the top 10 documents most similar to the query vector [1, 1].
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10,
            "filter": {
              "terms": {"id": ["2", "3"]}
            }
          }
        }
      }
    }
  }
}
Example response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.33333334,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3t0ZypcB-Tff59gMTZO2",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2,
          "embedding" : [
            { "chunk" : 1, "emb" : [ 2, 2 ] },
            { "chunk" : 2, "emb" : [ 3, 3 ] }
          ]
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "390ZypcB-Tff59gMTZO2",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3,
          "embedding" : [
            { "chunk" : 1, "emb" : [ 3, 3 ] },
            { "chunk" : 2, "emb" : [ 4, 4 ] }
          ]
        }
      }
    ]
  }
}
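To see why the filter is applied before the nearest-neighbor step, the toy model below contrasts pre-filtering with a hypothetical post-filtering variant. It is only an in-memory illustration under simplifying assumptions: documents are ranked by their closest vector, `topk` is treated as a document count, and neither function reflects how the cluster implements the search internally.

```python
from typing import Dict, List, Sequence, Set


def min_sq_dist(doc: Dict, query: Sequence[float]) -> float:
    """Squared Euclidean distance from the query to the doc's closest vector."""
    return min(
        sum((x - y) ** 2 for x, y in zip(e["emb"], query))
        for e in doc["embedding"]
    )


def pre_filter_search(docs: List[Dict], query: Sequence[float],
                      topk: int, allowed_ids: Set[str]) -> List[int]:
    """Filter first, then take the topk nearest among the survivors."""
    candidates = [d for d in docs if str(d["id"]) in allowed_ids]
    candidates.sort(key=lambda d: min_sq_dist(d, query))
    return [d["id"] for d in candidates[:topk]]


def post_filter_search(docs: List[Dict], query: Sequence[float],
                       topk: int, allowed_ids: Set[str]) -> List[int]:
    """Contrast case: take the topk nearest first, then filter."""
    nearest = sorted(docs, key=lambda d: min_sq_dist(d, query))[:topk]
    return [d["id"] for d in nearest if str(d["id"]) in allowed_ids]


docs = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]
allowed = {"2", "3"}

same_as_sample = pre_filter_search(docs, [1, 1], topk=10, allowed_ids=allowed)
pre_top1 = pre_filter_search(docs, [1, 1], topk=1, allowed_ids=allowed)
post_top1 = post_filter_search(docs, [1, 1], topk=1, allowed_ids=allowed)
```

With `topk=10` the pre-filter result matches the sample response (documents 2 and 3). With `topk=1`, pre-filtering still returns document 2, whereas the post-filter variant returns nothing, because the single nearest document (id 1) is discarded by the filter.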