Semantic Query

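A semantic query uses the search LLM plugin to convert the user's input text into a vector representation and then retrieves relevant documents by vector similarity, which improves the semantic relevance of search results. This section describes how to use the search LLM plugin to run semantic queries, including multi_match semantic queries, ext extended queries, and merge dual-path recall queries.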
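To illustrate what "retrieving by vector similarity" means, the following is a minimal sketch that ranks documents by cosine similarity to a query vector. This section does not state which similarity metric the plugin or the vector index uses, so cosine similarity is an assumption, and the three-dimensional vectors are made-up placeholders standing in for real embeddings produced by the embedding model service.

```python
import math

# Toy sketch: rank documents by cosine similarity between a query vector and
# document vectors. The vectors are made-up placeholders; real embeddings would
# come from the embedding model service configured on the index.


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs):
    """Sort (doc_id, vector) pairs by descending similarity to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [0.9, 0.1, 0.0]  # hypothetical embedding of the user's text
    corpus = [
        ("doc-a", [0.8, 0.2, 0.0]),
        ("doc-b", [0.0, 0.1, 0.9]),
        ("doc-c", [0.5, 0.5, 0.0]),
    ]
    for doc_id, score in rank_by_similarity(query, corpus):
        print(doc_id, round(score, 3))
```

Documents whose vectors point in nearly the same direction as the query vector (doc-a here) rank first, even if they share no exact keywords with the query text.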

Prerequisites

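A vector index has been prepared, a model service has been associated with the vector index, and "index.vector" is set to "true" in the index settings. For details, see Associating a Model Service with a Vector Index.
The data to be queried has been imported into the cluster.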

Constraints

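Only multi_match queries are supported. Other query types do not trigger semantic search.
The "fields" of a multi_match query must exactly match the index setting "index.inference.field" (field weights can be omitted). Otherwise, the query falls back to a plain text query.
When multi_match queries are nested, a single search request is split into multiple subqueries. Because of an architectural limitation of the reranking model, the system cannot pass the original query to the reranking model for feature computation.
When the reranking model service is enabled in the index settings, sort rules do not take effect (the sort field is ignored). Scores after reranking range from 0 to 1 and are computed independently for each page, so scores across pages may not be strictly decreasing.
If a call to the model service fails, semantic search automatically falls back to text search.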
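The constraints above can be turned into a client-side check that predicts whether a request should be served as a semantic query. This is a hypothetical helper, not the plugin's code. It assumes that field order does not matter (the examples in this section list the fields in a different order from the settings), that weights are written as "name:weight" in the settings and as "name^boost" in queries, and it only inspects a top-level multi_match, so nested multi_match queries are out of scope.

```python
# Hypothetical client-side check mirroring the documented constraints.
# It is NOT the plugin's implementation; it only predicts the outcome.


def _strip_weight(field, sep):
    """Drop a trailing weight such as 'title:100' (sep=':') or 'title^2' (sep='^')."""
    return field.split(sep, 1)[0].strip()


def fields_match(query_fields, inference_fields):
    """True if the multi_match fields equal index.inference.field,
    ignoring order and weights (assumed weight syntax: ':' in settings, '^' in queries)."""
    wanted = {_strip_weight(f, ":") for f in inference_fields}
    given = {_strip_weight(f, "^") for f in query_fields}
    return given == wanted


def predicted_mode(query, inference_fields, model_available=True):
    """Return 'semantic' or 'text' for a query DSL dict (top-level multi_match only)."""
    mm = query.get("multi_match")
    if mm is None:
        return "text"  # other query types do not trigger semantic search
    if not fields_match(mm.get("fields", []), inference_fields):
        return "text"  # fields mismatch: falls back to a text query
    if not model_available:
        return "text"  # model service failure: automatic fallback to text search
    return "semantic"


if __name__ == "__main__":
    settings = ["title:100", "content:30", "desc:80"]
    q = {"multi_match": {"query": "北京", "fields": ["title", "desc", "content"]}}
    print(predicted_mode(q, settings))
```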

multi_match Semantic Query

In addition to full-text search across multiple fields, a multi_match semantic query also applies query rewriting and vector search.

Example: when the "fields" of a multi_match query match the index setting "index.inference.field", a vector query is triggered automatically.

Modify the index settings:

PUT pangu_index/_settings
{
  "index.inference.semantic_search_enabled": true,
  "index.inference.field": ["title:100", "content:30", "desc:80"],
  "index.inference.embedding_model": "pangu_vector",
  "index.inference.reorder_enabled": true,
  "index.inference.reorder_model": "pangu_ranking"
}

Semantic query:

GET pangu_index/_search
{
  "query": {
    "multi_match": {
      "query": "北京",
      "fields": ["title", "desc", "content"]
    }
  }
}
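The same request can be sent from an application. The following is a minimal sketch using only the Python standard library. ES_URL is a hypothetical placeholder endpoint, and HTTPS and authentication, which a real cluster usually requires, are omitted. POST is used instead of GET because many HTTP clients do not send a request body with GET; Elasticsearch accepts both methods for _search.

```python
import json
import urllib.request

# Minimal sketch: send the multi_match semantic query from Python.
# ES_URL is a hypothetical endpoint; HTTPS and authentication are omitted.
ES_URL = "http://localhost:9200"


def build_search_request(index, text, fields):
    """Build (but do not send) a POST <index>/_search request."""
    body = {"query": {"multi_match": {"query": text, "fields": fields}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(index, text, fields):
    """Send the request and return the decoded JSON response."""
    req = build_search_request(index, text, fields)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a reachable cluster):
# result = search("pangu_index", "北京", ["title", "desc", "content"])
```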

ext Extended Query

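Use the ext parameter to control query behavior per request. This lets one index serve both original (text) queries and vector queries, which makes it possible to run A/B tests.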

Example: when the "fields" of a multi_match query match the index setting "index.inference.field", a vector query is triggered automatically and the query is rewritten.

Index settings:

PUT pangu_index/_settings
{
  "index.inference.semantic_search_enabled": true,
  "index.inference.field": ["title:100", "content:30", "desc:80"],
  "index.inference.embedding_model": "pangu_vector",
  "index.inference.reorder_enabled": true,
  "index.inference.reorder_model": "pangu_ranking"
}

Semantic query:

GET pangu_index/_search
{
  "ext": {
    "inference": {
      "rewrite_enable": true,
      "vector_enable": true
    }
  },
  "query": {
    "multi_match": {
      "query": "年轻人上兴趣班",
      "fields": ["title", "content", "desc"]
    }
  }
}
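Because only the ext switches differ between the two variants, an A/B test can be driven entirely from the client. The following is a hypothetical sketch: the bucketing scheme (SHA-256 of a user ID mapped to 0..99, with a configurable semantic share) and the helper names are assumptions for illustration and are not part of the plugin.

```python
import hashlib

# Hypothetical A/B helper: the same index serves both variants, and only the
# "ext.inference" switches differ. The bucketing scheme is an assumption.


def variant_for(user_id, semantic_share=50):
    """Deterministically map a user ID to 'semantic' or 'text'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = digest[0] * 100 // 256  # first byte (0..255) scaled to 0..99
    return "semantic" if bucket < semantic_share else "text"


def build_body(text, fields, variant):
    """Build a _search body whose ext switches select the variant."""
    semantic = variant == "semantic"
    return {
        "ext": {
            "inference": {
                # rewrite_enable only takes effect when vector_enable is true
                "rewrite_enable": semantic,
                "vector_enable": semantic,
            }
        },
        "query": {"multi_match": {"query": text, "fields": fields}},
    }


if __name__ == "__main__":
    v = variant_for("user-42")
    print(v, build_body("年轻人上兴趣班", ["title", "content", "desc"], v))
```

Hashing the user ID, rather than choosing randomly per request, keeps each user in the same variant across requests.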

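Table 1 ext parameters
Parameter	Mandatory	Type	Description
rewrite_enable	No	Boolean	Whether to use query rewriting. Takes effect only when "vector_enable" is "true". Value range: true: enable query rewriting; false: disable query rewriting. Default value: false
vector_enable	No	Boolean	Whether to use vector query. Value range: true: enable vector query, so the multi_match query runs as a semantic query; false: disable vector query, so the multi_match query runs as a text query. Default value: false
rerank_enable	No	Boolean	Whether to use the reranking model service. Value range: true: enable the reranking model service to re-sort the query results; false: disable the reranking model service. Default value: false
llm_enable	No	Boolean	Whether to use the large language model (LLM). Value range: true: enable the LLM to post-process the query results; false: disable the LLM and return the original retrieval results directly. Default value: false
search_type	No	String	Search type. Value range: vector: vector search, using semantic query only. This type relies entirely on vector similarity retrieval and ignores text matching. merge: dual-path recall, which runs text matching and vector search at the same time, merges the results, and re-sorts them with the reranking model. This type improves recall and avoids the limitations of a single retrieval method. When merge is selected, the reranking model service must be enabled in the index settings (that is, "index.inference.reorder_enabled" is "true"); otherwise, vector query results are returned. Default value: false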
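The dependencies in Table 1 can be summarized as a small resolver that predicts which features actually take effect for a given set of ext flags. This is a hypothetical sketch that encodes only the rules stated in the table (rewrite requires vector; merge without an enabled reranking model falls back to vector results); it is not the plugin's implementation, and it does not model any rule the table does not state.

```python
# Hypothetical resolver applying the documented rules from Table 1.
# It is not the plugin's implementation.


def effective_features(ext, reorder_enabled):
    """Predict the effective features for ext.inference plus index.inference.reorder_enabled."""
    vector = bool(ext.get("vector_enable", False))
    # rewrite_enable only takes effect when vector_enable is true
    rewrite = vector and bool(ext.get("rewrite_enable", False))
    search_type = ext.get("search_type")
    if search_type == "merge" and not reorder_enabled:
        search_type = "vector"  # merge needs reranking; otherwise vector results are returned
    return {
        "vector": vector,
        "rewrite": rewrite,
        "rerank": bool(ext.get("rerank_enable", False)),
        "llm": bool(ext.get("llm_enable", False)),
        "search_type": search_type,
    }


if __name__ == "__main__":
    print(effective_features({"rewrite_enable": True}, reorder_enabled=True))
```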

merge Dual-Path Recall Query

A merge query combines text retrieval results and vector retrieval results to improve recall.
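Conceptually, dual-path recall takes the top text hits and the top vector hits, removes duplicates, and lets the reranking model order the combined candidates. The following is a minimal sketch of that idea under stated assumptions: documents are identified by ID, de-duplication keeps the first occurrence, and rerank is a hypothetical stand-in callable for the reranking model service. The default depths of 10 and 40 mirror the example settings in this section; the plugin's internal merge logic is not documented here.

```python
# Conceptual sketch of dual-path recall. `rerank` is a hypothetical stand-in
# for the reranking model service; the plugin's internals may differ.


def merge_recall(text_hits, vector_hits, rerank, text_topn=10, vector_topn=40):
    """Merge two ranked ID lists and return the IDs sorted by rerank score."""
    candidates = []
    seen = set()
    for doc_id in text_hits[:text_topn] + vector_hits[:vector_topn]:
        if doc_id not in seen:  # keep the first occurrence of each document
            seen.add(doc_id)
            candidates.append(doc_id)
    scored = [(doc_id, rerank(doc_id)) for doc_id in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest score first
    return [doc_id for doc_id, _ in scored]


if __name__ == "__main__":
    fake_scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}
    print(merge_recall(["a", "b"], ["b", "c", "d"], fake_scores.get))
```

A document found by only one path (for example "d", found only by vector search) can still reach the top of the final list, which is how dual-path recall avoids the blind spots of a single retrieval method.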

Example: when the "fields" of a multi_match query match the index setting "index.inference.field", dual-path recall is triggered automatically.

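Modify the index settings: set "index.inference.semantic_search_type" to "merge". It is recommended that the number of vector query results be four times the number of text query results, as in the example below, where "index.inference.reorder_vector_topn" is 40 and "index.inference.reorder_text_topn" is 10.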

PUT pangu_index/_settings
{
  "index.inference.semantic_search_enabled": true,
  "index.inference.field": ["title:100", "content:30", "desc:80"],
  "index.inference.embedding_model": "pangu_vector",
  "index.inference.reorder_enabled": true,
  "index.inference.reorder_model": "pangu_ranking",
  "index.inference.semantic_search_type": "merge",
  "index.inference.reorder_vector_topn": 40,
  "index.inference.reorder_text_topn": 10   
}
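If the text recall depth is tuned, the recommended 4:1 ratio can be kept by deriving the vector depth from it. The following is a hypothetical helper (its name and the ratio parameter are illustrative) that produces the merge-specific keys of the settings body above as JSON, ready to be sent with PUT pangu_index/_settings.

```python
import json

# Hypothetical helper applying the recommendation above: vector recall depth
# about 4x the text recall depth. Only the merge-specific keys are produced.


def merge_topn_settings(text_topn, ratio=4):
    """Return merge settings with reorder_vector_topn = ratio * reorder_text_topn."""
    if text_topn <= 0:
        raise ValueError("text_topn must be positive")
    return {
        "index.inference.semantic_search_type": "merge",
        "index.inference.reorder_vector_topn": text_topn * ratio,
        "index.inference.reorder_text_topn": text_topn,
    }


if __name__ == "__main__":
    print(json.dumps(merge_topn_settings(10), indent=2))
```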

Semantic query:

GET pangu_index/_search
{
  "query": {
    "multi_match": {
      "query": "北京",
      "fields": ["title", "desc", "content"]
    }
  }
}

Query Results

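In addition to the standard search response fields, the response to a semantic query contains per-stage latency (the timestamp object) and processing status (the status object).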

Example response:

{
  "timestamp" : {
    "convert_request_in_millis" : 47,
    "search_in_millis" : 0,
    "vector_search_in_millis" : 6,
    "rerank_in_millis" : 111,
    "total_search_in_millis" : 164
  },
  "status" : {
    "search_failed" : false,
    "rewrite" : true,
    "retrieval" : true,
    "reranking" : true,
    "vector_model": "auto"
  },
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
  ...
  }
}

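Table 2 timestamp parameters
Parameter	Type	Description
convert_request_in_millis	Long	Time spent on the query rewriting stage. Unit: ms
search_in_millis	Long	Time spent on the original text query. Unit: ms
vector_search_in_millis	Long	Time spent on vector search. Unit: ms
rerank_in_millis	Long	Time spent on reranking model processing. Unit: ms
total_search_in_millis	Long	Total query time. Unit: ms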

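Table 3 status parameters
Parameter	Type	Description
search_failed	Boolean	Whether the query failed.
rewrite	Boolean	Whether query rewriting was performed.
retrieval	Boolean	Whether semantic vector retrieval was performed.
reranking	Boolean	Whether reranking was performed.
vector_model	String	Vector query mode. Value range: auto: the vector query mode is determined by the vector database. script: coarse ranking first runs a text query, then ranks the text query results with a vector query. bool: coarse ranking runs a text query and a vector query separately, then merges the results.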
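When monitoring semantic queries, it can help to pull the timestamp and status objects out of each response. The following minimal sketch parses the example response above (with the elided "hits" replaced by an empty object so it is valid JSON) and reports the slowest stage. The "unaccounted_ms" figure is an assumption-based diagnostic: this section does not state that the stage timings must add up to total_search_in_millis, although they do in this example.

```python
import json

# Minimal sketch: summarize the timestamp and status objects of a semantic
# query response. SAMPLE is the example response with "hits" emptied.
SAMPLE = """
{
  "timestamp": {"convert_request_in_millis": 47, "search_in_millis": 0,
                "vector_search_in_millis": 6, "rerank_in_millis": 111,
                "total_search_in_millis": 164},
  "status": {"search_failed": false, "rewrite": true, "retrieval": true,
             "reranking": true, "vector_model": "auto"},
  "took": 1,
  "hits": {}
}
"""

STAGES = ["convert_request_in_millis", "search_in_millis",
          "vector_search_in_millis", "rerank_in_millis"]


def summarize(response):
    """Return failure flag, total time, slowest stage, and unattributed time."""
    ts = response.get("timestamp", {})
    status = response.get("status", {})
    total = ts.get("total_search_in_millis", 0)
    slowest = max(STAGES, key=lambda k: ts.get(k, 0))
    return {
        "failed": status.get("search_failed", False),
        "total_ms": total,
        "slowest_stage": slowest,
        # Time not attributed to any listed stage (assumed diagnostic only).
        "unaccounted_ms": total - sum(ts.get(k, 0) for k in STAGES),
        "vector_model": status.get("vector_model"),
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

In the example response, reranking (111 ms) dominates the 164 ms total, which is consistent with the reranking model service being the most expensive stage in that request.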