Configuring Query Scoring in an Elasticsearch Cluster

You can score matched documents in an Elasticsearch cluster. This section describes how to configure query scoring.

Overview

You can score a query in either of the following ways:

Calculate the final scores (new_score) of query results based on vote and sort the results in descending order.
new_score = query_score x (vote x factor)
- query_score: relevance score of a record for the search keywords. Conceptually, a record earns 1 point for each keyword it contains; the actual value is a relevance score computed by Elasticsearch.
- vote: value of the vote field of a record.
- factor: user-defined weight of vote.
Calculate the final scores (new_score) of query results based on inline and sort the results in descending order.
new_score = query_score x inline
- query_score: same as above.
- vote: value of the vote field of a record.
- inline: multiplier selected by comparing vote with a threshold. You configure the threshold and two values: one is used if vote exceeds the threshold, and the other is used if vote is less than or equal to the threshold. Because vote only selects between two fixed values, abnormal vote values cannot distort the final score.
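
The two formulas above can be sketched in Python as follows. This is only an illustration of the arithmetic, not how Elasticsearch computes scores internally. The function names, the sample query_score of 2.0, the abnormal vote of 50, and the default multipliers 1.0 and 0.5 are assumptions chosen for the example.

```python
def score_by_vote(query_score: float, vote: float, factor: float = 1.0) -> float:
    """new_score = query_score x (vote x factor)"""
    return query_score * (vote * factor)


def score_by_inline(query_score: float, vote: float, threshold: float,
                    high: float = 1.0, low: float = 0.5) -> float:
    """new_score = query_score x inline, where inline is chosen by the threshold."""
    # vote only selects one of two fixed multipliers (hypothetical defaults).
    inline = high if vote > threshold else low
    return query_score * inline


# Hypothetical record: query_score of 2.0 and an abnormally large vote of 50.
print(score_by_vote(2.0, 50.0))         # vote scales the score directly
print(score_by_inline(2.0, 50.0, 0.8))  # vote only selects the multiplier
```

With vote-based scoring, the abnormal vote multiplies the score by 50. With inline-based scoring, the same record is multiplied by 1 at most.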

Prerequisites

An Elasticsearch cluster has been created on the CSS management console and is available.

Procedure

The code examples in this section apply only to clusters running Elasticsearch 7.x or later.

Log in to the CSS management console.
In the navigation pane on the left, click Clusters to go to the Elasticsearch cluster list.
Click Access Kibana in the Operation column of a cluster.
In the navigation tree on the left of Kibana, choose Dev Tools. The command execution page is displayed.

Create an index and specify a custom mapping to define the data type.

For example, the content of the tv.json file is as follows:

{
  "tv": [
    { "name": "tv1", "description": "USB, DisplayPort", "vote": 0.98 },
    { "name": "tv2", "description": "USB, HDMI", "vote": 0.99 },
    { "name": "tv3", "description": "USB", "vote": 0.5 },
    { "name": "tv4", "description": "USB, HDMI, DisplayPort", "vote": 0.7 }
  ]
}

Run the following command to create the mall index and specify the user-defined mapping to define the data type:

PUT /mall?pretty
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "vote": {
        "type": "float"
      }
    }
  }
}

Import data.

Run the following command to import data in the tv.json file to the mall index:

POST /mall/_bulk?pretty
{ "index": {"_id": "1"}}
{ "name": "tv1", "description": "USB, DisplayPort", "vote": 0.98 }
{ "index": {"_id": "2"}}
{ "name": "tv2", "description": "USB, HDMI", "vote": 0.99 }
{ "index": {"_id": "3"}}
{ "name": "tv3", "description": "USB", "vote": 0.5 }
{ "index": {"_id": "4"}}
{ "name": "tv4", "description": "USB, HDMI, DisplayPort", "vote": 0.7 }
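
The request body above is the tv array rewritten in the newline-delimited format of the bulk API: one action line followed by one source line per record. A minimal Python sketch of that conversion is shown below. The to_bulk_body helper is hypothetical, and the tv.json content is embedded as a string so that the sketch is self-contained.

```python
import json

# Content of tv.json, embedded here so that the sketch runs on its own.
TV_JSON = """
{
  "tv": [
    { "name": "tv1", "description": "USB, DisplayPort", "vote": 0.98 },
    { "name": "tv2", "description": "USB, HDMI", "vote": 0.99 },
    { "name": "tv3", "description": "USB", "vote": 0.5 },
    { "name": "tv4", "description": "USB, HDMI, DisplayPort", "vote": 0.7 }
  ]
}
"""


def to_bulk_body(records, start_id=1):
    """Return an NDJSON string: an action line and a source line per record."""
    lines = []
    for doc_id, record in enumerate(records, start=start_id):
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(record))
    # The bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body(json.loads(TV_JSON)["tv"])
print(bulk_body)
```

The printed text can be pasted below a POST /mall/_bulk line in Dev Tools.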

Query data by using custom scoring. The query results can be scored based on vote or inline.

Assume a user wants to query TVs with USB, HDMI, and/or DisplayPort ports. The final query score can be calculated in the following ways and used for sorting:

Scoring based on vote

The score is calculated using the formula new_score = query_score x (vote x factor). Run the following command:

GET /mall/_search?pretty
{
  "query":{
    "function_score":{
      "query":{
        "bool":{
          "should":[
            {"match": {"description": "USB"}},
            {"match": {"description": "HDMI"}},
            {"match": {"description": "DisplayPort"}}
          ]
        }
      },
      "field_value_factor":{
        "field":"vote",
        "factor":1
      },
      "boost_mode":"multiply",
      "max_boost":10
    }
  }
}

The query results are displayed in descending order of the score. The command output is as follows:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.8388366,
    "hits" : [
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.8388366,
        "_source" : {
          "name" : "tv4",
          "description" : "USB, HDMI, DisplayPort",
          "vote" : 0.7
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.7428025,
        "_source" : {
          "name" : "tv2",
          "description" : "USB, HDMI",
          "vote" : 0.99
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.7352994,
        "_source" : {
          "name" : "tv1",
          "description" : "USB, DisplayPort",
          "vote" : 0.98
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.03592815,
        "_source" : {
          "name" : "tv3",
          "description" : "USB",
          "vote" : 0.5
        }
      }
    ]
  }
}
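
To see how vote influences this ranking, each _score in the output can be divided by vote x factor to recover the underlying query_score. The Python sketch below does this with the values copied from the output above. The variable and function names are assumptions made for the example.

```python
# (name, vote, _score) copied from the command output above.
vote_hits = [
    ("tv4", 0.7, 0.8388366),
    ("tv2", 0.99, 0.7428025),
    ("tv1", 0.98, 0.7352994),
    ("tv3", 0.5, 0.03592815),
]
FACTOR = 1  # same value as "factor" in the request


def base_score(score, vote, factor=FACTOR):
    """Invert new_score = query_score x (vote x factor)."""
    return score / (vote * factor)


base_scores = {name: base_score(score, vote) for name, vote, score in vote_hits}
for name, value in base_scores.items():
    print(f"{name}: query_score ~ {value:.4f}")
```

tv4 matches all three keywords and has the highest recovered query_score, so it ranks first even though its vote (0.7) is lower than that of tv1 and tv2. tv1 and tv2 have nearly identical query_score values, so their order is decided by vote.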

Scoring based on inline

The score is calculated using the formula new_score = query_score x inline. In this example, if vote > 0.8, the value of inline is 1. If vote ≤ 0.8, the value of inline is 0.5. Run the following command:

GET /mall/_search?pretty
{
  "query":{
    "function_score":{
      "query":{
        "bool":{
          "should":[
            {"match":{"description":"USB"}},
            {"match":{"description":"HDMI"}},
            {"match":{"description":"DisplayPort"}}
          ]
        }
      },
      "script_score": {
        "script": {
          "params": {
            "threshold": 0.8
          },
          "inline": "if (doc[\"vote\"].value > params.threshold) {return 1;} return 0.5;"
        }
      },
      "boost_mode":"multiply",
      "max_boost":10
    }
  }
}

The query results are displayed in descending order of the score. The command output is as follows:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.75030553,
    "hits" : [
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.75030553,
        "_source" : {
          "name" : "tv1",
          "description" : "USB, DisplayPort",
          "vote" : 0.98
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.75030553,
        "_source" : {
          "name" : "tv2",
          "description" : "USB, HDMI",
          "vote" : 0.99
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.599169,
        "_source" : {
          "name" : "tv4",
          "description" : "USB, HDMI, DisplayPort",
          "vote" : 0.7
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.03592815,
        "_source" : {
          "name" : "tv3",
          "description" : "USB",
          "vote" : 0.5
        }
      }
    ]
  }
}
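
The effect of the threshold can be checked against this output. The Python sketch below mirrors the script logic in a hypothetical inline_value function and divides each _score by the selected multiplier to recover the query_score. The values are copied from the output above; the names are assumptions made for the example.

```python
THRESHOLD = 0.8  # same value as params.threshold in the request


def inline_value(vote, threshold=THRESHOLD):
    """Mirror of the script: 1 if vote exceeds the threshold, otherwise 0.5."""
    return 1 if vote > threshold else 0.5


# (name, vote, _score) copied from the command output above.
inline_hits = [
    ("tv1", 0.98, 0.75030553),
    ("tv2", 0.99, 0.75030553),
    ("tv4", 0.7, 0.599169),
    ("tv3", 0.5, 0.03592815),
]

for name, vote, score in inline_hits:
    multiplier = inline_value(vote)
    # Dividing by the multiplier recovers the query_score of the record.
    print(f"{name}: inline = {multiplier}, query_score ~ {score / multiplier:.4f}")
```

tv4 still has the highest query_score, but because its vote (0.7) does not exceed the threshold, its score is halved and it ranks below tv1 and tv2. tv1 and tv2 both receive the multiplier 1, so the small difference between their votes no longer affects the score.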
