Ranking Search Results Using Elasticsearch Custom Rules
You can use an Elasticsearch cluster to rank search results based on customized rules.
Scenario
- E-commerce: Sort offerings based on factors such as sales volume, user comments, and prices.
- Content Management: Sort articles or blog entries based on the number of views and publishing time.
- Financial Services: Sort transaction records based on transaction amount, frequency, or risk score.
- Customer Support: Sort customer requests based on the urgency or opening time of service tickets.
Solution Architecture
This solution uses the Elasticsearch function_score query to rescore search results according to customized rules and return them sorted by the new score.
You can query with customized rules using either of the following methods:
- Calculate the final scores (new_score) of query results based on vote and sort the results in descending order.
new_score = query_score x (vote x factor)
- query_score: relevance score that Elasticsearch calculates for a record based on the search keywords it contains. As a simplified model, a record earns 1 point for each keyword it contains.
- vote: vote of a record.
- factor: user-defined weight of vote.
- Calculate the final scores (new_score) of query results based on inline and sort the results in descending order.
new_score = query_score x inline
- query_score: relevance score that Elasticsearch calculates for a record based on the search keywords it contains. As a simplified model, a record earns 1 point for each keyword it contains.
- vote: vote of a record.
- inline: Configure two value options for this parameter and a threshold for vote. One option is used if vote exceeds the threshold, and the other is used if vote is smaller than or equal to the threshold. In this way, the query accuracy will not be affected by abnormal vote values.
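To make the two formulas concrete, the following Python sketch applies them under the simplified model in which query_score is the number of matched keywords. The record names, port sets, vote values, and the threshold and option values (0.8, 1, and 0.5) are illustrative assumptions, not output from a cluster.

```python
# Illustrative only: query_score is modelled as the number of matched keywords.
KEYWORDS = {"USB", "HDMI", "DisplayPort"}

# Hypothetical records: (name, ports listed in the record, vote)
RECORDS = [
    ("a", {"USB", "DisplayPort"}, 0.98),
    ("b", {"USB"}, 0.5),
    ("c", {"USB", "HDMI", "DisplayPort"}, 0.7),
]


def query_score(ports):
    """Simplified model: 1 point for each search keyword the record contains."""
    return len(ports & KEYWORDS)


def score_by_vote(ports, vote, factor=1.0):
    """new_score = query_score x (vote x factor)"""
    return query_score(ports) * (vote * factor)


def score_by_inline(ports, vote, threshold=0.8, high=1.0, low=0.5):
    """new_score = query_score x inline, where inline depends on the vote threshold."""
    inline = high if vote > threshold else low
    return query_score(ports) * inline


def rank(score_fn):
    """Return (name, new_score) pairs sorted by new_score in descending order."""
    scored = [(name, score_fn(ports, vote)) for name, ports, vote in RECORDS]
    return sorted(scored, key=lambda item: item[1], reverse=True)


by_vote = rank(score_by_vote)
by_inline = rank(score_by_inline)
print(by_vote)
print(by_inline)
```

With these assumed values, the vote-based rule places record c first because it matches the most keywords, while the inline rule places record a first because c's vote falls below the threshold and its score is halved.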
Advantages
- Flexibility: Customized sorting rules can meet various complex service requirements.
- Scalability: The distributed nature of Elasticsearch supports horizontal expansion to accommodate increasing data volumes.
- Performance: Elasticsearch's optimization mechanisms ensure efficient sorting operations, maintaining good performance even with large-scale datasets.
- Real-time: The near real-time search capability of Elasticsearch ensures the timeliness of sorting results.
Prerequisites
An Elasticsearch cluster is available.
Procedure
The code examples in this section apply only to clusters running Elasticsearch 7.x or later.
- Log in to the CSS management console.
- In the navigation pane on the left, click Clusters to go to the Elasticsearch cluster list.
- Click Access Kibana in the Operation column of a cluster.
- In the navigation tree on the left of Kibana, choose Dev Tools. The command execution page is displayed.
- Create an index and specify a custom mapping to define the data type.
For example, the content of the tv.json file is as follows:
{
  "tv": [
    { "name": "tv1", "description": "USB, DisplayPort", "vote": 0.98 },
    { "name": "tv2", "description": "USB, HDMI", "vote": 0.99 },
    { "name": "tv3", "description": "USB", "vote": 0.5 },
    { "name": "tv4", "description": "USB, HDMI, DisplayPort", "vote": 0.7 }
  ]
}
Run the following command to create the mall index and specify the user-defined mapping to define the data type:
PUT /mall?pretty
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "description": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "vote": { "type": "float" }
    }
  }
}
- Import data.
Run the following command to import data in the tv.json file to the mall index:
POST /mall/_bulk?pretty
{ "index": {"_id": "1"}}
{ "name": "tv1", "description": "USB, DisplayPort", "vote": 0.98 }
{ "index": {"_id": "2"}}
{ "name": "tv2", "description": "USB, HDMI", "vote": 0.99 }
{ "index": {"_id": "3"}}
{ "name": "tv3", "description": "USB", "vote": 0.5 }
{ "index": {"_id": "4"}}
{ "name": "tv4", "description": "USB, HDMI, DisplayPort", "vote": 0.7 }
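The _bulk request body is newline-delimited JSON: one action line followed by one document line per record, with a final newline at the end. As a sketch of how the tv.json content maps to the request above, the following Python snippet builds that body. The helper name to_bulk_body and the sequential _id values starting at 1 are assumptions chosen to match the example.

```python
import json

# Content of tv.json as shown above, with the array entries separated by commas.
tv_json = {
    "tv": [
        {"name": "tv1", "description": "USB, DisplayPort", "vote": 0.98},
        {"name": "tv2", "description": "USB, HDMI", "vote": 0.99},
        {"name": "tv3", "description": "USB", "vote": 0.5},
        {"name": "tv4", "description": "USB, HDMI, DisplayPort", "vote": 0.7},
    ]
}


def to_bulk_body(records):
    """Build a newline-delimited _bulk body: an index action line, then the document."""
    lines = []
    for doc_id, record in enumerate(records, start=1):
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(record))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body(tv_json["tv"])
print(bulk_body, end="")
```

Each JSON object must stay on a single line, which is why the documents are serialized with json.dumps rather than pretty-printed.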
- Query data based on customized rules. The query results can be scored based on vote or inline.
Assume a user wants to query TVs with USB, HDMI, and/or DisplayPort ports. The final query score can be calculated in the following ways and used for sorting:
- Scoring based on vote
The score is calculated using the formula new_score = query_score x (vote x factor). Run the following command:
GET /mall/_search?pretty
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            { "match": { "description": "USB" } },
            { "match": { "description": "HDMI" } },
            { "match": { "description": "DisplayPort" } }
          ]
        }
      },
      "field_value_factor": {
        "field": "vote",
        "factor": 1
      },
      "boost_mode": "multiply",
      "max_boost": 10
    }
  }
}
The query results are displayed in descending order of the score. The command output is as follows:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.8388366,
    "hits" : [
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.8388366,
        "_source" : {
          "name" : "tv4",
          "description" : "USB, HDMI, DisplayPort",
          "vote" : 0.7
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.7428025,
        "_source" : {
          "name" : "tv2",
          "description" : "USB, HDMI",
          "vote" : 0.99
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.7352994,
        "_source" : {
          "name" : "tv1",
          "description" : "USB, DisplayPort",
          "vote" : 0.98
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.03592815,
        "_source" : {
          "name" : "tv3",
          "description" : "USB",
          "vote" : 0.5
        }
      }
    ]
  }
}
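The vote-based request combines three settings: field_value_factor multiplies the vote field by factor, max_boost caps that function value, and boost_mode multiply combines it with the query score. The following Python sketch models that combination under the assumption that no modifier is configured on field_value_factor; the function name vote_based_score and the sample inputs are hypothetical.

```python
def vote_based_score(query_score, vote, factor=1.0, max_boost=10.0):
    """Model of function_score with field_value_factor and boost_mode "multiply"."""
    # field_value_factor without a modifier: function value = factor x vote
    function_value = factor * vote
    # max_boost limits the function value before it is combined with the query score
    capped_value = min(function_value, max_boost)
    # boost_mode "multiply": new_score = query_score x function value
    return query_score * capped_value


# With factor 1 the cap of 10 is never reached for votes between 0 and 1.
normal = vote_based_score(query_score=2.0, vote=0.98)
# A hypothetical factor of 20 would push the function value to 19.6, so it is capped at 10.
capped = vote_based_score(query_score=2.0, vote=0.98, factor=20.0)
print(normal, capped)
```

In the request above, factor is 1 and vote stays below 1, so max_boost only acts as a safeguard against unexpectedly large vote values.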
- Scoring based on inline
The score is calculated using the formula new_score = query_score x inline. In this example, if vote > 0.8, the value of inline is 1. If vote ≤ 0.8, the value of inline is 0.5. Run the following command:
GET /mall/_search?pretty
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            { "match": { "description": "USB" } },
            { "match": { "description": "HDMI" } },
            { "match": { "description": "DisplayPort" } }
          ]
        }
      },
      "script_score": {
        "script": {
          "params": {
            "threshold": 0.8
          },
          "inline": "if (doc[\"vote\"].value > params.threshold) {return 1;} return 0.5;"
        }
      },
      "boost_mode": "multiply",
      "max_boost": 10
    }
  }
}
The query results are displayed in descending order of the score. The command output is as follows:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.75030553,
    "hits" : [
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.75030553,
        "_source" : {
          "name" : "tv1",
          "description" : "USB, DisplayPort",
          "vote" : 0.98
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.75030553,
        "_source" : {
          "name" : "tv2",
          "description" : "USB, HDMI",
          "vote" : 0.99
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.599169,
        "_source" : {
          "name" : "tv4",
          "description" : "USB, HDMI, DisplayPort",
          "vote" : 0.7
        }
      },
      {
        "_index" : "mall",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.03592815,
        "_source" : {
          "name" : "tv3",
          "description" : "USB",
          "vote" : 0.5
        }
      }
    ]
  }
}
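The two sample outputs can be cross-checked against the formulas. The following Python sketch recovers each record's query_score from the inline-based output (dividing _score by the inline value) and then checks that query_score x vote reproduces the vote-based _score. The _score and vote values are copied from the outputs above; the tolerance of 1e-6 is an assumption that allows for the rounding in the printed scores.

```python
import math

# _score values copied from the two sample outputs, keyed by document _id.
vote_scores = {"4": 0.8388366, "2": 0.7428025, "1": 0.7352994, "3": 0.03592815}
inline_scores = {"1": 0.75030553, "2": 0.75030553, "4": 0.599169, "3": 0.03592815}
votes = {"1": 0.98, "2": 0.99, "3": 0.5, "4": 0.7}

THRESHOLD = 0.8


def inline_value(vote):
    """Same rule as the script: 1 if vote exceeds the threshold, otherwise 0.5."""
    return 1.0 if vote > THRESHOLD else 0.5


checks = {}
for doc_id, vote in sorted(votes.items()):
    # Undo the inline multiplier to recover the underlying query score.
    query_score = inline_scores[doc_id] / inline_value(vote)
    # Vote-based formula with factor = 1: new_score = query_score x vote.
    expected_vote_score = query_score * vote
    checks[doc_id] = math.isclose(expected_vote_score, vote_scores[doc_id], abs_tol=1e-6)
    print(doc_id, round(query_score, 6), checks[doc_id])
```

The recovered query scores are not whole numbers, which shows that the scores returned by the cluster are relevance scores rather than a literal count of matched keywords.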