Example
Application Scenarios
Configure a custom word dictionary for the cluster, setting main words, stop words, and synonyms. Then search the target text by keyword and by synonym, and view the search results.
Step 1: Configure a Custom Word Dictionary
- Prepare the word dictionary files (text files encoded in UTF-8 without BOM) and upload them to the target OBS path. Set the main word dictionary file, stop word dictionary file, and synonym word dictionary file.
The built-in static stop word dictionary already contains common stop words such as "are" and "the". Unless the built-in stop word dictionary has been deleted or updated, you do not need to include such common stop words in the uploaded stop word dictionary file. A sketch of the dictionary file formats is provided at the end of this step.
- In the navigation pane on the left, choose Clusters.
- On the Clusters page, click the name of the target cluster.
- Click the Word Dictionaries tab. Configure the word dictionary files prepared earlier by referring to Managing Word Dictionaries.
- After the word dictionary takes effect, return to the cluster list. Locate the target cluster and click Kibana in the Operation column to access the cluster.
- On the Kibana page, click Dev Tools in the navigation tree on the left. The operation page is displayed.
- Run the following commands to compare the results of different word segmentation policies.
- Use the ik_smart word segmentation policy to split the target text.
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "Text used for word segmentation"
}
After the operation is completed, view the word segmentation result.
{
  "tokens": [
    { "token": "word-1", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "word-2", "start_offset": 5, "end_offset": 8, "type": "CN_WORD", "position": 1 }
  ]
}
- Use the ik_max_word word segmentation policy to split the target text.
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "Text used for word segmentation"
}
After the operation is completed, view the word segmentation result.
{
  "tokens": [
    { "token": "word-1", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "word-3", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "word-4", "start_offset": 0, "end_offset": 1, "type": "CN_WORD", "position": 2 },
    { "token": "word-5", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "word-6", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 4 },
    { "token": "word-7", "start_offset": 3, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "word-2", "start_offset": 5, "end_offset": 8, "type": "CN_WORD", "position": 6 },
    { "token": "word-8", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 7 },
    { "token": "word-9", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 8 },
    { "token": "word-10", "start_offset": 7, "end_offset": 8, "type": "CN_WORD", "position": 9 }
  ]
}
As the results show, ik_smart produces a coarse-grained segmentation with fewer, longer tokens, whereas ik_max_word exhaustively generates all possible words, producing more, overlapping tokens.
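For reference, the following is a minimal sketch of the three dictionary files mentioned above. The file names and terms are illustrative assumptions: main word and stop word dictionaries list one term per line, and the synonym dictionary typically uses Solr-style lines of comma-separated equivalent terms.
main.txt (main word dictionary), one custom term per line:
mycompany
myproduct
stop.txt (stop word dictionary), one stop word per line:
uh
synonym.txt (synonym word dictionary), one group of equivalent terms per line:
laptop,notebook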
Step 2: Use Keywords for Search
The commands for versions earlier than Elasticsearch 7.x differ from those for 7.x and later versions. Examples of both are provided below.
- Versions earlier than 7.x
- Create the book index and configure the word segmentation policy.
In this example, both analyzer (used at index time) and search_analyzer (used at query time) are set to ik_max_word. You can also use ik_smart.
PUT /book
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}
- Import the text information to the book index.
PUT /book/type1/1
{
  "content": "Imported text"
}
- Use a keyword to search for the text and view the search results.
GET /book/type1/_search
{
  "query": {
    "match": {
      "content": "Keyword"
    }
  }
}
Search result
{ "took" : 20, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.1507283, "hits" : [ { "_index" : "book", "_type" : "type1", "_id" : "1", "_score" : 1.1507283, "_source" : { "content" : "Imported text" } } ] } }
- 7.x and later versions
- Create the book index and configure the word segmentation policy.
In this example, both analyzer (used at index time) and search_analyzer (used at query time) are set to ik_max_word. You can also use ik_smart.
PUT /book
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}
- Import the text information to the book index.
PUT /book/_doc/1
{
  "content": "Imported text"
}
- Use a keyword to search for the text and view the search results.
GET /book/_search
{
  "query": {
    "match": {
      "content": "Keyword"
    }
  }
}
Search result
{ "took" : 16, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.7260926, "hits" : [ { "_index" : "book", "_type" : "_doc", "_id" : "1", "_score" : 1.7260926, "_source" : { "content" : "Imported text" } } ] } }
Step 3: Use Synonyms for Search
The commands for versions earlier than Elasticsearch 7.x differ from those for 7.x and later versions. Examples of both are provided below.
- Versions earlier than 7.x
- Create the myindex index and configure the word segmentation policy.
PUT myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "dynamic_synonym"
        }
      },
      "analyzer": {
        "ik_synonym": {
          "filter": [
            "my_synonym"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "desc": {
          "type": "text",
          "analyzer": "ik_synonym"
        }
      }
    }
  }
}
- Import the text information to the myindex index.
PUT /myindex/mytype/1
{
  "desc": "Imported text"
}
- Search for the text by synonym and view the search results.
GET /myindex/_search
{
  "query": {
    "match": {
      "desc": "Keyword"
    }
  }
}
Search result
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.49445358, "hits" : [ { "_index" : "myindex", "_type" : "mytype", "_id" : "1", "_score" : 0.49445358, "_source" : { "desc" : "Imported text" } } ] } }
- 7.x and later versions
- Create the myindex index and configure the word segmentation policy.
PUT myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "dynamic_synonym"
        }
      },
      "analyzer": {
        "ik_synonym": {
          "filter": [
            "my_synonym"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "desc": {
        "type": "text",
        "analyzer": "ik_synonym"
      }
    }
  }
}
- Import the text information to the myindex index.
PUT /myindex/_doc/1
{
  "desc": "Imported text"
}
- Search for the text by synonym and view the search results.
GET /myindex/_search
{
  "query": {
    "match": {
      "desc": "Keyword"
    }
  }
}
Search result
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 0.1519955, "hits" : [ { "_index" : "myindex", "_type" : "_doc", "_id" : "1", "_score" : 0.1519955, "_source" : { "desc" : "Imported text" } } ] } }