Example
Analyzers
Elasticsearch provides the following two analyzers for using the word dictionary:
- ik_max_word: segments the text at a fine-grained level.
- ik_smart: segments the text at a coarse-grained level.
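The difference between the two granularities can be illustrated with a toy dictionary-based segmenter. The sketch below is not the IK implementation: the function names coarse_segment and fine_segment and the sample dictionary are hypothetical, and the code only mimics the behavior described above. Coarse mode keeps the longest dictionary match at each position, while fine mode emits every dictionary word found in the text.

```python
def coarse_segment(text, dictionary):
    """Greedy longest-match segmentation (a rough stand-in for coarse-grained mode)."""
    tokens, i = [], 0
    max_len = max((len(word) for word in dictionary), default=1)
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def fine_segment(text, dictionary):
    """Emit (start, end, word) for every dictionary word found in the text."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary:
                found.append((start, end, text[start:end]))
    return found


# Hypothetical sample dictionary, for illustration only.
words = {"smart", "phone", "smartphone"}
print(coarse_segment("smartphone", words))
print(fine_segment("smartphone", words))
```

In this toy model, the fine-grained output contains overlapping tokens, which matches the overlapping start_offset and end_offset values that ik_max_word returns.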
Example
- Log in to the CSS management console. Switch to the Clusters page. Click the name of the target cluster to switch to the Basic Information page.
- Prepare the main word dictionary file, stop word dictionary file, and synonym dictionary file. Upload the files encoded using UTF-8 without BOM to the corresponding OBS bucket, for example, obs-b8ed.
The default word dictionary contains common stop words. Therefore, you do not need to upload the stop words mentioned in the preceding example.
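Before uploading, you can check locally that a dictionary file is valid UTF-8 and has no BOM. The following is a minimal sketch under stated assumptions: the helper names without_bom and prepare_dictionary and the file name main.dic are hypothetical and are not part of CSS or OBS.

```python
import codecs
from pathlib import Path


def without_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM (EF BB BF) removed, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def prepare_dictionary(path: str) -> bool:
    """Validate that the file is UTF-8 and strip a leading BOM in place.

    Returns True if a BOM was removed. Raises UnicodeDecodeError if the
    file is not valid UTF-8.
    """
    raw = Path(path).read_bytes()
    cleaned = without_bom(raw)
    cleaned.decode("utf-8")  # validation only; raises on invalid bytes
    if cleaned != raw:
        Path(path).write_bytes(cleaned)
        return True
    return False
```

For example, prepare_dictionary("main.dic") rewrites the hypothetical file main.dic only if it starts with a BOM.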
- Select the corresponding OBS path by referring to Configuring a Custom Word Dictionary, select the corresponding main word dictionary file, stop word dictionary file, and synonym dictionary file, and click Save.
- After the word dictionary status changes to Succeeded, switch to the Clusters page. In the cluster list, locate the row where the target cluster resides and click Kibana in the Operation column.
- On the displayed page, click Dev Tools. On the Dev Tools page, enter the following code and click the run icon. The word segmentation result is displayed in the right pane.
- Use the ik_smart analyzer to perform word segmentation on Text used for word segmentation.
Example code:
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "Text used for word segmentation"
}
After the operation is completed, view the word segmentation result.
{
  "tokens": [
    { "token": "The word segmentation result", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "The word segmentation result", "start_offset": 5, "end_offset": 8, "type": "CN_WORD", "position": 1 }
  ]
}
- Use the ik_max_word analyzer to perform word segmentation on Text used for word segmentation.
Example code:
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "Text used for word segmentation"
}
After the operation is completed, view the word segmentation result.
{
  "tokens": [
    { "token": "The word segmentation result", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "The word segmentation result", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "The word segmentation result", "start_offset": 0, "end_offset": 1, "type": "CN_WORD", "position": 2 },
    { "token": "The word segmentation result", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "The word segmentation result", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 4 },
    { "token": "The word segmentation result", "start_offset": 3, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "The word segmentation result", "start_offset": 5, "end_offset": 8, "type": "CN_WORD", "position": 6 },
    { "token": "The word segmentation result", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 7 },
    { "token": "The word segmentation result", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 8 },
    { "token": "The word segmentation result", "start_offset": 7, "end_offset": 8, "type": "CN_WORD", "position": 9 }
  ]
}
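The same _analyze call can also be issued from a script instead of Kibana. The following sketch uses only the Python standard library. The helper names build_analyze_request and token_spans and the address http://localhost:9200 are assumptions for illustration; replace the address with that of your cluster and add authentication if the cluster requires it.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text):
    """Build a POST /_analyze request for the given analyzer and text."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_spans(response):
    """Reduce an _analyze response to (token, start_offset, end_offset) tuples."""
    return [
        (t["token"], t["start_offset"], t["end_offset"])
        for t in response["tokens"]
    ]


# Hypothetical address; nothing is sent until urlopen is called.
request = build_analyze_request(
    "http://localhost:9200", "ik_smart", "Text used for word segmentation"
)
# with urllib.request.urlopen(request) as resp:
#     print(token_spans(json.load(resp)))
```

Comparing the spans returned for ik_smart and ik_max_word on the same text makes the difference in granularity easy to see.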
- Perform the following operations to create an index, import data, search by keyword, and view the search result.
- Create an index named book. In this example, set both analyzer and search_analyzer to ik_max_word. You can also select ik_smart.
(Versions earlier than 7.x)
PUT /book
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}
(Version 7.x and later versions)
PUT /book
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}
- Import data. Import the text information to the book index.
(Versions earlier than 7.x)
PUT /book/type1/1
{
  "content": "Imported text"
}
(Version 7.x and later versions)
PUT /book/_doc/1
{
  "content": "Imported text"
}
- Conduct search based on the keywords.
(Versions earlier than 7.x)
GET /book/type1/_search
{
  "query": {
    "match": {
      "content": "Keyword"
    }
  }
}
(Version 7.x and later versions)
GET /book/_search
{
  "query": {
    "match": {
      "content": "Keyword"
    }
  }
}
Search result
(Versions earlier than 7.x)
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.7260926,
    "hits": [
      {
        "_index": "book",
        "_type": "type1",
        "_id": "1",
        "_score": 1.7260926,
        "_source": {
          "content": "Imported text"
        }
      }
    ]
  }
}
(Version 7.x and later versions)
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.7260926,
    "hits": [
      {
        "_index": "book",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.7260926,
        "_source": {
          "content": "Imported text"
        }
      }
    ]
  }
}
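The only difference between the two index creation requests above is whether the mapping is nested under a type name. The following sketch generates the request body for either case. The function name book_index_body and its parameters are hypothetical; the settings and field definitions are copied from the requests in this example.

```python
import json


def book_index_body(major_version, analyzer="ik_max_word", doc_type="type1"):
    """Return the PUT /book body for the given Elasticsearch major version."""
    properties = {
        "content": {
            "type": "text",
            "analyzer": analyzer,
            "search_analyzer": analyzer,
        }
    }
    if major_version < 7:
        # Versions earlier than 7.x nest the mapping under a type name.
        mappings = {doc_type: {"properties": properties}}
    else:
        # Version 7.x and later versions use a typeless mapping.
        mappings = {"properties": properties}
    return {
        "settings": {"number_of_shards": 2, "number_of_replicas": 1},
        "mappings": mappings,
    }


print(json.dumps(book_index_body(7, analyzer="ik_smart"), indent=2))
```

Passing analyzer="ik_smart" produces the coarse-grained variant mentioned in the index creation step.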
- Perform the following operations to create an index, import data, search by synonym, and view the search result.
- Create an index.
(Versions earlier than 7.x)
PUT /myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "dynamic_synonym"
        }
      },
      "analyzer": {
        "ik_synonym": {
          "filter": [
            "my_synonym"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "desc": {
          "type": "text",
          "analyzer": "ik_synonym"
        }
      }
    }
  }
}
(Version 7.x and later versions)
PUT /myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "dynamic_synonym"
        }
      },
      "analyzer": {
        "ik_synonym": {
          "filter": [
            "my_synonym"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "desc": {
        "type": "text",
        "analyzer": "ik_synonym"
      }
    }
  }
}
- Import data. Import the text information to the myindex index.
(Versions earlier than 7.x)
PUT /myindex/mytype/1
{
  "desc": "Imported text"
}
(Version 7.x and later versions)
PUT /myindex/_doc/1
{
  "desc": "Imported text"
}
- Conduct search based on the synonym Keyword and view the search results.
Run the following command to search for Keyword:
GET /myindex/_search
{
  "query": {
    "match": {
      "desc": "Keyword"
    }
  }
}
Search result
(Versions earlier than 7.x)
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.41048482,
    "hits": [
      {
        "_index": "myindex",
        "_type": "mytype",
        "_id": "1",
        "_score": 0.41048482,
        "_source": {
          "desc": "Imported text"
        }
      }
    ]
  }
}
(Version 7.x and later versions)
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.1519955,
    "hits": [
      {
        "_index": "myindex",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.1519955,
        "_source": {
          "desc": "Imported text"
        }
      }
    ]
  }
}
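As the search results above show, hits.total is a plain number in versions earlier than 7.x and an object with value and relation fields in version 7.x and later versions. A script that must handle both shapes could normalize them as in the following sketch. The function name total_hits is hypothetical.

```python
def total_hits(response):
    """Return (count, is_exact) for a search response of either shape."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x and later: {"value": N, "relation": "eq" or "gte"}.
        # "gte" means the count is a lower bound rather than an exact value.
        return total["value"], total["relation"] == "eq"
    # Earlier than 7.x: a plain integer.
    return total, True


old_style = {"hits": {"total": 1, "hits": []}}
new_style = {"hits": {"total": {"value": 1, "relation": "eq"}, "hits": []}}
print(total_hits(old_style), total_hits(new_style))
```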