
Example

Application Scenarios

Configure a custom word dictionary for the cluster, including main words, stop words, and synonyms. Then search for the target text by keyword and by synonym, and view the search results.

Step 1: Configure a Custom Word Dictionary

  1. Prepare the word dictionary files (plain-text files encoded in UTF-8 without BOM) and upload them to the target OBS path.

    Set the main word dictionary file, stop word dictionary file, and synonym word dictionary file. A sketch of the file contents is provided after this procedure.

    The default word dictionary already contains common stop words such as "are" and "the". You do not need to upload such stop words.

  2. Log in to the CSS management console.
  3. In the navigation pane, choose Clusters > OpenSearch.
  4. On the Clusters page, click the name of the target cluster.
  5. Click the Word Dictionaries tab and configure the word dictionary files prepared in 1 by referring to Managing Word Dictionaries.
  6. After the word dictionary takes effect, return to the cluster list. Locate the target cluster and click Kibana in the Operation column to access the cluster.
  7. On the Kibana page, click Dev Tools in the navigation tree on the left. The operation page is displayed.
  8. Run the following commands to check how different word segmentation policies split the target text.
    • Use the ik_smart word segmentation policy to split the target text.
      Example code:
      POST /_analyze
      {
        "analyzer":"ik_smart",
        "text":"Text used for word segmentation"
      }

      After the command is executed, view the word segmentation result.

      {
        "tokens": [
          {
            "token": "word-1",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 0
          },
          {
            "token": "word-2",
            "start_offset": 5,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 1
          }
        ]
      }
    • Use the ik_max_word word segmentation policy to split the target text.

      Example code:

      POST /_analyze
      {
        "analyzer":"ik_max_word",
        "text":"Text used for word segmentation"
      }

      After the command is executed, view the word segmentation result.

      {
        "tokens" : [
          {
            "token": "word-1",
            "start_offset" : 0,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 0
          },
          {
            "token" : "word-3",
            "start_offset" : 0,
            "end_offset" : 2,
            "type" : "CN_WORD",
            "position" : 1
          },
          {
            "token" : "word-4",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "CN_WORD",
            "position" : 2
          },
          {
            "token" : "word-5",
            "start_offset" : 1,
            "end_offset" : 3,
            "type" : "CN_WORD",
            "position" : 3
          },
          {
            "token" : "word-6",
            "start_offset" : 2,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 4
          },
          {
            "token" : "word-7",
            "start_offset" : 3,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 5
          },
          {
            "token" : "word-2",
            "start_offset" : 5,
            "end_offset" : 8,
            "type" : "CN_WORD",
            "position" : 6
          },
          {
            "token" : "word-8",
            "start_offset" : 5,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 7
          },
          {
            "token" : "word-9",
            "start_offset" : 6,
            "end_offset" : 8,
            "type" : "CN_WORD",
            "position" : 8
          },
          {
            "token" : "word-10",
            "start_offset" : 7,
            "end_offset" : 8,
            "type" : "CN_WORD",
            "position" : 9
          }
        ]
      }
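
The word dictionary files are plain-text lists with one entry per line. The following is an illustrative sketch of the three files. The entries are placeholders, and the comma-separated synonym format is an assumption based on the common Solr synonyms style; see Managing Word Dictionaries for the exact format the plugin requires.

    Main word dictionary (one word per line):
    word-1
    word-2

    Stop word dictionary (one word per line):
    stopword-1
    stopword-2

    Synonym word dictionary (one comma-separated group of equivalent terms per line):
    word-1,synonym-1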

Step 2: Use Keywords for Search

  1. Create the book index and configure the word segmentation policy.

    In this example, both analyzer and search_analyzer are set to ik_max_word. You can also use ik_smart; a variant mapping is sketched after this procedure.

    PUT /book
    {
        "settings": {
            "number_of_shards": 2,
            "number_of_replicas": 1
        },
        "mappings": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_max_word"
                }
            }
        }
    }
  2. Import the text information to the book index.
    PUT /book/_doc/1 
    { 
      "content":"Imported text"
    }
  3. Use a keyword to search for the text and view the search results.
    GET /book/_search
    {
      "query": {
        "match": {
          "content": "Keyword"
        }
      }
    }

    Search result:

    {
      "took" : 16,
      "timed_out" : false,
      "_shards" : {
        "total" : 2,
        "successful" : 2,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.7260926,
        "hits" : [
          {
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.7260926,
            "_source" : {
              "content" : "Imported text"
            }
          }
        ]
      }
    }
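
As noted in 1, ik_smart can also be used. The following is a minimal sketch of the same mapping with ik_max_word at indexing time and ik_smart at query time, a common combination that indexes fine-grained tokens while keeping query-time segmentation coarse. The index name book2 is a placeholder.

    PUT /book2
    {
        "settings": {
            "number_of_shards": 2,
            "number_of_replicas": 1
        },
        "mappings": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart"
                }
            }
        }
    }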

Step 3: Use Synonyms for Search

  1. Create the myindex index and configure the word segmentation policy.
    PUT myindex
    {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonym": {
                        "type": "dynamic_synonym"
                    }
                },
                "analyzer": {
                    "ik_synonym": {
                        "filter": [
                            "my_synonym"
                        ],
                        "type": "custom",
                        "tokenizer": "ik_smart"
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "desc": {
                    "type": "text",
                    "analyzer": "ik_synonym"
                }
            }
        }
    }
  2. Import the text information to the myindex index.
    PUT /myindex/_doc/1
    {
      "desc": "Imported text"
    }
  3. Search for the text using a synonym of the keyword and view the search results. A sketch for verifying the synonym analyzer with the _analyze API follows this procedure.
    GET /myindex/_search
    {
      "query": {
        "match": {
          "desc": "Keyword"
        }
      }
    }

    Search result:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.1519955,
        "hits" : [
          {
            "_index" : "myindex",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.1519955,
            "_source" : {
              "desc" : "Imported text"
            }
          }
        ]
      }
    }
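
To verify that the synonym dictionary is applied, you can also run the _analyze API against the myindex index with the custom ik_synonym analyzer. This is a sketch: the text value is a placeholder, and the actual tokens depend on your dictionaries. Tokens injected by a synonym filter typically carry the type SYNONYM in the output.

    GET /myindex/_analyze
    {
      "analyzer": "ik_synonym",
      "text": "Keyword"
    }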