Updated on 2024-10-26 GMT+08:00

Switching Between Simplified and Traditional Chinese for Data Search in an Elasticsearch Cluster

This topic describes how to switch between simplified and traditional Chinese for data search in an Elasticsearch cluster.

Scenario

The simplified-traditional Chinese conversion plugin converts between simplified and traditional Chinese. With this plugin, you can use a traditional Chinese keyword to search index data written in the corresponding simplified Chinese, and vice versa.

This plugin is installed by default. You do not need to install it yourself.

The simplified-traditional Chinese conversion plugin can be used as an analyzer, tokenizer, token-filter, or char-filter.

The simplified-traditional Chinese conversion plugin provides the following two conversion types (a sample request follows the list):

  • s2t: converts simplified Chinese to traditional Chinese.
  • t2s: converts traditional Chinese to simplified Chinese.
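
For example, you can test a conversion type directly with the _analyze API by defining an inline character filter of type stconvert. The following is a minimal sketch; the keyword tokenizer and the sample text are placeholders used only for illustration:

GET /_analyze
{
    "tokenizer": "keyword",
    "char_filter": [
        {
            "type": "stconvert",
            "convert_type": "t2s"
        }
    ],
    "text": "Sample text in traditional Chinese"
}

The token returned in the response contains the simplified Chinese form of the sample text.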

Switching Between Simplified and Traditional Chinese for Data Search

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, click Clusters to switch to the Clusters page.
  3. In the cluster list, locate the row containing the target cluster and click Access Kibana in the Operation column.

    If the target cluster has the security mode enabled, enter the username and password you set when you created the cluster.

  4. In the Kibana navigation pane on the left, choose Dev Tools.
  5. On the Console page, run the following command to create an index named stconvert and specify a user-defined mapping that defines the data type of the desc field:
    Sample code for Elasticsearch clusters earlier than 7.x:
    PUT /stconvert
    {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer": {
                    "ts_ik": {
                        "tokenizer": "ik_smart",
                        "char_filter": [
                            "tsconvert",
                            "stconvert"
                        ]
                    }
                },
                "char_filter": {
                    "tsconvert": {
                        "type": "stconvert",
                        "convert_type": "t2s"
                    },
                    "stconvert": {
                        "type": "stconvert",
                        "convert_type": "s2t"
                    }
                }
            }
        },
        "mappings": {
            "type": {
                "properties": {
                    "desc": {
                        "type": "text",
                        "analyzer": "ts_ik"
                    }
                }
            }
        }
    }

    Sample code for Elasticsearch 7.x or later and OpenSearch:

    PUT /stconvert
    {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer": {
                    "ts_ik": {
                        "tokenizer": "ik_smart",
                        "char_filter": [
                            "tsconvert",
                            "stconvert"
                        ]
                    }
                },
                "char_filter": {
                    "tsconvert": {
                        "type": "stconvert",
                        "convert_type": "t2s"
                    },
                    "stconvert": {
                        "type": "stconvert",
                        "convert_type": "s2t"
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "desc": {
                    "type": "text",
                    "analyzer": "ts_ik"
                }
            }
        }
    }

    The command output is similar to the following:

    {
      "acknowledged" : true,
      "shards_acknowledged" : true,
      "index" : "stconvert"
    }
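
    You can optionally check how the ts_ik analyzer processes text before importing any data by calling the _analyze API on the stconvert index. This is a minimal sketch; the sample text is a placeholder:

    GET /stconvert/_analyze
    {
        "analyzer": "ts_ik",
        "text": "Sample text in traditional Chinese"
    }

    The tokens in the response show the normalized text that will actually be indexed.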
  6. On the Console page, run the following command to import data to index stconvert:
    Sample code for Elasticsearch clusters earlier than 7.x:
    POST /stconvert/type/1
    {
      "desc": "Text in traditional Chinese"
    }

    Sample code for Elasticsearch 7.x or later and OpenSearch:

    POST /stconvert/_doc/1
    {
      "desc": "Text in traditional Chinese"
    }

    If the value of failed in the command output is 0, the data is imported successfully.
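
    You can optionally confirm that the document has been stored by retrieving it by its ID. A minimal sketch for Elasticsearch 7.x or later and OpenSearch (for clusters earlier than 7.x, use GET /stconvert/type/1):

    GET /stconvert/_doc/1

    If the found field in the response is true, the document exists in the index.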

  7. On the Console page, run the following command to search for the keyword and view the search result:
    GET /stconvert/_search
    {
        "query": {
            "match": {
                "desc": "Keyword"
            }
        }
    }

    The command output is similar to the following (this example is from a cluster earlier than 7.x):

    {
      "took" : 15,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 0.5753642,
        "hits" : [
          {
            "_index" : "stconvert",
            "_type" : "type",
            "_id" : "1",
            "_score" : 0.5753642,
            "_source" : {
              "desc": "Text in traditional Chinese"
            }
          }
        ]
      }
    }
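
    Because the ts_ik analyzer is applied both when the document is indexed and when the match query analyzes the search keyword, the same document is returned regardless of whether the keyword is entered in simplified or traditional Chinese. The following sketch shows the same search with the keyword written in the other form (the keyword is a placeholder):

    GET /stconvert/_search
    {
        "query": {
            "match": {
                "desc": "Keyword in simplified Chinese"
            }
        }
    }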