Configuring a Custom Word Dictionary
When you use a search engine, certain special Chinese terms need to be recognized as complete words during word segmentation.
CSS provides the custom word dictionary function to handle word segmentation in such scenarios. The custom word dictionary supports hot updates: changes to it take effect without restarting the cluster.
You cannot use the custom word dictionary function for clusters that were created before March 10, 2018 (the launch time of the function).
Basic Concepts
- Main word dictionary: Main words are the words on which users want to perform word segmentation. The main word dictionary is a collection of the main words. The main word dictionary file must be a text file encoded using UTF-8 without BOM, with one subword per line. The maximum size of a main word dictionary file is 100 MB.
- Stop word dictionary: Stop words are the words which users can ignore. A stop word dictionary is a collection of stop words. The stop word dictionary file must be a text file encoded using UTF-8 without BOM, with one subword per line. The maximum size of a stop word dictionary file is 20 MB.
- Synonym dictionary: Synonyms are words with the same meaning. A synonym dictionary is a collection of synonyms. The synonym dictionary file must be a text file encoded using UTF-8 without BOM, with a pair of comma-separated synonyms per line. The maximum size of a synonym dictionary file is 20 MB.
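The file requirements listed above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of CSS: the function name `check_dictionary_file` and the `kind` labels are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and a "pair" of synonyms is taken to mean exactly two comma-separated entries per line.

```python
import codecs
from pathlib import Path

# Size limits stated above, in bytes (assumption: 1 MB = 1024 * 1024 bytes).
MAX_SIZE = {
    "main": 100 * 1024 * 1024,
    "stop": 20 * 1024 * 1024,
    "synonym": 20 * 1024 * 1024,
}


def check_dictionary_file(path, kind):
    """Return a list of problems found in a dictionary file (empty if none)."""
    problems = []
    raw = Path(path).read_bytes()

    if len(raw) > MAX_SIZE[kind]:
        limit_mb = MAX_SIZE[kind] // (1024 * 1024)
        problems.append(f"file exceeds the {limit_mb} MB limit")
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems

    # Only synonym files get a per-line format check in this sketch.
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # blank lines are skipped; this is a choice of the sketch
        if kind == "synonym":
            parts = [part.strip() for part in entry.split(",")]
            if len(parts) != 2 or not all(parts):
                problems.append(
                    f"line {number}: expected two comma-separated synonyms"
                )
    return problems
```

An empty result means the file passed the checks that this sketch covers; it does not guarantee that the service accepts the file.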
Prerequisites
To use the custom word dictionary, the account or IAM user used for logging in to the CSS management console must have both of the following permissions:
- Tenant Administrator for project OBS in region Global service
- Elasticsearch Administrator in the current region
Configuring a Custom Word Dictionary
- In the left navigation pane of the CSS management console, click Clusters.
- On the Clusters page that is displayed, click the name of the target cluster.
- On the displayed page, click Custom Word Dictionary.
- On the displayed Custom Word Dictionary page, set the switch to enable or disable the custom word dictionary function, and set the following parameters:
- OBS Bucket: indicates the OBS bucket where the main word dictionary file, stop word dictionary file, and synonym dictionary file are stored. If no OBS bucket is available, click Create Bucket to create one. For details, see Creating a Bucket. The OBS bucket to be created must be in the same region as the cluster.
- Main Word Dictionary: indicates the main word dictionary file. Currently, only text files encoded using UTF-8 without BOM are supported. The main word dictionary file must be stored in the corresponding OBS path.
- Stop Word Dictionary: indicates the stop word dictionary file. Currently, only text files encoded using UTF-8 without BOM are supported. The stop word dictionary file must be stored in the corresponding OBS path.
- Synonym Word Dictionary: indicates the synonym dictionary file. Currently, only text files encoded using UTF-8 without BOM are supported. The synonym dictionary file must be stored in the corresponding OBS path.
Figure 1 Configuring a custom word dictionary
- Click Save. In the displayed Confirm dialog box, click OK. The word dictionary information is displayed in the lower part of the page, and the word dictionary status is Updating. Wait about 1 minute. When the configuration is complete, the status changes to Succeeded, which means that the configured word dictionary has taken effect in the cluster.
Figure 2 Word dictionary information
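Once the status is Succeeded, one way to confirm that the dictionary is in effect is to ask the cluster how it segments a sample text through the Elasticsearch `_analyze` API. The following is a hedged sketch under several assumptions that the text above does not state: the cluster endpoint is reachable over HTTP without authentication, and the analyzer name passed in (for example `ik_smart`) is the one that your cluster uses with the custom word dictionary. The helper names are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(endpoint, text, analyzer):
    """Build a POST request for the Elasticsearch _analyze API (not sent here)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_json):
    """Return the token strings from a parsed _analyze response body."""
    return [item["token"] for item in response_json.get("tokens", [])]


def analyze(endpoint, text, analyzer):
    """Send the request and return the tokens produced by the analyzer."""
    request = build_analyze_request(endpoint, text, analyzer)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_tokens(json.load(response))
```

For example, `analyze("http://<cluster-address>:9200", "<sample text>", "ik_smart")` returns the list of tokens; the address and analyzer name here are placeholders. A term from the main word dictionary would be expected to come back as a single token, and a stop word would be expected to be absent.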
Modifying the Custom Word Dictionary
You can modify the parameters of your configured custom word dictionary as required. You need to upload the target word dictionary files to the corresponding OBS bucket in advance.
On the Custom Word Dictionary page, modify OBS Bucket, Main Word Dictionary, Stop Word Dictionary, or Synonym Word Dictionary, and click Save. Click OK in the dialog box that is displayed. After the custom word dictionary is modified, its status changes to Succeeded.
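Because the dictionary files must be encoded using UTF-8 without BOM, it can help to normalize an edited file before uploading it to the OBS bucket. The following is a minimal sketch with a hypothetical helper name; it assumes that you know the encoding of the source file, and it also trims surrounding whitespace and drops blank lines, which is a choice of this sketch rather than a stated requirement.

```python
import codecs
from pathlib import Path


def write_utf8_without_bom(source, target, source_encoding="utf-8"):
    """Re-encode a text file as UTF-8 without BOM, one trimmed entry per line.

    Returns the number of entries written.
    """
    raw = Path(source).read_bytes()
    is_utf8 = source_encoding.lower() in ("utf-8", "utf8")
    if is_utf8 and raw.startswith(codecs.BOM_UTF8):
        # Drop the BOM so that it does not end up in the first entry.
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode(source_encoding)
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    content = "".join(entry + "\n" for entry in entries)
    Path(target).write_bytes(content.encode("utf-8"))
    return len(entries)
```

For a file saved in another encoding, pass that encoding explicitly, for example `write_utf8_without_bom("words_gbk.txt", "words.txt", source_encoding="gbk")`; both file names are placeholders.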
Deleting a Custom Word Dictionary
You can delete your custom word dictionary as required to release resources.
On the Custom Word Dictionary page, click the icon for deleting the custom word dictionary. In the displayed dialog box, click OK. The following figure shows the Custom Word Dictionary page displayed after your configured custom word dictionary is deleted.