Updated on 2025-11-28 GMT+08:00

Loading Custom Word Dictionaries

Function

You can configure custom word dictionaries to support word segmentation. Custom dictionaries improve how the search engine handles keyword searches for terms such as company names (for example, Huawei) and buzzwords from social media. You can also search text data based on a synonym dictionary.

CSS uses the IK and synonym analyzers. The IK analyzer uses a main word dictionary and a stop word dictionary, and provides the ik_max_word and ik_smart word segmentation policies. The synonym analyzer uses a synonym word dictionary and provides the ik_synonym word segmentation policy.

This API loads custom word dictionaries stored in OBS. Use it when the preset word dictionaries are inadequate for tokenization.

Calling Method

For details, see Calling APIs.

URI

POST /v1.0/{project_id}/clusters/{cluster_id}/thesaurus

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition:

Project ID. For details about how to obtain the project ID and name, see Obtaining the Project ID and Name.

Constraints:

N/A

Value range:

Project ID of the account.

Default value:

N/A

cluster_id

Yes

String

Definition:

ID of the cluster for which you want to configure a custom word dictionary. For details about how to obtain the cluster ID, see Obtaining the Cluster ID.

Constraints:

N/A

Value range:

Cluster ID.

Default value:

N/A

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

bucket_name

Yes

String

Definition:

OBS bucket where the word dictionary file is stored.

Constraints:

The storage class of the bucket must be Standard or Infrequent Access. Archive storage is not supported.

Value range:

N/A

Default value:

N/A

main_object

No

String

Definition:

Main word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one word. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged.

Value range:

N/A

Default value:

N/A

stop_object

No

String

Definition:

Stop word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one word. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged.

Value range:

N/A

Default value:

N/A

synonym_object

No

String

Definition:

Synonym dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one word. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged.

Value range:

N/A

Default value:

N/A

static_main_object

No

String

Definition:

Static main word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only for clusters created after this word dictionary function was released.

Value range:

N/A

Default value:

N/A

static_stop_object

No

String

Definition:

Static stop word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only for clusters created after this word dictionary function was released.

Value range:

N/A

Default value:

N/A

extra_main_object

No

String

Definition:

Extra main word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only for clusters created after this word dictionary function was released.

Value range:

N/A

Default value:

N/A

extra_stop_object

No

String

Definition:

Extra stop word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Set at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only for clusters created after this word dictionary function was released.

Value range:

N/A

Default value:

N/A
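
The constraints in Table 2 can be checked on the client before calling the API. The following Python sketch is illustrative only and is not part of CSS or any SDK: the helper names check_dictionary_bytes and build_thesaurus_body are hypothetical, and the 100 MB limit is assumed to mean 100 x 1024 x 1024 bytes. It checks a dictionary file's raw bytes (for example, before uploading the file to OBS) and builds a request body in which None leaves a dictionary unchanged and an empty string clears it.

```python
import codecs

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_DICT_BYTES = 100 * 1024 * 1024

# The seven word dictionary parameters listed in Table 2.
DICT_FIELDS = (
    "main_object", "stop_object", "synonym_object",
    "static_main_object", "static_stop_object",
    "extra_main_object", "extra_stop_object",
)


def check_dictionary_bytes(data: bytes) -> list[str]:
    """Return the constraint violations found in a dictionary file's raw bytes."""
    problems = []
    if len(data) > MAX_DICT_BYTES:
        problems.append("file is larger than 100 MB")
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems


def build_thesaurus_body(bucket_name: str, **objects) -> dict:
    """Build the request body.

    A value of None omits the parameter (dictionary left unchanged).
    An empty string is kept in the body (dictionary cleared).
    """
    unknown = set(objects) - set(DICT_FIELDS)
    if unknown:
        raise ValueError(f"unknown dictionary parameters: {sorted(unknown)}")
    body = {"bucket_name": bucket_name}
    for field in DICT_FIELDS:
        value = objects.get(field)
        if value is not None:
            body[field] = value
    if len(body) == 1:
        raise ValueError("set at least one of the seven word dictionary parameters")
    return body
```

For example, build_thesaurus_body("test-bucket", main_object="word/main.txt", stop_object="") produces a body that updates the main word dictionary, clears the stop word dictionary, and leaves the other five dictionaries unchanged.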

Response Parameters

Status code: 200

Request succeeded.

None

Example Requests

Enable and configure the word dictionary.

POST https://{Endpoint}/v1.0/{project_id}/clusters/4f3deec3-efa8-4598-bf91-560aad1377a3/thesaurus

{
  "bucket_name" : "test-bucket",
  "main_object" : "word/main.txt",
  "stop_object" : "word/stop.txt",
  "synonym_object" : "word/synonym.txt",
  "static_main_object" : "word/staticMain.txt",
  "static_stop_object" : "word/staticStop.txt",
  "extra_main_object" : "word/extraMain.txt",
  "extra_stop_object" : "word/extraStop.txt"
}
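
The same request can be assembled in Python with the standard library. This is a minimal sketch under stated assumptions: the function name build_load_dictionary_request, the endpoint css.example.com, and the project ID and token values are placeholders, and token authentication through the X-Auth-Token header is assumed (see Calling APIs for the authentication method that applies to your account). The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_load_dictionary_request(endpoint, project_id, cluster_id, token, body):
    """Build (but do not send) the POST request that loads custom word dictionaries."""
    url = f"https://{endpoint}/v1.0/{project_id}/clusters/{cluster_id}/thesaurus"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication; see Calling APIs.
            "X-Auth-Token": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_load_dictionary_request(
        endpoint="css.example.com",      # placeholder endpoint
        project_id="my-project-id",      # placeholder project ID
        cluster_id="4f3deec3-efa8-4598-bf91-560aad1377a3",
        token="my-token",                # placeholder token
        body={"bucket_name": "test-bucket", "main_object": "word/main.txt"},
    )
    print(request.get_method(), request.full_url)
    # To send the request: urllib.request.urlopen(request)
```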

Example Responses

None

Status Codes

Status Code

Description

200

Request succeeded.

403

Request rejected.

The server has received the request and understood it, but refused to respond to it. The client should not repeat the request without modifications.

500

The server is able to receive the request but unable to understand the request.
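
A client might map the status codes above to a next step as follows. This is an illustrative sketch only: the function name describe_status and the advice strings are assumptions, and codes not listed in this section are passed through to Error Codes rather than interpreted.

```python
def describe_status(status_code: int) -> tuple[bool, str]:
    """Return (succeeded, advice) for a status code returned by this API."""
    if status_code == 200:
        return True, "Request succeeded."
    if status_code == 403:
        # Per the table above: do not repeat the request without modifications.
        return False, "Request rejected. Modify the request before sending it again."
    if status_code == 500:
        return False, "The server could not handle the request. See Error Codes."
    return False, f"Status code {status_code} is not listed here. See Error Codes."
```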

Error Codes

See Error Codes.