Updated on 2025-08-15 GMT+08:00

Loading Custom Word Dictionaries

Function

Custom word dictionaries extend word segmentation when the preset word dictionaries are inadequate for tokenization. They improve search results for keywords such as company names (for example, Huawei) and social media buzzwords. You can also search text data based on a synonym dictionary.

CSS uses the IK and synonym analyzers. The IK analyzer uses a main word dictionary and a stop word dictionary, and supports the ik_max_word and ik_smart word segmentation policies. The synonym analyzer uses a synonym word dictionary and the ik_synonym word segmentation policy.

This API loads custom word dictionaries stored in OBS.

Calling Method

For details, see Calling APIs.

URI

POST /v1.0/{project_id}/clusters/{cluster_id}/thesaurus

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition:

Project ID. For details about how to obtain the project ID and name, see Obtaining the Project ID and Name.

Constraints:

N/A

Value range:

Project ID of the account.

Default value:

N/A

cluster_id

Yes

String

Definition:

ID of the cluster for which you want to configure a custom word dictionary. For details about how to obtain the cluster ID, see Obtaining the Cluster ID.

Constraints:

N/A

Value range:

Cluster ID.

Default value:

N/A
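The request URL can be assembled from the two path parameters in Table 1. The following is a minimal Python sketch, not part of the official SDK: the helper name thesaurus_url is hypothetical, and the endpoint value stands in for the {Endpoint} placeholder of your region.

```python
from urllib.parse import quote


def thesaurus_url(endpoint, project_id, cluster_id):
    """Build the URL for POST /v1.0/{project_id}/clusters/{cluster_id}/thesaurus.

    `endpoint` is the CSS endpoint host of your region (an assumption here;
    look it up for your own region).
    """
    # Percent-encode the path parameters so unexpected characters cannot
    # change the structure of the path.
    return "https://{}/v1.0/{}/clusters/{}/thesaurus".format(
        endpoint, quote(project_id, safe=""), quote(cluster_id, safe="")
    )


if __name__ == "__main__":
    # All three values below are placeholders for illustration only.
    print(thesaurus_url("css.example.com", "my-project-id",
                        "4f3deec3-efa8-4598-bf91-560aad1377a3"))
```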

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

bucket_name

Yes

String

Definition:

OBS bucket where the word dictionary file is stored.

Constraints:

The storage class of the bucket must be standard or infrequently accessed. Archive storage is not supported.

Value range:

N/A

Default value:

N/A

main_object

No

String

Definition:

Main word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one word. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged.

Value range:

N/A

Default value:

N/A

stop_object

No

String

Definition:

Stop word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one word. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged.

Value range:

N/A

Default value:

N/A

synonym_object

No

String

Definition:

Synonym dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one word. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged.

Value range:

N/A

Default value:

N/A

static_main_object

No

String

Definition:

Static main word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only on clusters created after this word dictionary function was brought online.

Value range:

N/A

Default value:

N/A

static_stop_object

No

String

Definition:

Static stop word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only on clusters created after this word dictionary function was brought online.

Value range:

N/A

Default value:

N/A

extra_main_object

No

String

Definition:

Extra main word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only on clusters created after this word dictionary function was brought online.

Value range:

N/A

Default value:

N/A

extra_stop_object

No

String

Definition:

Extra stop word dictionary file.

Constraints:

  • Must be a text file encoded in UTF-8 without BOM. Each line contains one group of related words. The maximum file size is 100 MB.

  • Specify at least one of the seven word dictionary parameters. Note: Passing an empty string ("") clears the word dictionary. Omitting the parameter or passing null leaves the word dictionary unchanged. This parameter is supported only on clusters created after this word dictionary function was brought online.

Value range:

N/A

Default value:

N/A
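The file constraints in Table 2 (UTF-8 without BOM, at most 100 MB) can be checked locally before a dictionary file is uploaded to OBS. The following Python sketch is illustrative only: check_dictionary_file is a hypothetical helper, and it assumes that "100 MB" means 100 × 1024 × 1024 bytes, which the API description does not state.

```python
import os
import tempfile

# Assumption: the "100 MB" limit is interpreted as 100 * 1024 * 1024 bytes.
MAX_DICT_BYTES = 100 * 1024 * 1024
UTF8_BOM = b"\xef\xbb\xbf"


def check_dictionary_file(path, max_bytes=MAX_DICT_BYTES):
    """Return a list of problems found in a local dictionary file (empty if none)."""
    if os.path.getsize(path) > max_bytes:
        # Skip reading oversized files into memory.
        return ["file exceeds the size limit"]
    problems = []
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems


if __name__ == "__main__":
    # Write a small sample dictionary (one word per line) and check it.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
        f.write("华为\nHuawei\n".encode("utf-8"))
    print(check_dictionary_file(f.name))
    os.remove(f.name)
```

The check does not validate the line format (one word or one group of related words per line), because that depends on the dictionary type.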

Response Parameters

Status code: 200

Request succeeded.

None

Example Requests

Enable and configure the word dictionary.

POST https://{Endpoint}/v1.0/{project_id}/clusters/4f3deec3-efa8-4598-bf91-560aad1377a3/thesaurus

{
  "bucket_name" : "test-bucket",
  "main_object" : "word/main.txt",
  "stop_object" : "word/stop.txt",
  "synonym_object" : "word/synonym.txt",
  "static_main_object" : "word/staticMain.txt",
  "static_stop_object" : "word/staticStop.txt",
  "extra_main_object" : "word/extraMain.txt",
  "extra_stop_object" : "word/extraStop.txt"
}
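The three cases described in Table 2 (set a dictionary, clear it with an empty string, or leave it unchanged by omitting the parameter) can be made explicit when the request body is built. The following Python sketch is one possible way to do this. The helper build_thesaurus_body is hypothetical and not part of the SDK, and the client-side check for at least one dictionary parameter is an assumption based on the constraint above.

```python
import json

# The seven dictionary parameters listed in Table 2.
DICTIONARY_FIELDS = (
    "main_object", "stop_object", "synonym_object",
    "static_main_object", "static_stop_object",
    "extra_main_object", "extra_stop_object",
)


def build_thesaurus_body(bucket_name, **objects):
    """Build the JSON body for the request.

    For each dictionary parameter:
      None (or not passed) -> omitted, so the dictionary stays unchanged
      ""                   -> sent as an empty string, so the dictionary is cleared
      "path/in/bucket"     -> sent as the OBS object to load
    """
    unknown = set(objects) - set(DICTIONARY_FIELDS)
    if unknown:
        raise ValueError("unknown parameters: {}".format(sorted(unknown)))
    body = {"bucket_name": bucket_name}
    for name in DICTIONARY_FIELDS:
        value = objects.get(name)
        if value is not None:
            body[name] = value
    if len(body) == 1:
        raise ValueError("specify at least one word dictionary parameter")
    return body


if __name__ == "__main__":
    # Load a new main dictionary, clear the stop dictionary, leave the rest as is.
    body = build_thesaurus_body("test-bucket",
                                main_object="word/main.txt",
                                stop_object="")
    print(json.dumps(body, indent=2))
```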

Example Responses

None

SDK Sample Code

The SDK sample code is as follows.

Java

Enable and configure the word dictionary.

package com.huaweicloud.sdk.test;

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.css.v1.region.CssRegion;
import com.huaweicloud.sdk.css.v1.*;
import com.huaweicloud.sdk.css.v1.model.*;


public class CreateLoadIkThesaurusSolution {

    public static void main(String[] args) {
        // Hard-coding the AK and SK or storing them in plaintext poses serious security risks. Store them as ciphertext in configuration files or environment variables and decrypt them when they are used.
        // This example reads the AK and SK from environment variables. Before running it, set the CLOUD_SDK_AK and CLOUD_SDK_SK environment variables in your local environment.
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");
        String projectId = "{project_id}";

        ICredential auth = new BasicCredentials()
                .withProjectId(projectId)
                .withAk(ak)
                .withSk(sk);

        CssClient client = CssClient.newBuilder()
                .withCredential(auth)
                .withRegion(CssRegion.valueOf("<YOUR REGION>"))
                .build();
        CreateLoadIkThesaurusRequest request = new CreateLoadIkThesaurusRequest();
        request.withClusterId("{cluster_id}");
        LoadCustomThesaurusReq body = new LoadCustomThesaurusReq();
        body.withExtraStopObject("word/extraStop.txt");
        body.withExtraMainObject("word/extraMain.txt");
        body.withStaticStopObject("word/staticStop.txt");
        body.withStaticMainObject("word/staticMain.txt");
        body.withSynonymObject("word/synonym.txt");
        body.withStopObject("word/stop.txt");
        body.withMainObject("word/main.txt");
        body.withBucketName("test-bucket");
        request.withBody(body);
        try {
            CreateLoadIkThesaurusResponse response = client.createLoadIkThesaurus(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}

Python

Enable and configure the word dictionary.

# coding: utf-8

import os
from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdkcss.v1.region.css_region import CssRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkcss.v1 import *

if __name__ == "__main__":
    # Hard-coding the AK and SK or storing them in plaintext poses serious security risks. Store them as ciphertext in configuration files or environment variables and decrypt them when they are used.
    # This example reads the AK and SK from environment variables. Before running it, set the CLOUD_SDK_AK and CLOUD_SDK_SK environment variables in your local environment.
    ak = os.environ["CLOUD_SDK_AK"]
    sk = os.environ["CLOUD_SDK_SK"]
    projectId = "{project_id}"

    credentials = BasicCredentials(ak, sk, projectId)

    client = CssClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(CssRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        request = CreateLoadIkThesaurusRequest()
        request.cluster_id = "{cluster_id}"
        request.body = LoadCustomThesaurusReq(
            extra_stop_object="word/extraStop.txt",
            extra_main_object="word/extraMain.txt",
            static_stop_object="word/staticStop.txt",
            static_main_object="word/staticMain.txt",
            synonym_object="word/synonym.txt",
            stop_object="word/stop.txt",
            main_object="word/main.txt",
            bucket_name="test-bucket"
        )
        response = client.create_load_ik_thesaurus(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)

Go

Enable and configure the word dictionary.

package main

import (
    "fmt"
    "os" // required for os.Getenv below

    "github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
    css "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1"
    "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1/model"
    region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1/region"
)

func main() {
    // Hard-coding the AK and SK or storing them in plaintext poses serious security risks. Store them as ciphertext in configuration files or environment variables and decrypt them when they are used.
    // This example reads the AK and SK from environment variables. Before running it, set the CLOUD_SDK_AK and CLOUD_SDK_SK environment variables in your local environment.
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")
    projectId := "{project_id}"

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        WithProjectId(projectId).
        Build()

    client := css.NewCssClient(
        css.CssClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    request := &model.CreateLoadIkThesaurusRequest{}
    request.ClusterId = "{cluster_id}"
    extraStopObjectLoadCustomThesaurusReq := "word/extraStop.txt"
    extraMainObjectLoadCustomThesaurusReq := "word/extraMain.txt"
    staticStopObjectLoadCustomThesaurusReq := "word/staticStop.txt"
    staticMainObjectLoadCustomThesaurusReq := "word/staticMain.txt"
    synonymObjectLoadCustomThesaurusReq := "word/synonym.txt"
    stopObjectLoadCustomThesaurusReq := "word/stop.txt"
    mainObjectLoadCustomThesaurusReq := "word/main.txt"
    request.Body = &model.LoadCustomThesaurusReq{
        ExtraStopObject:  &extraStopObjectLoadCustomThesaurusReq,
        ExtraMainObject:  &extraMainObjectLoadCustomThesaurusReq,
        StaticStopObject: &staticStopObjectLoadCustomThesaurusReq,
        StaticMainObject: &staticMainObjectLoadCustomThesaurusReq,
        SynonymObject:    &synonymObjectLoadCustomThesaurusReq,
        StopObject:       &stopObjectLoadCustomThesaurusReq,
        MainObject:       &mainObjectLoadCustomThesaurusReq,
        BucketName:       "test-bucket",
    }
    response, err := client.CreateLoadIkThesaurus(request)
    if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
}

More

For SDK sample code of more programming languages, see the Sample Code tab in API Explorer. SDK sample code can be automatically generated.

Status Codes

Status Code

Description

200

Request succeeded.

403

Request rejected.

The server has received the request and understood it, but refused to respond to it. The client should not repeat the request without modifications.

500

The server is able to receive the request but unable to understand the request.

Error Codes

See Error Codes.