Loading Custom Word Dictionaries
Function
You can configure custom word dictionaries to support word segmentation. This improves how the search engine handles searches by keywords such as company names (for example, Huawei) and buzzwords from social media. You can also search text data based on a synonym dictionary.
CSS uses the IK and synonym analyzers. The IK analyzer uses a main word dictionary and a stop word dictionary, and applies the ik_max_word and ik_smart word segmentation policies. The synonym analyzer uses a synonym word dictionary and applies the ik_synonym word segmentation policy.
This API loads a custom word dictionary stored in OBS. Use custom word dictionaries when the preset word dictionaries are inadequate for tokenization.
Calling Method
For details, see Calling APIs.
URI
POST /v1.0/{project_id}/clusters/{cluster_id}/thesaurus
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Definition: Project ID. For details about how to obtain the project ID and name, see Obtaining the Project ID and Name. Constraints: N/A Value range: Project ID of the account. Default value: N/A |
cluster_id | Yes | String | Definition: ID of the cluster for which you want to configure a custom word dictionary. For details about how to obtain the cluster ID, see Obtaining the Cluster ID. Constraints: N/A Value range: Cluster ID. Default value: N/A |
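The URI template above can be filled in programmatically. The following is a minimal sketch, not part of any SDK: the helper name `thesaurus_url` and the sample endpoint are hypothetical, and the endpoint is assumed to be a bare host name as in the `https://{Endpoint}/...` form used in the example request.

```python
from urllib.parse import quote


def thesaurus_url(endpoint: str, project_id: str, cluster_id: str) -> str:
    """Fill the /v1.0/{project_id}/clusters/{cluster_id}/thesaurus template.

    Hypothetical helper for illustration only.
    """
    # Both path parameters are mandatory, so reject empty values early.
    for name, value in (
        ("endpoint", endpoint),
        ("project_id", project_id),
        ("cluster_id", cluster_id),
    ):
        if not value:
            raise ValueError(f"{name} must not be empty")
    # Percent-encode the path parameters so that stray characters
    # cannot alter the structure of the path.
    return "https://{}/v1.0/{}/clusters/{}/thesaurus".format(
        endpoint, quote(project_id, safe=""), quote(cluster_id, safe="")
    )


if __name__ == "__main__":
    # "css.example.com" and "my-project-id" are placeholder values.
    print(thesaurus_url(
        "css.example.com",
        "my-project-id",
        "4f3deec3-efa8-4598-bf91-560aad1377a3",
    ))
```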
Request Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
bucket_name | Yes | String | Definition: OBS bucket where the word dictionary file is stored. Constraints: The storage class of the bucket must be standard or infrequently accessed. Archive storage is not supported. Value range: N/A Default value: N/A |
main_object | No | String | Definition: Main word dictionary file. Constraints: Value range: N/A Default value: N/A |
stop_object | No | String | Definition: Stop word dictionary file. Constraints: Value range: N/A Default value: N/A |
synonym_object | No | String | Definition: Synonym dictionary file. Constraints: Value range: N/A Default value: N/A |
static_main_object | No | String | Definition: Static main word dictionary file. Constraints: Value range: N/A Default value: N/A |
static_stop_object | No | String | Definition: Static stop word dictionary file. Constraints: Value range: N/A Default value: N/A |
extra_main_object | No | String | Definition: Extra main word dictionary file. Constraints: Value range: N/A Default value: N/A |
extra_stop_object | No | String | Definition: Extra stop word dictionary file. Constraints: Value range: N/A Default value: N/A |
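Because only `bucket_name` is mandatory, a request body often carries a subset of the dictionary fields. The sketch below assembles such a body from the parameter names in the table above. The helper `build_thesaurus_body` is hypothetical, and it assumes that optional fields may simply be omitted; this page does not state whether the service requires at least one dictionary file.

```python
import json

# Optional dictionary parameters, as listed in the request parameter table.
DICTIONARY_FIELDS = (
    "main_object",
    "stop_object",
    "synonym_object",
    "static_main_object",
    "static_stop_object",
    "extra_main_object",
    "extra_stop_object",
)


def build_thesaurus_body(bucket_name, **objects):
    """Return a request body dict containing only the fields that were set.

    Hypothetical helper for illustration only.
    """
    if not bucket_name:
        raise ValueError("bucket_name is mandatory")
    unknown = set(objects) - set(DICTIONARY_FIELDS)
    if unknown:
        raise ValueError(f"unknown dictionary fields: {sorted(unknown)}")
    body = {"bucket_name": bucket_name}
    for field in DICTIONARY_FIELDS:
        value = objects.get(field)
        if value is not None:  # assumption: unset optional fields are omitted
            body[field] = value
    return body


if __name__ == "__main__":
    body = build_thesaurus_body(
        "test-bucket",
        main_object="word/main.txt",
        stop_object="word/stop.txt",
    )
    print(json.dumps(body, indent=2))
```

Rejecting unknown field names locally catches typos such as `main_objects` before the request is sent.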
Response Parameters
Status code: 200
Request succeeded.
None
Example Requests
Enable and configure the word dictionary.
POST https://{Endpoint}/v1.0/{project_id}/clusters/4f3deec3-efa8-4598-bf91-560aad1377a3/thesaurus
{
  "bucket_name" : "test-bucket",
  "main_object" : "word/main.txt",
  "stop_object" : "word/stop.txt",
  "synonym_object" : "word/synonym.txt",
  "static_main_object" : "word/staticMain.txt",
  "static_stop_object" : "word/staticStop.txt",
  "extra_main_object" : "word/extraMain.txt",
  "extra_stop_object" : "word/extraStop.txt"
}
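For callers who do not use an SDK, the request above could be prepared with the Python standard library roughly as follows. This is a sketch under stated assumptions: the function name `prepare_load_thesaurus_request` is hypothetical, and passing the credential in an `X-Auth-Token` header is an assumption, since this page defers authentication details to Calling APIs. The code only builds the request object and does not send it.

```python
import json
import urllib.request


def prepare_load_thesaurus_request(url, body, token):
    """Build (but do not send) the POST request for loading dictionaries.

    Hypothetical helper; the X-Auth-Token header is an assumed
    authentication mechanism, see Calling APIs for the actual method.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumption, not confirmed by this page
        },
    )


if __name__ == "__main__":
    request = prepare_load_thesaurus_request(
        # Placeholder endpoint and project ID.
        "https://css.example.com/v1.0/my-project-id/clusters/"
        "4f3deec3-efa8-4598-bf91-560aad1377a3/thesaurus",
        {"bucket_name": "test-bucket", "main_object": "word/main.txt"},
        token="<token>",
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```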
Example Responses
None
SDK Sample Code
The SDK sample code is as follows.
Java
Enable and configure the word dictionary.
package com.huaweicloud.sdk.test;

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.css.v1.region.CssRegion;
import com.huaweicloud.sdk.css.v1.*;
import com.huaweicloud.sdk.css.v1.model.*;

public class CreateLoadIkThesaurusSolution {

    public static void main(String[] args) {
        // Hard-coding the AK and SK or storing them in plaintext is a serious security risk.
        // Store them in ciphertext in configuration files or environment variables and decrypt them at use.
        // In this example, the AK and SK are read from environment variables.
        // Before running it, set CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment.
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");
        String projectId = "{project_id}";

        ICredential auth = new BasicCredentials()
                .withProjectId(projectId)
                .withAk(ak)
                .withSk(sk);

        CssClient client = CssClient.newBuilder()
                .withCredential(auth)
                .withRegion(CssRegion.valueOf("<YOUR REGION>"))
                .build();

        CreateLoadIkThesaurusRequest request = new CreateLoadIkThesaurusRequest();
        request.withClusterId("{cluster_id}");

        // Request body: the OBS bucket and the dictionary files to load.
        LoadCustomThesaurusReq body = new LoadCustomThesaurusReq();
        body.withExtraStopObject("word/extraStop.txt");
        body.withExtraMainObject("word/extraMain.txt");
        body.withStaticStopObject("word/staticStop.txt");
        body.withStaticMainObject("word/staticMain.txt");
        body.withSynonymObject("word/synonym.txt");
        body.withStopObject("word/stop.txt");
        body.withMainObject("word/main.txt");
        body.withBucketName("test-bucket");
        request.withBody(body);

        try {
            CreateLoadIkThesaurusResponse response = client.createLoadIkThesaurus(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}
Python
Enable and configure the word dictionary.
# coding: utf-8

import os
from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdkcss.v1.region.css_region import CssRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkcss.v1 import *

if __name__ == "__main__":
    # Hard-coding the AK and SK or storing them in plaintext is a serious security risk.
    # Store them in ciphertext in configuration files or environment variables and decrypt them at use.
    # In this example, the AK and SK are read from environment variables.
    # Before running it, set CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment.
    ak = os.environ["CLOUD_SDK_AK"]
    sk = os.environ["CLOUD_SDK_SK"]
    projectId = "{project_id}"

    credentials = BasicCredentials(ak, sk, projectId)

    client = CssClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(CssRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        request = CreateLoadIkThesaurusRequest()
        request.cluster_id = "{cluster_id}"
        # Request body: the OBS bucket and the dictionary files to load.
        request.body = LoadCustomThesaurusReq(
            extra_stop_object="word/extraStop.txt",
            extra_main_object="word/extraMain.txt",
            static_stop_object="word/staticStop.txt",
            static_main_object="word/staticMain.txt",
            synonym_object="word/synonym.txt",
            stop_object="word/stop.txt",
            main_object="word/main.txt",
            bucket_name="test-bucket"
        )
        response = client.create_load_ik_thesaurus(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)
Go
Enable and configure the word dictionary.
package main

import (
	"fmt"
	"os"

	"github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
	css "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1"
	"github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1/model"
	region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1/region"
)

func main() {
	// Hard-coding the AK and SK or storing them in plaintext is a serious security risk.
	// Store them in ciphertext in configuration files or environment variables and decrypt them at use.
	// In this example, the AK and SK are read from environment variables.
	// Before running it, set CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment.
	ak := os.Getenv("CLOUD_SDK_AK")
	sk := os.Getenv("CLOUD_SDK_SK")
	projectId := "{project_id}"

	auth := basic.NewCredentialsBuilder().
		WithAk(ak).
		WithSk(sk).
		WithProjectId(projectId).
		Build()

	client := css.NewCssClient(
		css.CssClientBuilder().
			WithRegion(region.ValueOf("<YOUR REGION>")).
			WithCredential(auth).
			Build())

	request := &model.CreateLoadIkThesaurusRequest{}
	request.ClusterId = "{cluster_id}"
	// Optional fields are pointers, so each value needs an addressable variable.
	extraStopObjectLoadCustomThesaurusReq := "word/extraStop.txt"
	extraMainObjectLoadCustomThesaurusReq := "word/extraMain.txt"
	staticStopObjectLoadCustomThesaurusReq := "word/staticStop.txt"
	staticMainObjectLoadCustomThesaurusReq := "word/staticMain.txt"
	synonymObjectLoadCustomThesaurusReq := "word/synonym.txt"
	stopObjectLoadCustomThesaurusReq := "word/stop.txt"
	mainObjectLoadCustomThesaurusReq := "word/main.txt"
	request.Body = &model.LoadCustomThesaurusReq{
		ExtraStopObject:  &extraStopObjectLoadCustomThesaurusReq,
		ExtraMainObject:  &extraMainObjectLoadCustomThesaurusReq,
		StaticStopObject: &staticStopObjectLoadCustomThesaurusReq,
		StaticMainObject: &staticMainObjectLoadCustomThesaurusReq,
		SynonymObject:    &synonymObjectLoadCustomThesaurusReq,
		StopObject:       &stopObjectLoadCustomThesaurusReq,
		MainObject:       &mainObjectLoadCustomThesaurusReq,
		BucketName:       "test-bucket",
	}
	response, err := client.CreateLoadIkThesaurus(request)
	if err == nil {
		fmt.Printf("%+v\n", response)
	} else {
		fmt.Println(err)
	}
}
More
For SDK sample code of more programming languages, see the Sample Code tab in API Explorer. SDK sample code can be automatically generated.
Status Codes
Status Code | Description |
---|---|
200 | Request succeeded. |
403 | Request rejected. The server has received the request and understood it, but refused to respond to it. The client should not repeat the request without modifications. |
500 | The server is able to receive the request but unable to understand the request. |
Error Codes
See Error Codes.