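Loading a Custom Word Dictionary
Function
Cloud Search Service (CSS) uses word dictionaries to segment text so that special terms are recognized as whole words during segmentation, which makes it easier to search text data by keyword. For example, you can search by company name, such as "华为" (Huawei), or by internet slang, such as "喜大普奔". Synonym dictionaries are also supported, so that text data can be searched by synonym.
CSS uses two tokenizers: the IK tokenizer and the synonym tokenizer. The IK tokenizer comes with a main word dictionary and a stop word dictionary; the synonym tokenizer comes with a synonym dictionary. The IK tokenizer provides the ik_max_word and ik_smart segmentation strategies, and the synonym tokenizer uses the ik_synonym strategy. When the preset dictionaries do not meet a cluster's segmentation needs, you can use custom dictionaries. This API loads custom dictionaries stored in OBS.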
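To make the effect of a custom main word dictionary concrete, here is a toy sketch. It uses greedy forward maximum matching, a deliberately simplified stand-in and not the IK tokenizer's actual algorithm. The function name `segment` and both sample dictionaries are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching (toy illustration, not the IK algorithm)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


preset = {"公司"}                        # hypothetical preset dictionary
custom = preset | {"华为", "喜大普奔"}    # preset plus hypothetical custom words
text = "华为公司喜大普奔"

without_custom = segment(text, preset)
with_custom = segment(text, custom)
print(without_custom)  # ['华', '为', '公司', '喜', '大', '普', '奔']
print(with_custom)     # ['华为', '公司', '喜大普奔']
```

Without the custom words, the company name and the slang phrase fall apart into single characters; with them, each is kept as one searchable token.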
Calling Method
See How to Call an API.
URI
POST /v1.0/{project_id}/clusters/{cluster_id}/thesaurus
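As a minimal sketch of how the URI template is filled in, the helper below substitutes the two path parameters and prepends the host. The function name `thesaurus_url` and the endpoint `css.example.com` are placeholders for illustration, not values taken from this document.

```python
from urllib.parse import quote


def thesaurus_url(endpoint, project_id, cluster_id):
    """Build the full URL for POST /v1.0/{project_id}/clusters/{cluster_id}/thesaurus."""
    host = endpoint.rstrip("/")
    # Percent-encode the path parameters so unexpected characters cannot alter the path.
    return "https://{}/v1.0/{}/clusters/{}/thesaurus".format(
        host, quote(project_id, safe=""), quote(cluster_id, safe="")
    )


url = thesaurus_url(
    "css.example.com",                        # placeholder endpoint
    "my-project-id",                          # placeholder project ID
    "4f3deec3-efa8-4598-bf91-560aad1377a3",   # cluster ID from the example request
)
print(url)
```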
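Request Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
bucket_name | Yes | String | Definition: OBS bucket where the dictionary files are stored. Constraints: The bucket's storage class must be Standard or Infrequent Access; Archive is not supported. Value range: N/A. Default value: N/A. |
main_object | No | String | Definition: Main word dictionary file object. Constraints: (not specified). Value range: N/A. Default value: N/A. |
stop_object | No | String | Definition: Stop word dictionary file object. Constraints: (not specified). Value range: N/A. Default value: N/A. |
synonym_object | No | String | Definition: Synonym dictionary file. Constraints: (not specified). Value range: N/A. Default value: N/A. |
static_main_object | No | String | Definition: Static main word dictionary file. Constraints: (not specified). Value range: N/A. Default value: N/A. |
static_stop_object | No | String | Definition: Static stop word dictionary file. Constraints: (not specified). Value range: N/A. Default value: N/A. |
extra_main_object | No | String | Definition: Extra main word dictionary file. Constraints: (not specified). Value range: N/A. Default value: N/A. |
extra_stop_object | No | String | Definition: Extra stop word dictionary file. Constraints: (not specified). Value range: N/A. Default value: N/A. |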
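The request body rules above (one mandatory `bucket_name`, seven optional object fields) can be sketched as a small builder. The helper name `build_thesaurus_body` is hypothetical, and omitting fields whose value is `None` is a design choice of this sketch, not documented service behavior.

```python
OPTIONAL_FIELDS = (
    "main_object", "stop_object", "synonym_object",
    "static_main_object", "static_stop_object",
    "extra_main_object", "extra_stop_object",
)


def build_thesaurus_body(bucket_name, **objects):
    """Return a request body dict containing only the fields that were provided."""
    if not bucket_name:
        raise ValueError("bucket_name is mandatory")
    unknown = set(objects) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    body = {"bucket_name": bucket_name}
    # Skip optional fields that were passed as None (assumption of this sketch).
    body.update({key: value for key, value in objects.items() if value is not None})
    return body


body = build_thesaurus_body(
    "test-bucket",
    main_object="word/main.txt",
    stop_object=None,
)
print(body)  # {'bucket_name': 'test-bucket', 'main_object': 'word/main.txt'}
```

Rejecting unknown keys early catches misspelled field names such as `stopobject` before a request is sent.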
Response Parameters
Status code: 200
The request succeeded.
None
Example Request
Enable and configure the dictionaries.
    POST https://{Endpoint}/v1.0/{project_id}/clusters/4f3deec3-efa8-4598-bf91-560aad1377a3/thesaurus

    {
      "bucket_name" : "test-bucket",
      "main_object" : "word/main.txt",
      "stop_object" : "word/stop.txt",
      "synonym_object" : "word/synonym.txt",
      "static_main_object" : "word/staticMain.txt",
      "static_stop_object" : "word/staticStop.txt",
      "extra_main_object" : "word/extraMain.txt",
      "extra_stop_object" : "word/extraStop.txt"
    }
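For readers who call the REST API without an SDK, the example request could be assembled with Python's standard library roughly as follows. This is a sketch under assumptions: it assumes token-based authentication through an `X-Auth-Token` header, and the URL, token value, and helper name `make_request` are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request


def make_request(url, body, token):
    """Build (but do not send) the POST request that loads custom dictionaries."""
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,  # assumption: token-based authentication
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = make_request(
    "https://css.example.com/v1.0/my-project-id/clusters/my-cluster-id/thesaurus",
    {"bucket_name": "test-bucket", "main_object": "word/main.txt"},
    "<token>",
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it; omitted so the sketch runs offline.
```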
Example Response
None
SDK Sample Code
The SDK sample code is as follows.
Enable and configure the dictionaries (Java).

    package com.huaweicloud.sdk.test;

    import com.huaweicloud.sdk.core.auth.ICredential;
    import com.huaweicloud.sdk.core.auth.BasicCredentials;
    import com.huaweicloud.sdk.core.exception.ConnectionException;
    import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
    import com.huaweicloud.sdk.core.exception.ServiceResponseException;
    import com.huaweicloud.sdk.css.v1.region.CssRegion;
    import com.huaweicloud.sdk.css.v1.*;
    import com.huaweicloud.sdk.css.v1.model.*;

    public class CreateLoadIkThesaurusSolution {

        public static void main(String[] args) {
            // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
            // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
            String ak = System.getenv("CLOUD_SDK_AK");
            String sk = System.getenv("CLOUD_SDK_SK");
            String projectId = "{project_id}";

            ICredential auth = new BasicCredentials()
                    .withProjectId(projectId)
                    .withAk(ak)
                    .withSk(sk);

            CssClient client = CssClient.newBuilder()
                    .withCredential(auth)
                    .withRegion(CssRegion.valueOf("<YOUR REGION>"))
                    .build();

            // Target cluster and request body with the OBS bucket and dictionary objects.
            CreateLoadIkThesaurusRequest request = new CreateLoadIkThesaurusRequest();
            request.withClusterId("{cluster_id}");
            LoadCustomThesaurusReq body = new LoadCustomThesaurusReq();
            body.withExtraStopObject("word/extraStop.txt");
            body.withExtraMainObject("word/extraMain.txt");
            body.withStaticStopObject("word/staticStop.txt");
            body.withStaticMainObject("word/staticMain.txt");
            body.withSynonymObject("word/synonym.txt");
            body.withStopObject("word/stop.txt");
            body.withMainObject("word/main.txt");
            body.withBucketName("test-bucket");
            request.withBody(body);
            try {
                CreateLoadIkThesaurusResponse response = client.createLoadIkThesaurus(request);
                System.out.println(response.toString());
            } catch (ConnectionException e) {
                e.printStackTrace();
            } catch (RequestTimeoutException e) {
                e.printStackTrace();
            } catch (ServiceResponseException e) {
                e.printStackTrace();
                System.out.println(e.getHttpStatusCode());
                System.out.println(e.getRequestId());
                System.out.println(e.getErrorCode());
                System.out.println(e.getErrorMsg());
            }
        }
    }
Enable and configure the dictionaries (Python).

    # coding: utf-8

    import os
    from huaweicloudsdkcore.auth.credentials import BasicCredentials
    from huaweicloudsdkcss.v1.region.css_region import CssRegion
    from huaweicloudsdkcore.exceptions import exceptions
    from huaweicloudsdkcss.v1 import *

    if __name__ == "__main__":
        # The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
        # In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
        ak = os.environ["CLOUD_SDK_AK"]
        sk = os.environ["CLOUD_SDK_SK"]
        projectId = "{project_id}"

        credentials = BasicCredentials(ak, sk, projectId)

        client = CssClient.new_builder() \
            .with_credentials(credentials) \
            .with_region(CssRegion.value_of("<YOUR REGION>")) \
            .build()

        try:
            # Target cluster and request body with the OBS bucket and dictionary objects.
            request = CreateLoadIkThesaurusRequest()
            request.cluster_id = "{cluster_id}"
            request.body = LoadCustomThesaurusReq(
                extra_stop_object="word/extraStop.txt",
                extra_main_object="word/extraMain.txt",
                static_stop_object="word/staticStop.txt",
                static_main_object="word/staticMain.txt",
                synonym_object="word/synonym.txt",
                stop_object="word/stop.txt",
                main_object="word/main.txt",
                bucket_name="test-bucket"
            )
            response = client.create_load_ik_thesaurus(request)
            print(response)
        except exceptions.ClientRequestException as e:
            print(e.status_code)
            print(e.request_id)
            print(e.error_code)
            print(e.error_msg)
Enable and configure the dictionaries (Go).

    package main

    import (
        "fmt"
        "os"

        "github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
        css "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1"
        "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1/model"
        region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/css/v1/region"
    )

    func main() {
        // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
        // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
        ak := os.Getenv("CLOUD_SDK_AK")
        sk := os.Getenv("CLOUD_SDK_SK")
        projectId := "{project_id}"

        auth := basic.NewCredentialsBuilder().
            WithAk(ak).
            WithSk(sk).
            WithProjectId(projectId).
            Build()

        client := css.NewCssClient(
            css.CssClientBuilder().
                WithRegion(region.ValueOf("<YOUR REGION>")).
                WithCredential(auth).
                Build())

        request := &model.CreateLoadIkThesaurusRequest{}
        request.ClusterId = "{cluster_id}"
        // Optional fields are pointers in the request model, so each value needs an addressable variable.
        extraStopObjectLoadCustomThesaurusReq := "word/extraStop.txt"
        extraMainObjectLoadCustomThesaurusReq := "word/extraMain.txt"
        staticStopObjectLoadCustomThesaurusReq := "word/staticStop.txt"
        staticMainObjectLoadCustomThesaurusReq := "word/staticMain.txt"
        synonymObjectLoadCustomThesaurusReq := "word/synonym.txt"
        stopObjectLoadCustomThesaurusReq := "word/stop.txt"
        mainObjectLoadCustomThesaurusReq := "word/main.txt"
        request.Body = &model.LoadCustomThesaurusReq{
            ExtraStopObject:  &extraStopObjectLoadCustomThesaurusReq,
            ExtraMainObject:  &extraMainObjectLoadCustomThesaurusReq,
            StaticStopObject: &staticStopObjectLoadCustomThesaurusReq,
            StaticMainObject: &staticMainObjectLoadCustomThesaurusReq,
            SynonymObject:    &synonymObjectLoadCustomThesaurusReq,
            StopObject:       &stopObjectLoadCustomThesaurusReq,
            MainObject:       &mainObjectLoadCustomThesaurusReq,
            BucketName:       "test-bucket",
        }
        response, err := client.CreateLoadIkThesaurus(request)
        if err == nil {
            fmt.Printf("%+v\n", response)
        } else {
            fmt.Println(err)
        }
    }
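For SDK sample code in other programming languages, see the Code Examples tab in API Explorer, which automatically generates the corresponding SDK sample code.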
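Status Codes
Status Code | Description |
---|---|
200 | The request succeeded. |
403 | The request is denied. This status code indicates that the request reached the server and the server understood it, but the server refuses to do anything further because the request is set to be denied. Modify the request directly and do not retry it. |
500 | The server can be reached by the request but cannot understand the user's request. |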
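A client could turn the table above into a small lookup that decides what to do after a call. The sketch below is one possible reading: the function name `next_step` and the advice strings are hypothetical, and pointing to the error code on a 500 response is an assumption rather than documented guidance.

```python
def next_step(status_code):
    """Map a status code listed for this API to (succeeded, advice)."""
    if status_code == 200:
        return True, "request succeeded"
    if status_code == 403:
        # The table advises modifying the request instead of retrying it.
        return False, "access denied: modify the request, do not retry it unchanged"
    if status_code == 500:
        # Assumption: the returned error code is the next place to look.
        return False, "server could not handle the request: check the error code"
    return False, "status code not listed for this API"


for code in (200, 403, 500, 404):
    ok, advice = next_step(code)
    print(code, ok, advice)
```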
Error Codes
See Error Codes.