
Updated on 2025-08-13 GMT+08:00


Modify Knowledge Base Configuration

Function

This API is used to modify the settings of a knowledge base.

The following settings can be modified:

Parsing settings: whether to use OCR enhancement, and whether to parse images, the header and footer, and the contents page.

Document splitting settings: automatic segmentation, length segmentation (segmentation by text length), and hierarchical segmentation (segmentation by subheading; the subheading parsing rules can be customized).

Search model settings: the reranking model.
NLP model settings: the generative model.
Other settings: recall quantity, reranking switch, number of reference documents, intent classification, and query rewriting switch.

URI

PUT /v1/koosearch/repos/{repo_id}

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
repo_id	Yes	String	Knowledge base ID. The value is a string of 1 to 64 characters and can contain only digits, letters, hyphens (-), and underscores (_). How to obtain: Log in to the KooSearch experience platform. In the navigation tree on the left, choose Knowledge Bases to view knowledge base IDs. Each knowledge base has a unique ID stored in the vector database.
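Before sending a request, a client can check repo_id against the format described above. The following is a minimal sketch, assuming "letters" means ASCII letters; the helper `build_repo_path` is hypothetical, and the service still performs its own validation.

```python
import re

# Assumption: "letters" means ASCII letters; 1 to 64 characters in total.
REPO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_repo_path(repo_id: str) -> str:
    """Return the path for PUT /v1/koosearch/repos/{repo_id} after checking the ID format."""
    if not REPO_ID_PATTERN.fullmatch(repo_id):
        raise ValueError(f"invalid repo_id: {repo_id!r}")
    return f"/v1/koosearch/repos/{repo_id}"


if __name__ == "__main__":
    print(build_repo_path("acd90739-2e22-4870-b2db-35018699b623"))
```

Rejecting a malformed ID locally avoids a round trip that would end in a 400 response.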

Request Parameters

**Table 2** Request header parameters
Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	Parameter description: Token used for API authentication. For how to obtain the token, see section 3.2 "Authentication." Constraints: N/A.

**Table 3** Request body parameters
Parameter	Mandatory	Type	Description
top_k	No	Integer	Number of segments to recall. The k segments most relevant to the query are recalled.
reference_count	No	Integer	Number of reference documents. Reference documents are passed to the NLP model together with the query to generate the final answer.
rerank_enabled	No	Boolean	Whether to enable reranking. If enabled, the top_k recalled results are reordered by the rerank model. If disabled, the top_k recalled results are not reordered.
query_rewrite_enabled	No	Boolean	Whether to search using the rewritten query.
search_plan_category_ids	No	Array of strings	IDs of the search plan categories. Default categories (locale zh): talk (Chit-chat), language_task (Language task), human (Characteristics), common (General knowledge), special_knowledge (Industry knowledge).
file_extract	No	FileExtract object	Overall configuration of document parsing, including the components used for document parsing and document splitting rules
rerank_model	No	String	Rerank model name
pangu_nlp_model	No	String	Name of the NLP large model (generative model).
search_threshold	No	Float	Search API filtering threshold. When result reranking is disabled, the threshold ranges from 0 to 200. When reranking is enabled, the threshold ranges from 0 to 1.
chat_ref_threshold	No	Float	Threshold for reference document filtering. When result reranking is disabled, the threshold ranges from 0 to 200. When reranking is enabled, the threshold ranges from 0 to 1.
faq_threshold	No	Float	Answer output threshold. Answers whose score exceeds the threshold are output directly without being summarized by the large model. Notes: If the value is less than or equal to 0, answers are never output directly. In the earlier query2doc version, the threshold ranges from 0 to 200 when reranking is disabled and from 0 to 1 when reranking is enabled. In the newer query2query version, the threshold ranges from 0 to 1.
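The valid range of search_threshold and chat_ref_threshold depends on whether reranking is enabled. A minimal client-side check might look like the following sketch. The function names are hypothetical, and rerank_enabled is passed explicitly because a modification request may omit it, in which case the knowledge base's current setting applies.

```python
def threshold_range(rerank_enabled: bool) -> tuple:
    """Valid range of search_threshold and chat_ref_threshold, per Table 3."""
    return (0.0, 1.0) if rerank_enabled else (0.0, 200.0)


def check_thresholds(body: dict, rerank_enabled: bool) -> list:
    """List the threshold fields in a request body that fall outside their documented range."""
    low, high = threshold_range(rerank_enabled)
    problems = []
    for name in ("search_threshold", "chat_ref_threshold"):
        value = body.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems


def faq_direct_output_enabled(faq_threshold: float) -> bool:
    """A faq_threshold less than or equal to 0 disables direct answer output."""
    return faq_threshold > 0


print(check_thresholds({"search_threshold": 150, "chat_ref_threshold": 0.4}, rerank_enabled=True))
# ['search_threshold=150 outside [0.0, 1.0]']
```

The same value of 150 would be accepted with reranking disabled, which is why the check needs to know the reranking state.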
cache_enabled	No	Boolean	Whether to enable the cache.
session_config	No	SessionConfig object	Cache policy
answer_reference_enabled	No	Boolean	Whether to include references in answers.
answer_image_reference_enabled	No	Boolean	Whether answers can include both text and images.
extend_config	No	KnowledgeRepoExtendConfig object	Knowledge base extension configuration.
tags	No	Array of TagInfo objects	Tag list
refs	No	String	List of referenced knowledge base IDs, which are separated by commas (,).
name	No	String	Knowledge base name
search_plan_model	No	String	Name of the search_plan model.
prompt_info	No	KnowledgeRepoPromptInfo object	Prompt information associated with the knowledge base.
id	No	String	Knowledge base ID

**Table 4** FileExtract
Parameter	Mandatory	Type	Description
parse_conf	No	ParseConf object	Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page.
split_conf	No	SplitConf object	Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern.
id	No	String	Document parsing rule ID.

**Table 5** ParseConf
Parameter	Mandatory	Type	Description
ocr_enabled	No	Boolean	Parameter description: Whether the current knowledge base uses OCR enhancement. Pure Word documents do not need to be parsed using OCR. PDF and PPTX files require OCR for intelligent document recognition, such as table parsing and text extraction. Constraints: N/A. Default value: false
image_enabled	No	Boolean	Parameter description: Whether the current knowledge base parses images. true: Parse images; the parsing mode is configured in image_conf. false: Skip images in the document. Constraints: N/A. Default value: false
header_footer_enabled	No	Boolean	Parameter description: Whether to parse the header and footer of files in the current knowledge base. true: The parsing result contains the header and footer. false: The parsing result does not contain the header and footer. (If the header and footer contain no key text information, set this parameter to false to avoid interference.) Constraints: N/A. Default value: false
catalog_enabled	No	Boolean	Parameter description: Whether to parse the contents page of files in the current knowledge base. true: The parsing result contains the contents page. false: The parsing result does not contain the contents page. (A contents page generally contains many keywords, which may affect search results. If the contents page holds no information that must be retained, keep the default value false.) Constraints: N/A. Default value: false
image_conf	No	String	Parameter description: Image parsing mode used when image parsing is enabled (image_enabled is set to true). TEXT: Extracts text from an image and does not retain the image. IMAGE: Retains the original image. Constraints: To return answers that contain both text and images, you must use the IMAGE mode so that the original images are retained. Default value: TEXT
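The image_conf constraint above means that text-and-image answers only work when image parsing is on and the IMAGE mode is selected. A minimal sketch of building a parse_conf object and checking that constraint, assuming hypothetical helper names (`build_parse_conf`, `supports_image_answers`); the keyword defaults mirror the defaults in Table 5.

```python
VALID_IMAGE_MODES = ("TEXT", "IMAGE")


def build_parse_conf(ocr=False, images=False, header_footer=False,
                     catalog=False, image_mode="TEXT"):
    """Build a parse_conf object; keyword defaults mirror the defaults in Table 5."""
    if image_mode not in VALID_IMAGE_MODES:
        raise ValueError(f"image_conf must be one of {VALID_IMAGE_MODES}")
    return {
        "ocr_enabled": ocr,
        "image_enabled": images,
        "header_footer_enabled": header_footer,
        "catalog_enabled": catalog,
        "image_conf": image_mode,
    }


def supports_image_answers(parse_conf: dict) -> bool:
    """Text-and-image answers need image parsing enabled and the IMAGE mode."""
    return bool(parse_conf.get("image_enabled", False)
                and parse_conf.get("image_conf", "TEXT") == "IMAGE")


conf = build_parse_conf(ocr=True, images=True, header_footer=True, image_mode="IMAGE")
print(supports_image_answers(conf))  # True
```

A client could run this check before setting answer_image_reference_enabled to true.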

**Table 6** SplitConf
Parameter	Mandatory	Type	Description
split_mode	No	String	Parameter description: Mode for splitting a document. Options: AUTO: The system automatically identifies the document format and applies a suitable splitting and parsing mode. LENGTH: Splits by length; for example, every 500 characters form one segment. CATALOG: Automatic hierarchical segmentation. The system identifies the hierarchical structure of the document and segments it accordingly; for example, section 1.1.2 becomes one segment and section 1.1.3 another. RULE: Rule-based hierarchical segmentation. You can define custom matching rules for hierarchical titles, and chapters are matched and split based on those rules. Constraints: N/A. Default value: AUTO
separator_ids	No	Array of strings	Parameter description: IDs of the segment separators used in the automatic and length segmentation modes. A separator determines the character at which a segment may end. Options: period_zh: Chinese period (。). period_en: English period (.). exclamation_mark_zh: Chinese exclamation mark (！). exclamation_mark_en: English exclamation mark (!). question_mark_zh: Chinese question mark (？). question_mark_en: English question mark (?). comma_zh: Chinese comma (，). comma_en: English comma (,). space_en: space. Constraints: N/A. Default value: ["period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"]
rule_regex_id	No	String	Parameter description: ID of the selected user-defined parsing rule. Constraints: N/A.
chunk_size	No	Integer	Parameter description: Maximum length of a document segment. A document is segmented based on the maximum length. Constraints: N/A. Default value: 500
title_level	No	Integer	Parameter description: Depth of the title hierarchy retained for a segment. For example, for a segment in section 1.1.3, a depth of 3 retains the parent titles 1.1 and 1, while a depth of 2 retains the parent title 1.1 and discards the parent title 1. Constraints: N/A. Default value: 3
combine_title	No	Boolean	Parameter description: Whether to retain the combined hierarchical titles. false: Only the last-level title is retained. true: The titles of all levels, from the first level to the last level, are combined and saved. For example, if section 1.1 is titled "Usage description" and section 1.1.1 is titled "How to open the refrigerator", true saves both titles for the 1.1.1 segment, while false saves only "How to open the refrigerator". Constraints: N/A. Default value: false
merge_titles	No	Boolean	Parameter description: Whether to merge titles. true: If the paragraphs under different titles are short, they are automatically merged up to the specified segment length to produce more complete results. For example, if two adjacent sub-paragraphs are both shorter than 200 characters and the segment length is 500, they are combined into one paragraph. false: Paragraphs under different titles are not merged. Constraints: N/A. Default value: true
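To make chunk_size, separator_ids, and title_level more concrete, the following sketch shows one plausible reading of each. It is a simplified illustration, not the service's actual splitting algorithm: `split_by_length` cuts at most chunk_size characters and moves the cut back to the last separator in the window, and `retained_titles` assumes the depth counts the segment's own level. Both function names and the separator-to-character mapping are assumptions.

```python
# Assumed character for each separator ID listed for separator_ids.
SEPARATORS = {
    "period_zh": "。", "period_en": ".",
    "exclamation_mark_zh": "！", "exclamation_mark_en": "!",
    "question_mark_zh": "？", "question_mark_en": "?",
    "comma_zh": "，", "comma_en": ",", "space_en": " ",
}
DEFAULT_SEPARATOR_IDS = ["period_zh", "period_en", "exclamation_mark_zh",
                         "exclamation_mark_en", "question_mark_zh", "question_mark_en"]


def split_by_length(text, chunk_size=500, separator_ids=None):
    """Cut text into chunks of at most chunk_size characters, ending on a separator where possible."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ids = DEFAULT_SEPARATOR_IDS if separator_ids is None else separator_ids
    seps = {SEPARATORS[i] for i in ids}
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Move the cut back to just after the last separator in the window;
            # fall back to a hard cut when the window contains none.
            end = max((i + 1 for i in range(start, end) if text[i] in seps), default=end)
        chunks.append(text[start:end])
        start = end
    return chunks


def retained_titles(section, title_level=3):
    """Section numbers kept for a segment: itself plus its nearest parents, title_level in total."""
    if title_level < 1:
        raise ValueError("title_level must be at least 1")
    parts = section.split(".")
    chain = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return chain[-title_level:]


print(split_by_length("Aa. Bb. Cc.", chunk_size=5))  # ['Aa.', ' Bb.', ' Cc.']
print(retained_titles("1.1.3", 2))                   # ['1.1', '1.1.3']
```

With a depth of 3, `retained_titles("1.1.3")` keeps 1 and 1.1 as parents, matching the title_level example above.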

**Table 7** SessionConfig
Parameter	Mandatory	Type	Description
similarity_threshold	Yes	Float	Parameter description: Threshold of the query2query similarity for matching cached questions. Options: 0.1 to 1.0. A higher threshold requires a closer match between the query and the cached question. Constraints: N/A.
answer_select_policy	Yes	String	Parameter description: Cache hit selection policy. Options: FIRST: Select the result with the highest score as the answer. RANDOM: Randomly select a result as the answer. Constraints: N/A.
eviction	Yes	Eviction object	Cache expiration policy.
model_name	Yes	String	Parameter description: Name of the query2query model used when the cache is hit. This parameter is used to calculate the similarity between the new query and the cached query. Constraints: N/A.
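The interaction between similarity_threshold and answer_select_policy can be sketched as follows. This is a hypothetical client-side model of the behavior described above, assuming a cached question matches when its similarity score is greater than or equal to the threshold; `CachedHit` and `select_cached_answer` are illustrative names, not part of the API.

```python
import random
from dataclasses import dataclass


@dataclass
class CachedHit:
    answer: str
    score: float  # query2query similarity between the new query and a cached question


def select_cached_answer(hits, similarity_threshold, answer_select_policy, rng=random):
    """Pick a cached answer per answer_select_policy, or return None on a cache miss."""
    eligible = [h for h in hits if h.score >= similarity_threshold]
    if not eligible:
        return None
    if answer_select_policy == "FIRST":
        # FIRST: the result with the highest score.
        return max(eligible, key=lambda h: h.score).answer
    if answer_select_policy == "RANDOM":
        # RANDOM: any result that passed the threshold.
        return rng.choice(eligible).answer
    raise ValueError(f"unknown answer_select_policy: {answer_select_policy}")


hits = [CachedHit("Answer A", 0.92), CachedHit("Answer B", 0.81), CachedHit("Answer C", 0.40)]
print(select_cached_answer(hits, 0.8, "FIRST"))  # Answer A
```

Passing a seeded `random.Random` as `rng` makes the RANDOM policy reproducible in tests.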

**Table 8** Eviction
Parameter	Mandatory	Type	Description
policy	Yes	String	Parameter description: Expiration policy used by the cache. Options: LRU (Least Recently Used): an entry is cleared when now - accessTime > ttl. FIFO (First In First Out): an entry is cleared when now - createTime > ttl. LFU (Least Frequently Used): an entry is cleared when hit_count < threshold. Constraints: N/A.
ttl	No	Long	Parameter description: Cache expiration time. Cache entries are cleared once this time is exceeded. Unit: milliseconds. Constraints: N/A.
hit_count_threshold	No	Long	Parameter description: Cache hit threshold. When the number of cache hits reaches this threshold, the cached result is no longer used. Constraints: N/A.
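The three eviction rules can be expressed as a single predicate. This sketch follows the formulas given for the policy field; the hit_count_threshold description words the LFU condition differently, so treat the LFU branch as an assumption. `CacheEntry` and `should_evict` are hypothetical names, and all times are in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    create_time: int  # milliseconds
    access_time: int  # milliseconds
    hit_count: int


def should_evict(entry, policy, now, ttl=None, hit_count_threshold=None):
    """Apply the clearing rule of the given Eviction policy to one cache entry."""
    if policy == "LRU":
        return ttl is not None and now - entry.access_time > ttl
    if policy == "FIFO":
        return ttl is not None and now - entry.create_time > ttl
    if policy == "LFU":
        # Assumption: follows "hit_count < threshold, clear" from the policy row.
        return hit_count_threshold is not None and entry.hit_count < hit_count_threshold
    raise ValueError(f"unknown eviction policy: {policy}")


entry = CacheEntry(create_time=0, access_time=5_000, hit_count=3)
print(should_evict(entry, "LRU", now=10_000, ttl=6_000))   # False: idle for 5,000 ms
print(should_evict(entry, "FIFO", now=10_000, ttl=6_000))  # True: created 10,000 ms ago
```

The example shows why the policy matters: the same entry and ttl survive under LRU but are cleared under FIFO.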

**Table 9** KnowledgeRepoExtendConfig
Parameter	Mandatory	Type	Description
extend_context	No	Boolean	Parameter description: Whether to extend the context around each reference segment. A wider context gives the model more information for producing complete answers. Constraints: N/A.
effective_input_length	No	Integer	Parameter description: Length of the selected context when context extension is enabled. The appropriate value depends on the model; it keeps the input tokens within a valid length so that the output is optimal. Constraints: For multi-round dialogs, set the value to 60% of the model context length (rounded up). Options: 2 to 128. Unit: KB
top_p	No	Float	Parameter description: Nucleus sampling, an alternative to temperature sampling, which controls the diversity of generated text by limiting the range of candidate tokens. A higher top_p value allows a wider choice of tokens and therefore more diverse text. Constraints: Adjust either top_p or the temperature to tune the generated text, but not both. Options: 0.1 to 1. Default value: 0.1
max_tokens	No	Integer	Parameter description: Maximum number of new tokens generated by the model. Constraints: The value depends on the maximum context length supported by the model: max_tokens must be less than that maximum context length minus the number of tokens input to the model. Options: 1 to 131072. Default value: 131072
chat_temperature	No	Float	Parameter description: Diversity and creativity of the text generated by the model in non-search-enhanced scenarios. A value close to 0 gives the lowest randomness, and 1 gives the highest. Lower temperatures generally suit deterministic tasks; higher values favor creative tasks. Constraints: N/A. Options: 0 to 1.
search_temperature	No	Float	Parameter description: Diversity and creativity of the text generated by the model in search-enhanced scenarios. A value close to 0 gives the lowest randomness, and 1 gives the highest. Lower temperatures generally suit deterministic tasks; higher values favor creative tasks. Constraints: N/A. Options: 0 to 1. The value is generally set to 0.2 or 0.3 for the Pangu NLP model. Default value: 0.3
presence_penalty	No	Float	Parameter description: Penalty that reduces repetition in the generated text. When a token has already appeared in the preceding text, the model is penalized for generating it again, which reduces repeated use of the same or similar content and increases diversity. A smaller value penalizes previously generated tokens less, which may lead to repeated content; a larger value makes the model favor tokens that have not appeared before, producing more diverse text. Options: -2 to 2. Determine the actual value based on your scenario; 1.1 is generally used for the Pangu NLP model. Default value: 0
use_system_prompt	No	Boolean	Parameter description: Whether to use the system prompt, following the standard prompt combination scheme for RAG scenarios of the Pangu NLP model. Constraints: When the Pangu NLP model is used, the system prompt can be used in common scenarios. Default value: false
system_prompt	No	String	Parameter description: System prompt. Constraints: Mandatory when use_system_prompt is set to true. The query does not need to be combined into the prompt.
embedding_search_enable	No	Boolean	Parameter description: Specifies whether to enable vector retrieval when related documents are retrieved. Constraints: N/A. Default value: true
keyword_search_enable	No	Boolean	Parameter description: Specifies whether to enable keyword retrieval when related documents are retrieved. Constraints: N/A. Default value: false
keyword_top_k	No	Integer	Parameter description: Specifies the number of top results returned when keyword retrieval is used. Constraints: N/A. Options: 0 ~ 100 Default value: 10
refuse_enable	No	Boolean	Parameter description: Whether the platform skips invoking the model and directly refuses to answer when no related reference document content is found. Constraints: N/A. Default value: false
refuse_answer	No	String	Parameter description: Refusal wording returned by the platform when refuse_enable is set to true and no related reference document content is found. Constraints: N/A.
image_match_type	No	String	Parameter description: Specifies the image recall mode in the image and text recall scenario. Options: Currently, only context_match and reference_match are supported. context_match: Only semantically related images are recalled. If the context of the image in the reference paragraph is semantically similar to the generated paragraph, the image is recalled. Otherwise, the image is not recalled. reference_match: All images in the reference paragraph are recalled. Constraints: N/A. Default value: context_match
custom_types	No	Map<String,String>	Parameter description: Mapping type dictionary of a custom field, which applies to structured data scenarios and specifies queryable fields. Example: {"companyName": "keyword"} companyName: field to be retrieved; keyword: retrieval mode Constraints: The value in the mapping dictionary must be of a type supported by Elasticsearch queries, for example, keyword, integer, or text.
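Several KnowledgeRepoExtendConfig fields have documented ranges and dependencies. The following sketch checks an extend_config object against them before it is sent; `check_extend_config` and `max_tokens_fits` are hypothetical helpers, and they cover only the constraints stated in Table 9.

```python
# Documented inclusive ranges from Table 9.
EXTEND_CONFIG_RANGES = {
    "effective_input_length": (2, 128),
    "top_p": (0.1, 1),
    "max_tokens": (1, 131072),
    "chat_temperature": (0, 1),
    "search_temperature": (0, 1),
    "presence_penalty": (-2, 2),
    "keyword_top_k": (0, 100),
}


def check_extend_config(cfg):
    """Return a list of problems found in an extend_config object."""
    problems = []
    for name, (low, high) in EXTEND_CONFIG_RANGES.items():
        value = cfg.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    if cfg.get("use_system_prompt") and not cfg.get("system_prompt"):
        problems.append("system_prompt is required when use_system_prompt is true")
    return problems


def max_tokens_fits(max_tokens, model_context_length, input_tokens):
    """max_tokens must be less than the model context length minus the input length."""
    return max_tokens < model_context_length - input_tokens


print(check_extend_config({"top_p": 0, "use_system_prompt": True}))
```

max_tokens_fits needs the model's context length and the actual input size, which the API does not return, so the caller has to supply both.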

**Table 10** TagInfo
Parameter	Mandatory	Type	Description
tag_key	Yes	String	Parameter description: Tag key. Options: The value can contain 1 to 36 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Constraints: N/A.
tag_value	Yes	String	Parameter description: Tag value. Options: The value can contain 0 to 43 characters. It can only contain digits, letters, hyphens (-), and underscores (_). Constraints: N/A.
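Tag keys and values follow simple character and length rules, so a client can check them locally. A minimal sketch, assuming "letters" means ASCII letters; `make_tag` is a hypothetical helper that returns a TagInfo object ready for the tags array.

```python
import re

TAG_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,36}")    # 1 to 36 characters
TAG_VALUE_PATTERN = re.compile(r"[A-Za-z0-9_-]{0,43}")  # 0 to 43 characters


def make_tag(tag_key, tag_value=""):
    """Return a TagInfo object after checking both fields against Table 10."""
    if not TAG_KEY_PATTERN.fullmatch(tag_key):
        raise ValueError(f"invalid tag_key: {tag_key!r}")
    if not TAG_VALUE_PATTERN.fullmatch(tag_value):
        raise ValueError(f"invalid tag_value: {tag_value!r}")
    return {"tag_key": tag_key, "tag_value": tag_value}


tags = [make_tag("env", "prod"), make_tag("team", "search_ops")]
print(tags)
```

Because tag_value may be empty, a key-only tag such as `make_tag("archived")` is also accepted.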

**Table 11** KnowledgeRepoPromptInfo
Parameter	Mandatory	Type	Description
prompt_id	No	String	Parameter description: ID of the NLP model prompt used by the current knowledge base. Constraints: The value must be an existing prompt ID in prompt management.
qa_question_prompt_id	No	String	Parameter description: ID of the QA question generation prompt used by the current knowledge base. This prompt is used to generate QA pairs from documents automatically. Constraints: The value must be an existing prompt ID in prompt management.
qa_answer_prompt_id	No	String	Parameter description: ID of the QA answer generation prompt used by the current knowledge base. This prompt is used to generate QA pairs from documents automatically. Constraints: The value must be an existing prompt ID in prompt management.

Response Parameters

Status code: 200

**Table 12** Response body parameters
Parameter	Type	Description
repo_id	String	Parameter description: Specifies the current knowledge base ID. Constraints: N/A

Status code: 400

**Table 13** Response body parameters
Parameter	Type	Description
error_code	String	Error Code
error_msg	String	Error description

Status code: 500

**Table 14** Response body parameters
Parameter	Type	Description
error_code	String	Error Code
error_msg	String	Error description

Example Requests

PUT https://{endpoint}/v1/koosearch/repos/acd90739-2e22-4870-b2db-35018699b623

{
  "id" : "acd90739-2e22-4870-b2db-35018699b623",
  "name" : "Knowledge base A",
  "tags" : [ ],
  "top_k" : 50,
  "rerank_enabled" : true,
  "query_rewrite_enabled" : true,
  "reference_count" : 3,
  "search_threshold" : 0,
  "rerank_model" : "pangu_rerank",
  "pangu_nlp_model" : "KooSearch-N1",
  "search_plan_model" : "search-plan",
  "file_extract" : {
    "parse_conf" : {
      "ocr_enabled" : true,
      "image_enabled" : true,
      "header_footer_enabled" : true,
      "catalog_enabled" : false,
      "image_conf" : "IMAGE"
    },
    "split_conf" : {
      "split_mode" : "AUTO"
    }
  },
  "search_plan_category_ids" : [ ],
  "cache_enabled" : false,
  "answer_reference_enabled" : false,
  "answer_image_reference_enabled" : false,
  "chat_ref_threshold" : 0,
  "faq_threshold" : 0.95,
  "extend_config" : {
    "extend_context" : false,
    "effective_input_length" : 5,
    "top_p" : 0.1,
    "max_tokens" : 2048,
    "chat_temperature" : 0.8,
    "search_temperature" : 0.3,
    "presence_penalty" : 0,
    "use_system_prompt" : false,
    "system_prompt" : "You need to reply to the user request based on the dialog history and given document and comply with the following principles: 1. Strictly comply with the terms and description logic of the document; 2. If a document segment is used in the reply, use [No.] to add a reference in the corresponding position. 3. If the user request cannot be replied to based on the dialog history and given document, or the user question involves security sensitive information, directly reply [The existing information cannot be used to reply to your request].\nBasic document information:\n\n<#list docs as doc>\n[${doc?counter}] Document title: ${doc.title!}\nA piece of the document: ${doc.content!}\n</#list>",
    "refuse_enable" : false,
    "image_match_type" : "context_match",
    "refuse_answer" : ""
  },
  "prompt_info" : {
    "prompt_id" : "default_chat_prompt",
    "qa_answer_prompt_id" : "default_qa_answer_prompt",
    "qa_question_prompt_id" : "default_qa_question_prompt"
  }
}
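The request above can be built with the Python standard library as in the following sketch. The endpoint is a placeholder, `build_update_request` is a hypothetical helper, and the Content-Type header is an assumption because Table 2 lists only X-Auth-Token. The sketch sends only a few fields; since every body parameter is optional, a partial body seems plausible, but this page does not state how omitted fields are treated, so verify that before relying on it.

```python
import json
import urllib.request


def build_update_request(endpoint, repo_id, token, body):
    """Build (but do not send) a PUT request that modifies a knowledge base."""
    url = f"https://{endpoint}/v1/koosearch/repos/{repo_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-Auth-Token": token,
            # Assumption: the body is sent as JSON; this header is not listed in Table 2.
            "Content-Type": "application/json",
        },
    )


req = build_update_request(
    "koosearch.example.com",  # hypothetical endpoint
    "acd90739-2e22-4870-b2db-35018699b623",
    "<your-token>",
    {"top_k": 50, "rerank_enabled": True, "reference_count": 3},
)
# To send: urllib.request.urlopen(req), which raises HTTPError for 400 and 500 responses.
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before the call.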

Example Responses

Status code: 200

Knowledge base ID.

{
  "repo_id" : "acd90739-2e22-4870-b2db-35018699b623"
}

Status Codes

Status Code	Description
200	Knowledge base ID.
400	Incorrect request body parameter
500	Internal error
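A client can map these status codes to results as in the following sketch: a 200 response yields repo_id, and a 400 or 500 response carries error_code and error_msg. `KooSearchError` and `parse_update_response` are hypothetical names, and the sketch treats any non-200 status as an error.

```python
import json


class KooSearchError(Exception):
    """Hypothetical exception carrying the error_code and error_msg of a failed call."""

    def __init__(self, status, error_code, error_msg):
        super().__init__(f"HTTP {status}: {error_code} {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_update_response(status, body):
    """Return repo_id on 200; raise KooSearchError for 400, 500, or any other status."""
    payload = json.loads(body) if body else {}
    if status == 200:
        return payload["repo_id"]
    raise KooSearchError(status, payload.get("error_code"), payload.get("error_msg"))


print(parse_update_response(200, '{"repo_id": "acd90739-2e22-4870-b2db-35018699b623"}'))
```

Keeping error_code on the exception lets callers look it up in the Error Codes reference.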

Error Codes

See Error Codes.
