Modifying Knowledge Base Configuration

Function

This API is used to modify knowledge base settings, including:

Parsing settings: whether to use OCR enhancement, and whether to parse images, the header and footer, and the contents page.
Document splitting settings: automatic segmentation, segmentation by text length, and segmentation by subheading, where subheading parsing rules can be customized.
Search model settings: Select a reranking model.
NLP model settings: Select a generative model.
Other settings: recall quantity, reranking switch, reference document quantity, intent classification, and query rewriting switch.

URI

PUT /v1/{project_id}/applications/{application_id}/uni-search/knowledge-repo/{repo_id}

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
project_id	Yes	String	Definition: Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. Constraints: N/A Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A
application_id	Yes	String	Definition: Application ID. For details about how to obtain the application ID, see Obtaining an Application ID. Constraints: Character string Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A
repo_id	Yes	String	Definition: Knowledge base ID. How to obtain: Log in to the KooSearch experience platform. In the navigation tree on the left, choose Knowledge Bases to view knowledge base IDs. Each knowledge base has a unique ID stored in the vector database. Constraints: N/A Value range: Length: 1 to 64 characters. The value can contain only digits, letters, hyphens (-), and underscores (_). Default value: N/A
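
The path parameters in Table 1 can be checked and assembled into the request URI before a call is made. The following Python sketch is illustrative only: the helper name build_repo_uri is hypothetical, and the pattern enforces just the character set and length shared by the three parameters. It deliberately skips the "must start with a letter" rule, because the project ID in the example request on this page starts with a digit.

```python
import re

# Character set and length shared by the three path parameters in Table 1:
# 1 to 64 characters; digits, letters, hyphens (-), and underscores (_).
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_repo_uri(project_id: str, application_id: str, repo_id: str) -> str:
    """Validate the path parameters and return the URI path (hypothetical helper)."""
    params = {
        "project_id": project_id,
        "application_id": application_id,
        "repo_id": repo_id,
    }
    for name, value in params.items():
        if not ID_PATTERN.fullmatch(value):
            raise ValueError(f"{name} has an invalid format: {value!r}")
    return (
        f"/v1/{project_id}/applications/{application_id}"
        f"/uni-search/knowledge-repo/{repo_id}"
    )
```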

Request Parameters

**Table 2** Request header parameters
Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	Definition: Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token. Constraints: N/A Value range: N/A Default value: N/A

**Table 3** Request body parameters
Parameter	Mandatory	Type	Description
id	No	String	Definition: Knowledge base ID. Constraints: N/A Value range: 1 to 64 characters. Default value: N/A
name	No	String	Definition: Knowledge base name. Constraints: N/A Value range: The value can contain a maximum of 64 characters. It must start with a letter or digit and can contain letters, digits, and underscores (_). Default value: N/A
top_k	No	Integer	Definition: top_k configuration. The k chunks most relevant to the query are recalled. Constraints: N/A Value range: 10-500 Default value: N/A
reference_count	No	Integer	Definition: Number of reference documents provided as input to the NLP model along with the query to generate the final answer. Constraints: N/A Value range: 1-50 Default value: N/A
rerank_enabled	No	Boolean	Definition: Whether to enable result reranking. Constraints: N/A Value range: true: Enable reranking. The top_k results recalled will be reranked by the reranking model. false: Disable reranking. The top_k results recalled will not be reranked. Default value: N/A
query_rewrite_enabled	No	Boolean	Definition: Whether to use the rewriting result for search. Constraints: N/A Value range: true: Use the rewriting result for search. false: Do not use the rewriting result for search. Default value: N/A
search_plan_category_ids	No	Array of strings	Definition: Search planning categories. Constraints: N/A Value range: The list can contain a maximum of 10 elements, and each element can contain a maximum of 64 characters. Values: professional_knowledge-medical, weather, professional_knowledge, professional_knowledge-manufacturing, chitchat, language_tasks, professional_knowledge-finance, general_knowledge, professional_knowledge-government, identity: persona Default value: N/A
file_extract	No	FileExtract object	Definition: Overall configuration of document parsing, including the components used for document parsing and document splitting rules. Constraints: N/A Value range: N/A Default value: N/A
rerank_model	No	String	Definition: Reranking model name. Constraints: N/A Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A
search_plan_model	No	String	Definition: The name of the search planning model. Constraints: N/A Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A
pangu_nlp_model	No	String	Definition: NLP model name. Constraints: N/A Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A
search_threshold	No	Float	Definition: Filtering threshold applied to the results of the search API. Constraints: If reranking is disabled, the threshold ranges from 0 to 200. If reranking is enabled, the threshold ranges from 0 to 1. Value range: 0-200 Default value: N/A
chat_ref_threshold	No	Float	Definition: Reference document filtering threshold. Constraints: If reranking is disabled, the threshold ranges from 0 to 200. If reranking is enabled, the threshold ranges from 0 to 1. Value range: 0-200 Default value: N/A
faq_threshold	No	Float	Definition: Answers of FAQs whose correlation score exceeds this threshold are output directly, without being summarized by the large model. Constraints: If the parameter value is less than or equal to 0, the answer is not generated directly based on preset FAQs. For query2doc of the earlier version, the threshold ranges from 0 to 200 when reranking is disabled, and from 0 to 1 when reranking is enabled. For query2query of the new version, the threshold ranges from 0 to 1. Value range: 0-200 Default value: N/A
cache_enabled	No	Boolean	Definition: Whether to enable caching. Constraints: N/A Value range: true: Enable caching. false: Disable caching. Default value: false
session_config	No	SessionConfig object	Definition: Cache policy. Constraints: N/A Value range: N/A Default value: N/A
answer_reference_enabled	No	Boolean	Definition: Whether to enable referencing. Constraints: N/A Value range: true: Enable referencing. false: Disable referencing. Default value: false
answer_image_reference_enabled	No	Boolean	Definition: Whether to include both text and images. Constraints: N/A Value range: true: Include both text and images. false: Do not include both text and images. Default value: false
refs	No	String	Definition: List of referenced knowledge base IDs, which are separated by commas (,). Constraints: N/A Value range: The value contains a maximum of 1024 characters. Default value: N/A
tags	No	Array of TagInfo objects	Definition: Tag list. Constraints: N/A Value range: N/A Default value: N/A
extend_config	No	KnowledgeRepoExtendConfig object	Definition: Knowledge base extension configuration. Constraints: N/A Value range: N/A Default value: N/A
prompt_info	No	KnowledgeRepoPromptInfo object	Definition: Prompts associated with the knowledge base. Constraints: N/A Value range: N/A Default value: N/A
table_rag_enabled	No	Boolean	Definition: Whether to enable tableRAG. Constraints: N/A Value range: true: Enable tableRAG. false: Disable tableRAG. Default value: false
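
The valid range of search_threshold and chat_ref_threshold depends on rerank_enabled, which is easy to overlook when a request body is assembled by hand. The sketch below shows one way to check these values on the client side before sending a request. The function names are hypothetical, and treating a missing rerank_enabled as false is an assumption, because Table 3 does not state a default.

```python
def threshold_limit(rerank_enabled: bool) -> float:
    """Upper bound for search_threshold and chat_ref_threshold per Table 3."""
    return 1.0 if rerank_enabled else 200.0


def check_body(body: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Integer ranges copied from Table 3.
    integer_ranges = {"top_k": (10, 500), "reference_count": (1, 50)}
    for field, (low, high) in integer_ranges.items():
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be within {low}-{high}")

    # Assumption: a missing rerank_enabled is treated as False.
    limit = threshold_limit(bool(body.get("rerank_enabled", False)))
    for field in ("search_threshold", "chat_ref_threshold"):
        if field in body and not 0 <= body[field] <= limit:
            problems.append(f"{field} must be within 0-{limit:g}")

    return problems
```

For example, a body with rerank_enabled set to true and search_threshold set to 50 is flagged, while the same threshold passes when reranking is disabled.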

**Table 4** FileExtract
Parameter	Mandatory	Type	Description
parse_conf	No	ParseConfReq object	Definition: Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page. Constraints: N/A Value range: N/A Default value: N/A
split_conf	No	SplitConf object	Definition: Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern. Constraints: N/A Value range: N/A Default value: N/A

**Table 5** ParseConfReq
Parameter	Mandatory	Type	Description
ocr_enabled	No	Boolean	Definition: OCR enhancement. Constraints: N/A Value range: true: Enable OCR enhancement. false: Do not enable OCR enhancement. Default value: false
mllm_enabled	No	Boolean	Definition: Multimodal enhancement. Constraints: N/A Value range: true: Enable multi-modal enhancement. false: Disable multi-modal enhancement. Default value: false
mllm_model	No	String	Definition: Multimodal model name. Constraints: The mllm_plan model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A
image_enabled	No	Boolean	Definition: Image parsing. Constraints: N/A Value range: true: Parse images. false: Do not parse images. Default value: false
header_footer_enabled	No	Boolean	Definition: Parse the header and footer. Constraints: N/A Value range: true: Parse the header and footer. false: Ignore the header and footer. Default value: false
catalog_enabled	No	Boolean	Definition: Parse the contents page. Constraints: N/A Value range: true: Parse the contents page. false: Ignore the contents page. Default value: false
image_conf	No	String	Definition: Image parsing mode used when image parsing is enabled (image_enabled is set to true). Constraints: To return answers with images, you must use the IMAGE mode and retain original images. Value range: TEXT: Extracts text from the image. IMAGE: Retains the original image. IMAGE_TEXT: Extracts the text and retains the original image. Default value: TEXT
footnote_enabled	No	Boolean	Definition: Parse footnotes. Constraints: N/A Value range: true: Parse footnotes. false: Do not parse footnotes. Default value: false
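
As a rough illustration of Table 5, the following sketch models parse_conf as a Python dataclass whose defaults mirror the documented default values. The class name ParseConf and its to_dict method are hypothetical. mllm_model is left out because the table gives no default for it.

```python
from dataclasses import asdict, dataclass

# Allowed image_conf values listed in Table 5.
ALLOWED_IMAGE_CONF = {"TEXT", "IMAGE", "IMAGE_TEXT"}


@dataclass
class ParseConf:
    """Client-side model of parse_conf; defaults follow Table 5."""
    ocr_enabled: bool = False
    mllm_enabled: bool = False
    image_enabled: bool = False
    header_footer_enabled: bool = False
    catalog_enabled: bool = False
    image_conf: str = "TEXT"
    footnote_enabled: bool = False

    def to_dict(self) -> dict:
        """Return a dict suitable for file_extract.parse_conf."""
        if self.image_conf not in ALLOWED_IMAGE_CONF:
            raise ValueError(f"unsupported image_conf: {self.image_conf!r}")
        return asdict(self)
```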

**Table 6** SplitConf
Parameter	Mandatory	Type	Description
split_mode	No	String	Definition: Document segmentation mode. Value range: The value can be: AUTO: The system automatically identifies the document format and matches the appropriate segmentation and parsing mode. LENGTH: Segments a document by length. For example, a document is segmented into paragraphs every 500 characters. CATALOG: Automatic parsing under hierarchical segmentation. The system automatically identifies the hierarchical structure of an article and segments the article based on the hierarchical structure. For example, section 1.1.2 is a segment, and section 1.1.3 is a segment. RULE: Rule-based parsing under hierarchical segmentation. You can customize the matching rules of hierarchical titles and match and split chapters based on custom rules. Constraints: N/A Default value: AUTO
separator_ids	No	Array of strings	Definition: List of separator IDs used in automatic segmentation and length segmentation modes. A separator ID determines the end character of each chunk. Constraints: N/A Value range: Value mapping: period_zh: "Chinese period 。", period_en: "English period .", exclamation_mark_zh: "Chinese exclamation mark ！", exclamation_mark_en: "English exclamation mark !", question_mark_zh: "Chinese question mark ？", question_mark_en: "English question mark ?", question_mark_ar: "Arabic question mark ؟", comma_zh: "Chinese comma ，", comma_en: "English comma ,", space_en: "Space" Default value: ["period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"]
rule_regex_id	No	String	Definition: ID of the user-defined parsing rule. Constraints: N/A Value range: N/A Default value: N/A
chunk_size	No	Integer	Definition: Maximum length of a document chunk. The document is segmented based on the maximum chunk length. Constraints: N/A Value range: 0-6000 Default value: 500
title_level	No	Integer	Definition: Title hierarchy depth retained in a chunk. For example: If the depth is 3 and the current paragraph is 1.1.3, then the parent titles 1.1 and 1 are both retained. If the depth is 2 and the current paragraph is 1.1.3, then the parent title 1.1 is retained, and the parent title 1 is discarded. Constraints: N/A Value range: 1-10 Default value: 3
combine_title	No	Boolean	Definition: Whether to retain the hierarchical title combination. Constraints: N/A Value range: true: Retain the hierarchical title combination. false: Do not retain the hierarchical title combination. Default value: false
merge_titles	No	Boolean	Definition: Cross-Title Merge: When text in paragraphs with different titles is limited, it is automatically merged up to a specified section length, aiding in the creation of a more comprehensive outcome. Constraints: N/A Value range: true: Merge across titles. false: Do not merge across titles. Default value: false
rule_regexs	No	Array of strings	Definition: User-defined parsing rules. Constraints: N/A Value range: The list length ranges from 1 to 100. Default value: N/A
merge_last_chunk	No	Boolean	Definition: Whether to merge the most recently modified segments. Constraints: N/A Value range: true: Merge the most recently modified segments. false: Do not merge the most recently modified segments. Default value: N/A
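
The title_level examples in Table 6 can be read as "keep the last N entries of the heading path, counting the current heading". The sketch below encodes that reading. It is an interpretation of the two documented examples, not the service's implementation, and the function name retained_titles is hypothetical.

```python
def retained_titles(title_path: list, title_level: int) -> list:
    """Return the headings kept in a chunk.

    title_path is ordered from the outermost heading to the current one,
    for example ["1", "1.1", "1.1.3"].
    """
    # Value range from Table 6.
    if not 1 <= title_level <= 10:
        raise ValueError("title_level must be within 1-10")
    # Keep the current heading plus (title_level - 1) nearest parents.
    return title_path[-title_level:]
```

With a depth of 3, the path for section 1.1.3 keeps 1, 1.1, and 1.1.3. With a depth of 2, it keeps 1.1 and 1.1.3 and drops 1, which matches the examples in the table.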

**Table 7** SessionConfig
Parameter	Mandatory	Type	Description
similarity_threshold	Yes	Float	Definition: Query2query similarity threshold for a cache hit. A higher threshold requires a higher similarity between the new query and the cached query. Constraints: N/A Value range: 0.1-1.0 Default value: 0.9
answer_select_policy	Yes	String	Definition: Cache hit selection policy. Constraints: N/A Value range: The value can be: FIRST: Select the answer with the highest score from the matched results. RANDOM: Randomly select an answer from the matched results. Default value: N/A
eviction	Yes	Eviction object	Definition: Cache expiration policy. Constraints: N/A Value range: N/A Default value: N/A
model_name	Yes	String	Definition: Name of the query2query model used to calculate the similarity between the new query and the cached query when there is a hit. Constraints: N/A Value range: 1 to 64 characters. Default value: N/A

**Table 8** Eviction
Parameter	Mandatory	Type	Description
policy	Yes	String	Definition: Cache expiration policy. Constraints: N/A Value range: The value can be: LRU (Least Recently Used): If now − accessTime > TTL, clear the item. FIFO (First In First Out): If now − createTime > TTL, clear the item. LFU (Least Frequently Used): If hit_count < threshold, clear the item. Default value: N/A
ttl	No	Long	Definition: Cache expiration time, in milliseconds. Constraints: N/A Value range: 0-31536000000 Default value: N/A
hit_count_threshold	No	Long	Definition: Threshold of cache hits. Constraints: N/A Value range: 1-10000 Default value: N/A
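
The three eviction conditions in Table 8 can be written as a single predicate. The sketch below is a client-side illustration of those conditions only. The CacheEntry class and the should_evict function are hypothetical, and how the service actually stores timestamps and hit counts is not described in this document. The policy name is upper-cased before comparison because the example request on this page uses lowercase "lru".

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical cache record; all times are in milliseconds."""
    create_time_ms: int
    access_time_ms: int
    hit_count: int


def should_evict(entry: CacheEntry, policy: str, now_ms: int,
                 ttl_ms: int = None, hit_count_threshold: int = None) -> bool:
    """Apply the eviction condition that Table 8 gives for each policy."""
    policy = policy.upper()
    if policy in ("LRU", "FIFO"):
        if ttl_ms is None:
            raise ValueError(f"ttl_ms is required for {policy}")
        # LRU measures from the last access, FIFO from creation.
        start = entry.access_time_ms if policy == "LRU" else entry.create_time_ms
        return now_ms - start > ttl_ms
    if policy == "LFU":
        if hit_count_threshold is None:
            raise ValueError("hit_count_threshold is required for LFU")
        return entry.hit_count < hit_count_threshold
    raise ValueError(f"unknown policy: {policy}")
```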

**Table 9** TagInfo
Parameter	Mandatory	Type	Description
tag_key	Yes	String	Definition: Knowledge base tag keyword. Constraints: N/A Value range: 1 to 128 characters. Default value: N/A
tag_value	Yes	String	Definition: Knowledge base tag information. Constraints: N/A Value range: 1 to 128 characters. Default value: N/A

**Table 10** KnowledgeRepoExtendConfig
Parameter	Mandatory	Type	Description
extend_context	No	Boolean	Definition: Extend the context length to generate more comprehensive responses. Examples: table small-to-big; association of text (1)(2)(3); document summary. Constraints: N/A Value range: true: Extend the context length. false: Do not extend the context. Default value: false
effective_input_length	No	Integer	Definition: Optimal context length, which varies with different models. Set the valid length of input tokens to ensure optimal output. In consideration of multi-turn dialogues, we recommend setting this length to 60% (rounded up) of the maximum context length supported by the model. Constraints: N/A Value range: 2-256 Default value: 32
top_p	No	Float	Definition: An alternative to sampling with temperature, called nucleus sampling, where the model only takes into account the tokens with the probability mass determined by the top_p parameter. Constraints: N/A Value range: 0.1-1 Default value: 0.1
max_tokens	No	Integer	Definition: Maximum number of tokens in the generated text. The total length of the input text plus the generated text cannot exceed the maximum length that the model can process. Constraints: N/A Value range: 1-262144 Default value: 2048
chat_temperature	No	Float	Definition: Diversity of the non-RAG model's output. Constraints: N/A Value range: 0-1 Default value: N/A
search_temperature	No	Float	Definition: Diversity of the RAG model's output. Constraints: N/A Value range: 0-1 Default value: 0.6
presence_penalty	No	Float	Definition: Text repetition penalty. Constraints: N/A Value range: -2 - 2 Default value: 0
use_system_prompt	No	Boolean	Definition: Whether to use system prompts. Keep consistent with the standard prompt assembly solution used by Pangu RAG. Constraints: N/A Value range: true: Use system prompts. false: Do not use system prompts. Default value: false
system_prompt	No	String	Definition: System prompt. Note: This parameter is mandatory when use_system_prompt is set to true. No need to combine the query. Constraints: N/A Value range: 0-8192 Default value: N/A
qa_question_prompt	No	String	Definition: QA generation and question generation prompt. Constraints: N/A Value range: 0-8192 Default value: N/A
qa_answer_prompt	No	String	Definition: QA generation and answer generation prompt. Constraints: N/A Value range: 0-8192 Default value: N/A
refuse_enable	No	Boolean	Definition: Whether to reject certain questions. Constraints: N/A Value range: true: Enable answer rejection. false: Disable answer rejection. Default value: false
refuse_answer	No	String	Definition: Rejection answer. Constraints: N/A Value range: 1-8192 Default value: N/A
image_match_type	No	String	Definition: Image description parameter. Constraints: N/A Value range: context_match, reference_match, model_match Default value: context_match
custom_types	No	Map<String,Map<String,String>>	Definition: Custom structure type. Constraints: N/A Value range: N/A Default value: N/A
directory_enable	No	Boolean	Definition: Whether to enable directory management. Constraints: N/A Value range: true: Enable directory management. false: Disable directory management. Default value: false
embedding_search_enable	No	Boolean	Definition: Whether to enable vector search. Constraints: N/A Value range: true: Enable vector search. false: Disable vector search. Default value: true
keyword_search_enable	No	Boolean	Definition: Whether to enable keyword-based search. Constraints: N/A Value range: true: Enable keyword-based search. false: Disable keyword-based search. Default value: false
keyword_top_k	No	Integer	Definition: Top-k for keyword-based search. Constraints: N/A Value range: 0-100 Default value: 10
search_engine_type	No	String	Definition: Search engine type. Constraints: N/A Value range: The value can be: search_engine: web search engine. ai_engine: enhanced web search service. Default value: N/A
search_engine_name	No	String	Definition: Search engine name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
think_model_name	No	String	Definition: Name of the deep thinking model. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
faq_top_k	No	Integer	Definition: Top-k for Q&A and hybrid search where results are not directly from preset FAQs. Constraints: N/A Value range: 0-50 Default value: 2
faq_similarity_threshold	No	Float	Definition: Threshold for Q&A and hybrid search where results are not directly from preset FAQs. Constraints: N/A Value range: 0-1 Default value: 0.8
extract_model_name	No	String	Definition: Graph extraction model name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
optimize_model_name	No	String	Definition: Graph optimization model name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
graph_search_enable	No	Boolean	Definition: Whether to enable graph search. Constraints: N/A Value range: true: Enable graph search. false: Disable graph search. Default value: false
graph_reference_count	No	Integer	Definition: Number of graph search reference documents. This parameter takes effect when graph search is enabled. Constraints: N/A Value range: 1-50 Default value: 10
graph_top_k	No	Integer	Definition: Top-k for graph vector recall. Constraints: N/A Value range: 1-500 Default value: 50
graph_keyword_top_k	No	Integer	Definition: Top-k for keyword-based graph search. Constraints: N/A Value range: 1-100 Default value: 20
graph_threshold	No	Float	Definition: Graph re-ranking threshold. Constraints: N/A Value range: 0-200 Default value: 0.3
number_of_shards	No	Integer	Definition: Number of knowledge base index shards. Constraints: N/A Value range: 1-1024 Default value: 3
number_of_replicas	No	Integer	Definition: Number of knowledge base index replicas. Constraints: N/A Value range: 0-3 Default value: 1
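
Several numeric fields in Table 10 have explicit ranges, and system_prompt becomes mandatory once use_system_prompt is true. The following sketch shows one way to check an extend_config object against a subset of those rules before it is sent. The function name and the selection of fields are illustrative assumptions, and the ranges are copied from the table.

```python
# (min, max) pairs copied from Table 10 for a subset of numeric fields.
NUMERIC_RANGES = {
    "top_p": (0.1, 1),
    "max_tokens": (1, 262144),
    "search_temperature": (0, 1),
    "presence_penalty": (-2, 2),
    "keyword_top_k": (0, 100),
    "faq_top_k": (0, 50),
    "faq_similarity_threshold": (0, 1),
    "number_of_shards": (1, 1024),
    "number_of_replicas": (0, 3),
}


def check_extend_config(cfg: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():
        if field in cfg and not low <= cfg[field] <= high:
            problems.append(f"{field} must be within {low}-{high}")
    # Table 10: system_prompt is mandatory when use_system_prompt is true.
    if cfg.get("use_system_prompt") and not cfg.get("system_prompt"):
        problems.append("system_prompt is mandatory when use_system_prompt is true")
    return problems
```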

**Table 11** KnowledgeRepoPromptInfo
Parameter	Mandatory	Type	Description
prompt_id	No	String	Definition: Prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
qa_question_prompt_id	No	String	Definition: QA question generation prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
qa_answer_prompt_id	No	String	Definition: QA answer generation prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
mllm_prompt_id	No	String	Definition: ID of the mllm prompt. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
table_rag_config	No	String	Definition: Prompts related to tabular enhancement, including: chat_prompt_with_sqlresults_id (Q&A prompt for tabular enhancement), nl2sql_prompt_id (prompt for generating SQL statements), and table_rag_prompt_id (tabular Q&A prompt). Constraints: N/A Value range: 1 to 512 characters. Default value: N/A

Response Parameters

Status code: 200

**Table 12** Response body parameters
Parameter	Type	Description
repo_id	String	Definition: Knowledge base ID. Value range: N/A

Status code: 400

**Table 13** Response body parameters
Parameter	Type	Description
error_code	String	Definition: Error code. Value range: N/A
error_msg	String	Definition: Error message. Value range: N/A

Status code: 500

**Table 14** Response body parameters
Parameter	Type	Description
error_code	String	Definition: Error code. Value range: N/A
error_msg	String	Definition: Error message. Value range: N/A
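
A caller typically branches on the status code: status code 200 carries repo_id, while 400 and 500 carry error_code and error_msg. The sketch below shows one way to handle this in Python. The exception class and function name are hypothetical, and the sketch assumes the response body is JSON with the fields listed in Tables 12 to 14.

```python
import json


class KnowledgeRepoApiError(Exception):
    """Hypothetical exception carrying the error fields from Tables 13 and 14."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"HTTP {status}: {error_code} {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_update_response(status: int, body_text: str) -> str:
    """Return repo_id for status code 200; raise KnowledgeRepoApiError otherwise."""
    payload = json.loads(body_text) if body_text else {}
    if status == 200:
        return payload["repo_id"]
    raise KnowledgeRepoApiError(
        status, payload.get("error_code"), payload.get("error_msg")
    )
```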

Example Requests

This API is used to modify knowledge base settings.

PUT /v1/1ed40ceefc8d40f8b884edb6a84e7768/applications/fb9731ab-7085-474f-b6c7-64473586f0f3/uni-search/knowledge-repo/5bb86225-e2ea-4404-8125-aaa3b79419ad

{
  "id" : "5bb86225-e2ea-4404-8125-aaa3b79419ad",
  "name" : "test_20250425",
  "tags" : [ ],
  "top_k" : 50,
  "rerank_enabled" : true,
  "query_rewrite_enabled" : true,
  "reference_count" : 3,
  "search_threshold" : 0,
  "rerank_model" : "rerank-zh",
  "pangu_nlp_model" : "dp-r1",
  "search_plan_model" : "search_ai_plan",
  "file_extract" : {
    "parse_conf" : {
      "ocr_enabled" : true,
      "image_enabled" : true,
      "header_footer_enabled" : false,
      "catalog_enabled" : false,
      "image_conf" : "TEXT"
    },
    "split_conf" : {
      "split_mode" : "LENGTH",
      "separator_ids" : [ "period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en" ],
      "chunk_size" : 500
    }
  },
  "search_plan_category_ids" : [ ],
  "cache_enabled" : true,
  "session_config" : {
    "model_name" : "embedding-zh_faq",
    "similarity_threshold" : 0.9,
    "answer_select_policy" : "first",
    "eviction" : {
      "policy" : "lru",
      "hit_count_threshold" : 1,
      "ttl" : 86400000
    }
  },
  "answer_reference_enabled" : false,
  "answer_image_reference_enabled" : false,
  "chat_ref_threshold" : 0,
  "faq_threshold" : 0.95,
  "extend_config" : {
    "extend_context" : false,
    "effective_input_length" : 3,
    "top_p" : 0.1,
    "max_tokens" : 2048,
    "chat_temperature" : 0.6,
    "search_temperature" : 0.6,
    "presence_penalty" : 0,
    "search_engine_name" : "bocha",
    "think_model_name" : "dp-r1",
    "refuse_enable" : false,
    "image_match_type" : "context_match",
    "directory_enable" : false,
    "embedding_search_enable" : true,
    "keyword_search_enable" : false,
    "keyword_top_k" : 10,
    "faq_top_k" : 2,
    "faq_similarity_threshold" : 0.8,
    "refuse_answer" : ""
  },
  "prompt_info" : {
    "prompt_id" : "default_chat_prompt",
    "qa_answer_prompt_id" : "default_qa_answer_prompt",
    "qa_question_prompt_id" : "default_qa_question_prompt"
  }
}
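
The request above could be assembled with the Python standard library roughly as follows. This is a minimal sketch: the function name, the endpoint host, and the Content-Type header are assumptions that do not come from this document, and the token must be obtained separately as described for X-Auth-Token. The function only builds the request object and does not send it.

```python
import json
import urllib.request


def build_update_request(endpoint: str, uri_path: str, token: str,
                         body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that modifies a knowledge base.

    endpoint is a placeholder such as "https://koosearch.example.com";
    the real service endpoint is not given in this document.
    """
    return urllib.request.Request(
        url=endpoint.rstrip("/") + uri_path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-Auth-Token": token,
            # Assumption: the service expects a JSON request body.
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to send the request. A partial body such as {"top_k": 50, "rerank_enabled": true} may be enough, given that every body parameter in Table 3 is optional.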