Creating a Knowledge Base

Function

Create a knowledge base.

URI

POST /v1/{project_id}/applications/{application_id}/uni-search/knowledge-repo

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
project_id	Yes	String	Definition: Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. Constraints: N/A Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A
application_id	Yes	String	Definition: Application ID. For details about how to obtain the application ID, see Obtaining an Application ID. Constraints: Character string Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A
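The request path can be assembled and checked against Table 1 before the request is sent. The following Python sketch is illustrative only, not an official SDK, and `build_uri` is a hypothetical helper name. It checks only length and character set. The example request at the end of this page uses a project ID that starts with a digit, so the sketch does not enforce the "must start with a letter" rule.

```python
import re

# Table 1: 1 to 64 characters; digits, letters, hyphens (-), and underscores (_).
# The "must start with a letter" rule is deliberately not enforced (see lead-in).
_PATH_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_uri(project_id: str, application_id: str) -> str:
    """Return the request path for creating a knowledge base."""
    for name, value in (("project_id", project_id), ("application_id", application_id)):
        if not _PATH_ID.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return f"/v1/{project_id}/applications/{application_id}/uni-search/knowledge-repo"
```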

Request Parameters

**Table 2** Request header parameters
Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	Definition: Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token. Constraints: N/A Value range: N/A Default value: N/A

**Table 3** Request body parameters
Parameter	Mandatory	Type	Description
name	Yes	String	Definition: Knowledge base name. Constraints: N/A Value range: The value can contain 1 to 64 characters. It can only contain letters, digits, hyphens (-), and underscores (_), and must start with a letter or digit. Default value: N/A
detail	No	String	Definition: Description of the knowledge base. Constraints: N/A Value range: The length cannot exceed 100 characters. Default value: N/A
search_plan_category_ids	No	Array of strings	Definition: Search planning categories. If a query matches none of the categories, the knowledge base is searched and an LLM summarizes the results. If a category is matched, the LLM answers the question directly. Constraints: N/A Value range: The value can be: professional_knowledge-medical, weather, professional_knowledge, professional_knowledge-manufacturing, chitchat, language_tasks, professional_knowledge-finance, general_knowledge, professional_knowledge-government, identity (persona). Default value: N/A
file_extract	No	FileExtractConf object	Definition: Overall configuration of document parsing. Constraints: N/A Value range: N/A Default value: N/A
cache_enabled	No	Boolean	Definition: Whether to enable cache for the current knowledge base. Constraints: N/A Value range: false: The result is not cached. true: Queries, references, and answers are cached. If a similar query is encountered subsequently, the cached result is directly returned, reducing the query latency. Default value: false
answer_reference_enabled	No	Boolean	Definition: Whether to enable reference source tracing for the current knowledge base. When enabled, the sources of each generated answer are identified. Constraints: N/A Value range: true: reference source tracing false: no reference source tracing Default value: false
answer_image_reference_enabled	No	Boolean	Definition: Whether answers can include both text and images. Constraints: Answer reference must be enabled (answer_reference_enabled set to true). Value range: true: If a document contains images related to a question, the answer includes both text and images. false: Images are not recalled. Default value: false
session_config	No	SessionConfig object	Definition: Cache policy. Constraints: N/A Value range: N/A Default value: N/A
embedding_model	No	String	Definition: Name of the embedding model used by the current knowledge base. Embedding model: used to vectorize text content for vector search. Constraints: The embedding model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-). Default value: N/A
rerank_model	No	String	Definition: Name of the reranking model used by the current knowledge base. Reranking model: Reranks the top_k results returned by the embedding model through vector search. The purpose is to provide results that are most relevant to the query. Constraints: The reranking model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-). Default value: N/A
pangu_nlp_model	No	String	Definition: Name of the NLP foundation model used by the current knowledge base. NLP model: a generative model used by the chat API to generate answers based on the input text. Constraints: The NLP model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-). Default value: N/A
search_plan_model	No	String	Definition: Name of the search planning model. The search planning model replans user queries, including query rewriting and query completion. Constraints: The search planning model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A
language_id	No	String	Definition: Language ID of the current knowledge base. Constraints: N/A Value range: zh (Chinese), en (English), th (Thai), ar (Arabic), es (Spanish), pt (Portuguese)
refs	No	String	Definition: IDs of the referenced knowledge bases, which are separated by commas (,). Multiple knowledge bases can be used together to support Q&A. Constraints: N/A Value range: 0 to 1024 characters. Default value: N/A
tags	No	Array of TagInfo objects	Definition: Tag list. Constraints: N/A Value range: N/A Default value: N/A
extend_config	No	KnowledgeRepoExtendConfig object	Definition: Knowledge base extension configuration. Constraints: N/A Value range: N/A Default value: N/A
prompt_info	No	KnowledgeRepoPromptInfo object	Definition: Prompt information. Constraints: N/A Value range: N/A Default value: N/A
table_rag_enabled	No	Boolean	Definition: Whether to enable tableRAG. Constraints: N/A Value range: true: Enable tableRAG. false: Disable tableRAG. Default value: false
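A client can check a request body against some of the constraints in Table 3 before calling the API. The Python sketch below covers a few of them; `validate_body` is a hypothetical helper, the category values are copied from the search_plan_category_ids row (treating `identity` as the value name is an assumption), and the service remains the authority on what it accepts.

```python
import re

# Constraints copied from Table 3 (Request body parameters).
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")
LANGUAGES = {"zh", "en", "th", "ar", "es", "pt"}
CATEGORIES = {
    "professional_knowledge", "professional_knowledge-medical",
    "professional_knowledge-manufacturing", "professional_knowledge-finance",
    "professional_knowledge-government", "weather", "chitchat",
    "language_tasks", "general_knowledge", "identity",  # "identity" is assumed
}


def validate_body(body: dict) -> list[str]:
    """Return the problems found in a request body (empty list if none)."""
    problems = []
    name = body.get("name")
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        problems.append("name: 1-64 letters, digits, '-' or '_', starting with a letter or digit")
    if len(body.get("detail", "")) > 100:
        problems.append("detail: at most 100 characters")
    for category in body.get("search_plan_category_ids", []):
        if category not in CATEGORIES:
            problems.append(f"search_plan_category_ids: unknown value {category!r}")
    language = body.get("language_id")
    if language is not None and language not in LANGUAGES:
        problems.append(f"language_id: unsupported value {language!r}")
    if len(body.get("refs", "")) > 1024:
        problems.append("refs: at most 1024 characters")
    # answer_image_reference_enabled depends on answer_reference_enabled.
    if body.get("answer_image_reference_enabled") and not body.get("answer_reference_enabled"):
        problems.append("answer_image_reference_enabled requires answer_reference_enabled")
    return problems
```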

**Table 4** FileExtractConf
Parameter	Mandatory	Type	Description
parse_conf	No	ParseConfReq object	Definition: Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page. Value range: N/A
split_conf	No	SplitConf object	Definition: Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern. Value range: N/A
id	No	String	Definition: Document parsing ID. Constraints: N/A Value range: 0 to 128 characters. Default value: N/A

**Table 5** ParseConfReq
Parameter	Mandatory	Type	Description
ocr_enabled	No	Boolean	Definition: OCR enhancement. Constraints: N/A Value range: true: Enable OCR enhancement. false: Do not enable OCR enhancement. Default value: false
mllm_enabled	No	Boolean	Definition: Multimodal enhancement. Constraints: N/A Value range: true: Enable multimodal enhancement. false: Disable multimodal enhancement. Default value: false
mllm_model	No	String	Definition: Multimodal model name. Constraints: The multimodal model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A
image_enabled	No	Boolean	Definition: Image parsing. Constraints: N/A Value range: true: Parse images. false: Do not parse images. Default value: false
header_footer_enabled	No	Boolean	Definition: Parse the header and footer. Constraints: N/A Value range: true: Parse the header and footer. false: Ignore the header and footer. Default value: false
catalog_enabled	No	Boolean	Definition: Parse the contents page. Constraints: N/A Value range: true: Parse the contents page. false: Ignore the contents page. Default value: false
image_conf	No	String	Definition: Image parsing mode used when image parsing is enabled (image_enabled is set to true). Constraints: To return answers with images, you must use the IMAGE mode, which retains the original images. Value range: TEXT: Extracts text from the image. IMAGE: Retains the original image. IMAGE_TEXT: Extracts the text and retains the original image. Default value: TEXT
footnote_enabled	No	Boolean	Definition: Parse footnotes. Constraints: N/A Value range: true: Parse footnotes. false: Do not parse footnotes. Default value: false
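For answers that include images, the constraints in Table 3 and Table 5 require image parsing to be enabled and image_conf to be set to IMAGE. A minimal Python sketch of that check follows; `image_answer_problems` is a hypothetical helper that reads only the fields documented above.

```python
def image_answer_problems(body: dict) -> list[str]:
    """Check the parse_conf settings that image-bearing answers depend on."""
    if not body.get("answer_image_reference_enabled"):
        return []  # Image answers not requested; nothing to check.
    parse_conf = body.get("file_extract", {}).get("parse_conf", {})
    problems = []
    if not parse_conf.get("image_enabled"):
        problems.append("parse_conf.image_enabled must be true")
    # image_conf defaults to TEXT, which does not retain original images.
    if parse_conf.get("image_conf", "TEXT") != "IMAGE":
        problems.append("parse_conf.image_conf must be IMAGE")
    return problems
```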

**Table 6** SplitConf
Parameter	Mandatory	Type	Description
split_mode	No	String	Definition: Document segmentation mode. Value range: The value can be: AUTO: The system automatically identifies the document format and matches the appropriate segmentation and parsing mode. LENGTH: Segments a document by length. For example, a document is segmented into paragraphs every 500 characters. CATALOG: Automatic parsing under hierarchical segmentation. The system automatically identifies the hierarchical structure of an article and segments the article based on the hierarchical structure. For example, section 1.1.2 is a segment, and section 1.1.3 is a segment. RULE: Rule-based parsing under hierarchical segmentation. You can customize the matching rules of hierarchical titles and match and split chapters based on custom rules. Constraints: N/A Default value: AUTO
separator_ids	No	Array of strings	Definition: IDs of the separators used in the AUTO and LENGTH segmentation modes. A separator determines the character at which a chunk can end. Constraints: N/A Value range: period_zh: Chinese period (。); period_en: English period (.); exclamation_mark_zh: Chinese exclamation mark (！); exclamation_mark_en: English exclamation mark (!); question_mark_zh: Chinese question mark (？); question_mark_en: English question mark (?); question_mark_ar: Arabic question mark (؟); comma_zh: Chinese comma (，); comma_en: English comma (,); space_en: space. Default value: ["period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"]
rule_regex_id	No	String	Definition: ID of a user-defined parsing rule. Constraints: N/A Value range: N/A Default value: N/A
chunk_size	No	Integer	Definition: Maximum length of a document chunk. The document is segmented based on the maximum chunk length. Constraints: N/A Value range: 0-6000 Default value: 500
title_level	No	Integer	Definition: Title hierarchy depth retained in a chunk. For example: If the depth is 3 and the current paragraph is 1.1.3, then the parent titles 1.1 and 1 are both retained. If the depth is 2 and the current paragraph is 1.1.3, then the parent title 1.1 is retained, and the parent title 1 is discarded. Constraints: N/A Value range: 1-10 Default value: 3
combine_title	No	Boolean	Definition: Whether to retain the hierarchical title combination. Constraints: N/A Value range: true: Retain the hierarchical title combination. false: Do not retain the hierarchical title combination. Default value: false
merge_titles	No	Boolean	Definition: Whether to merge across titles. When sections under different titles contain little text, they are automatically merged up to the specified length, producing more complete chunks. Constraints: N/A Value range: true: Merge across titles. false: Do not merge across titles. Default value: false
rule_regexs	No	Array of strings	Definition: User-defined parsing rules. Constraints: N/A Value range: The list length ranges from 1 to 100. Default value: N/A
merge_last_chunk	No	Boolean	Definition: Whether to merge the most recently modified segments. Constraints: N/A Value range: true: Merge the most recently modified segments. false: Do not merge the most recently modified segments. Default value: N/A
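To illustrate how chunk_size, separator_ids, and title_level interact, the following Python sketch approximates LENGTH segmentation: each chunk ends just after the last configured separator that fits within chunk_size characters, or is cut hard when none fits. This is an assumed, simplified model for explanation only; the service's actual segmentation algorithm is not documented here. `retained_titles` reproduces the title_level examples in the table. Both function names are hypothetical.

```python
# Separator characters from the separator_ids row of Table 6.
SEPARATORS = {
    "period_zh": "。", "period_en": ".",
    "exclamation_mark_zh": "！", "exclamation_mark_en": "!",
    "question_mark_zh": "？", "question_mark_en": "?",
    "question_mark_ar": "؟", "comma_zh": "，", "comma_en": ",",
    "space_en": " ",
}
DEFAULT_SEPARATOR_IDS = ("period_zh", "period_en", "exclamation_mark_zh",
                         "exclamation_mark_en", "question_mark_zh", "question_mark_en")


def split_by_length(text: str, chunk_size: int = 500,
                    separator_ids=DEFAULT_SEPARATOR_IDS) -> list[str]:
    """Split text into chunks of at most chunk_size characters (simplified model)."""
    if chunk_size <= 0:
        raise ValueError("this sketch needs a positive chunk_size")
    seps = {SEPARATORS[sid] for sid in separator_ids}
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to end just after the last separator inside the window.
            end = max((i + 1 for i in range(start, end) if text[i] in seps), default=end)
        chunks.append(text[start:end])
        start = end
    return chunks


def retained_titles(section: str, title_level: int = 3) -> list[str]:
    """Titles kept for a numbered section, e.g. '1.1.3' with depth 2 -> ['1.1', '1.1.3']."""
    parts = section.split(".")
    lineage = [".".join(parts[:n]) for n in range(1, len(parts) + 1)]
    return lineage[-title_level:]
```

For example, `split_by_length("Hello. World! Bye.", 10)` yields three chunks, each ending at a period or exclamation mark.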

**Table 7** SessionConfig
Parameter	Mandatory	Type	Description
similarity_threshold	Yes	Float	Definition: Query2query similarity threshold for a cache hit. A higher threshold requires the new query to be more similar to a cached query. Constraints: N/A Value range: 0.1-1.0 Default value: 0.9
answer_select_policy	Yes	String	Definition: Cache hit selection policy. Constraints: N/A Value range: The value can be: FIRST: Select the answer with the highest score from the matched results. RANDOM: Randomly select an answer from the matched results. Default value: N/A
eviction	Yes	Eviction object	Definition: Cache expiration policy. Constraints: N/A Value range: N/A Default value: N/A
model_name	Yes	String	Definition: Name of the query2query model used to calculate the similarity between the new query and the cached query when there is a hit. Constraints: N/A Value range: 1 to 64 characters. Default value: N/A
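The cache hit logic in Table 7 can be sketched as follows. This is an assumption-based illustration: similarity scores are taken as input (in the service they come from the model named in model_name), a score equal to the threshold is treated as a hit, and the policy name is compared case-insensitively because the example request uses lowercase "first". `select_cached_answer` is a hypothetical name.

```python
import random


def select_cached_answer(candidates, similarity_threshold=0.9,
                         answer_select_policy="FIRST", rng=random):
    """Return a cached answer for a new query, or None on a cache miss.

    candidates: iterable of (similarity, answer) pairs, one per cached query.
    """
    hits = [(score, answer) for score, answer in candidates
            if score >= similarity_threshold]
    if not hits:
        return None  # Cache miss: the normal search path would run instead.
    policy = answer_select_policy.upper()  # Assumption: case-insensitive.
    if policy == "FIRST":
        return max(hits, key=lambda hit: hit[0])[1]  # Highest score wins.
    if policy == "RANDOM":
        return rng.choice(hits)[1]
    raise ValueError(f"unknown answer_select_policy: {answer_select_policy!r}")
```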

**Table 8** Eviction
Parameter	Mandatory	Type	Description
policy	Yes	String	Definition: Cache expiration policy. Constraints: N/A Value range: The value can be: LRU (Least Recently Used): an item is cleared if now − accessTime > TTL. FIFO (First In, First Out): an item is cleared if now − createTime > TTL. LFU (Least Frequently Used): an item is cleared if hit_count < hit_count_threshold. Default value: N/A
ttl	No	Long	Definition: Cache expiration time, in milliseconds. Constraints: N/A Value range: 0-31536000000 Default value: N/A
hit_count_threshold	No	Long	Definition: Threshold of cache hits. Constraints: N/A Value range: 1-10000 Default value: N/A
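The three eviction rules in Table 8 translate directly into predicates. In this Python sketch, `should_evict` is a hypothetical helper, the item fields access_time, create_time, and hit_count are hypothetical names for the quantities in the policy description, and all times are in milliseconds to match ttl.

```python
import time


def should_evict(item: dict, policy: str, ttl=None, hit_count_threshold=None,
                 now_ms=None) -> bool:
    """Apply one of the documented eviction rules to a single cache item."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    policy = policy.upper()  # The example request uses lowercase "lru".
    if policy == "LRU":   # Evict if now - accessTime > TTL.
        return ttl is not None and now_ms - item["access_time"] > ttl
    if policy == "FIFO":  # Evict if now - createTime > TTL.
        return ttl is not None and now_ms - item["create_time"] > ttl
    if policy == "LFU":   # Evict if hit_count < threshold.
        return hit_count_threshold is not None and item["hit_count"] < hit_count_threshold
    raise ValueError(f"unknown eviction policy: {policy!r}")
```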

**Table 9** TagInfo
Parameter	Mandatory	Type	Description
tag_key	Yes	String	Definition: Knowledge base tag keyword. Constraints: N/A Value range: 1 to 128 characters. Default value: N/A
tag_value	Yes	String	Definition: Knowledge base tag information. Constraints: N/A Value range: 1 to 128 characters. Default value: N/A

**Table 10** KnowledgeRepoExtendConfig
Parameter	Mandatory	Type	Description
extend_context	No	Boolean	Definition: Whether to extend the context to generate more comprehensive responses, for example through table small-to-big, association of text (1)(2)(3), and document summaries. Constraints: N/A Value range: true: Extend the context length. false: Do not extend the context. Default value: false
effective_input_length	No	Integer	Definition: Effective input length, that is, the valid length of input tokens, set to ensure optimal output. The optimal value varies with the model. To leave room for multi-turn dialogues, we recommend setting it to 60% (rounded up) of the maximum context length supported by the model. Constraints: N/A Value range: 2-256 Default value: 32
top_p	No	Float	Definition: Nucleus sampling, an alternative to sampling with temperature. The model considers only the smallest set of tokens whose cumulative probability reaches top_p. Constraints: N/A Value range: 0.1-1 Default value: 0.1
max_tokens	No	Integer	Definition: Maximum number of tokens in the generated text. The total length of the input text plus the generated text cannot exceed the maximum length that the model can process. Constraints: N/A Value range: 1-262144 Default value: 2048
chat_temperature	No	Float	Definition: Diversity of the non-RAG model's output. Constraints: N/A Value range: 0-1 Default value: N/A
search_temperature	No	Float	Definition: Diversity of the RAG model's output. Constraints: N/A Value range: 0-1 Default value: 0.6
presence_penalty	No	Float	Definition: Text repetition penalty. Constraints: N/A Value range: -2 to 2 Default value: 0
use_system_prompt	No	Boolean	Definition: Whether to use system prompts. Keep consistent with the standard prompt assembly solution used by Pangu RAG. Constraints: N/A Value range: true: Use system prompts. false: Do not use system prompts. Default value: false
system_prompt	No	String	Definition: System prompt. This parameter is mandatory when use_system_prompt is set to true. The user query does not need to be included in the prompt. Constraints: N/A Value range: 0-8192 Default value: N/A
qa_question_prompt	No	String	Definition: Prompt used to generate questions during QA generation. Constraints: N/A Value range: 0-8192 Default value: N/A
qa_answer_prompt	No	String	Definition: Prompt used to generate answers during QA generation. Constraints: N/A Value range: 0-8192 Default value: N/A
refuse_enable	No	Boolean	Definition: Whether to reject certain questions. Constraints: N/A Value range: true: Enable answer rejection. false: Disable answer rejection. Default value: false
refuse_answer	No	String	Definition: Answer returned when a question is rejected. Constraints: N/A Value range: 1-8192 Default value: N/A
image_match_type	No	String	Definition: Image description parameter. Constraints: N/A Value range: context_match, reference_match, model_match Default value: context_match
custom_types	No	Map<String,Map<String,String>>	Definition: Custom structure type. Constraints: N/A Value range: N/A Default value: N/A
directory_enable	No	Boolean	Definition: Whether to enable directory management. Constraints: N/A Value range: true: Enable directory management. false: Disable directory management. Default value: false
embedding_search_enable	No	Boolean	Definition: Whether to enable vector search. Constraints: N/A Value range: true: Enable vector search. false: Disable vector search. Default value: true
keyword_search_enable	No	Boolean	Definition: Whether to enable keyword-based search. Constraints: N/A Value range: true: Enable keyword-based search. false: Disable keyword-based search. Default value: false
keyword_top_k	No	Integer	Definition: Top-k for keyword-based search. Constraints: N/A Value range: 0-100 Default value: 10
search_engine_type	No	String	Definition: Search engine type. Constraints: N/A Value range: The value can be: search_engine: web search engine. ai_engine: enhanced web search service. Default value: N/A
search_engine_name	No	String	Definition: Search engine name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
think_model_name	No	String	Definition: Name of the deep thinking model. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
faq_top_k	No	Integer	Definition: Top-k for Q&A and hybrid search where results are not directly from preset FAQs. Constraints: N/A Value range: 0-50 Default value: 2
faq_similarity_threshold	No	Float	Definition: Threshold for Q&A and hybrid search where results are not directly from preset FAQs. Constraints: N/A Value range: 0-1 Default value: 0.8
extract_model_name	No	String	Definition: Graph extraction model name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A
optimize_model_name	No	String	Definition: Name of the graph optimization model. Constraints: N/A Value range: The value cannot exceed 64 characters. Default value: N/A
graph_search_enable	No	Boolean	Definition: Whether to enable graph search. Constraints: N/A Value range: true: Enable graph search. false: Disable graph search. Default value: false
graph_reference_count	No	Integer	Definition: Number of graph search reference documents. This parameter takes effect when graph search is enabled. Constraints: N/A Value range: 1-50 Default value: 10
graph_top_k	No	Integer	Definition: Top-k for graph vector recall. Constraints: N/A Value range: 1-500 Default value: 50
graph_keyword_top_k	No	Integer	Definition: Top-k for keyword-based graph search. Constraints: N/A Value range: 1-100 Default value: 20
graph_threshold	No	Float	Definition: Graph re-ranking threshold. Constraints: N/A Value range: 0-200 Default value: 0.3
number_of_shards	No	Integer	Definition: Number of knowledge base index shards. Constraints: N/A Value range: 1-1024 Default value: 3
number_of_replicas	No	Integer	Definition: Number of knowledge base index replicas. Constraints: N/A Value range: 0-3 Default value: 1
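The recommendation for effective_input_length is simple arithmetic: 60% of the model's maximum context length, rounded up. For a maximum context length of 32, that gives ceil(19.2) = 20. The table does not state a unit, so the sketch below applies the rule in whatever unit the maximum context length is given. `recommended_effective_input_length` and `check_extend_config` are hypothetical helpers; the latter also checks the system_prompt dependency and a few documented ranges.

```python
def recommended_effective_input_length(max_context_length: int) -> int:
    """60% of the model's maximum context length, rounded up (integer math, no float error)."""
    return (max_context_length * 60 + 99) // 100


def check_extend_config(cfg: dict) -> list[str]:
    """Return problems found in a KnowledgeRepoExtendConfig dict (empty if none)."""
    problems = []
    if cfg.get("use_system_prompt") and not cfg.get("system_prompt"):
        problems.append("system_prompt is mandatory when use_system_prompt is true")
    if not 2 <= cfg.get("effective_input_length", 32) <= 256:
        problems.append("effective_input_length must be within 2-256")
    for key in ("chat_temperature", "search_temperature"):
        if key in cfg and not 0 <= cfg[key] <= 1:
            problems.append(f"{key} must be within 0-1")
    if not 0.1 <= cfg.get("top_p", 0.1) <= 1:
        problems.append("top_p must be within 0.1-1")
    return problems
```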

**Table 11** KnowledgeRepoPromptInfo
Parameter	Mandatory	Type	Description
prompt_id	No	String	Definition: Prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
qa_question_prompt_id	No	String	Definition: QA question generation prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
qa_answer_prompt_id	No	String	Definition: QA answer generation prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
mllm_prompt_id	No	String	Definition: ID of the multimodal (MLLM) prompt. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A
table_rag_config	No	String	Definition: Prompts related to tabular enhancement, including chat_prompt_with_sqlresults_id (Q&A prompt for tabular enhancement), nl2sql_prompt_id (prompt for generating SQL statements), and table_rag_prompt_id (tabular Q&A prompt). Constraints: N/A Value range: 1 to 512 characters. Default value: N/A
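table_rag_config is typed as a String, and the table does not specify its encoding. The following Python sketch assumes, without confirmation from this document, that the string is a compact JSON object keyed by the three prompt ID names listed above; `build_table_rag_config` is a hypothetical helper. Verify the expected format before relying on it.

```python
import json


def build_table_rag_config(chat_id: str, nl2sql_id: str, table_rag_id: str) -> str:
    """Serialize the three tabular-enhancement prompt IDs (ASSUMED JSON encoding)."""
    value = json.dumps({
        "chat_prompt_with_sqlresults_id": chat_id,
        "nl2sql_prompt_id": nl2sql_id,
        "table_rag_prompt_id": table_rag_id,
    }, separators=(",", ":"))  # Compact form keeps the string short.
    if not 1 <= len(value) <= 512:  # Documented value range.
        raise ValueError("table_rag_config must be 1 to 512 characters")
    return value
```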

Response Parameters

Status code: 200

**Table 12** Response body parameters
Parameter	Type	Description
repo_id	String	Definition: Knowledge base ID. Value range: N/A

Status code: 400

**Table 13** Response body parameters
Parameter	Type	Description
error_code	String	Definition: Error code. Value range: N/A
error_msg	String	Definition: Error message. Value range: N/A

Status code: 500

**Table 14** Response body parameters
Parameter	Type	Description
error_code	String	Definition: Error code. Value range: N/A
error_msg	String	Definition: Error message. Value range: N/A
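A client typically branches on the status code: a 200 response carries repo_id, while 400 and 500 responses carry error_code and error_msg. The Python sketch below shows one way to do this; `parse_create_response` and `KnowledgeRepoError` are hypothetical names, and treating every non-200 status as an error is an assumption.

```python
import json


class KnowledgeRepoError(Exception):
    """Raised for non-200 responses; carries error_code and error_msg."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"HTTP {status} {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_create_response(status: int, body: str) -> str:
    """Return repo_id from a 200 response; raise KnowledgeRepoError otherwise."""
    data = json.loads(body) if body else {}
    if status == 200:
        return data["repo_id"]
    raise KnowledgeRepoError(status, data.get("error_code", ""), data.get("error_msg", ""))
```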

Example Requests

Create a knowledge base.

POST /v1/1ed40ceefc8d40f8b884edb6a84e7768/applications/fb9731ab-7085-474f-b6c7-64473586f0f3/uni-search/knowledge-repo

{
  "name" : "test_20250425",
  "language_id" : "zh",
  "detail" : "",
  "tags" : [ ],
  "file_extract" : {
    "parse_conf" : {
      "ocr_enabled" : true,
      "image_enabled" : true,
      "header_footer_enabled" : false,
      "catalog_enabled" : false,
      "image_conf" : "TEXT"
    },
    "split_conf" : {
      "split_mode" : "CATALOG",
      "separator_ids" : [ "period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en" ],
      "chunk_size" : 500,
      "merge_titles" : true,
      "title_level" : 3,
      "combine_title" : true
    }
  },
  "extend_config" : {
    "extend_context" : false,
    "effective_input_length" : 3,
    "custom_types" : { },
    "image_match_type" : "context_match",
    "directory_enable" : false,
    "search_engine_name" : "bocha",
    "think_model_name" : "dp-r1"
  },
  "embedding_model" : "embedding-zh",
  "rerank_model" : "rerank-zh",
  "search_plan_model" : "search_ai_plan",
  "pangu_nlp_model" : "dp-r1",
  "cache_enabled" : true,
  "answer_reference_enabled" : false,
  "answer_image_reference_enabled" : false,
  "session_config" : {
    "model_name" : "embedding-zh_faq",
    "similarity_threshold" : 0.9,
    "answer_select_policy" : "first",
    "eviction" : {
      "policy" : "lru",
      "hit_count_threshold" : 1,
      "ttl" : 86400000
    }
  }
}
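The following Python sketch sends a shortened version of the example request with the standard library. ENDPOINT is a placeholder because the regional endpoint is not given in this section, the Content-Type header is assumed because the body is JSON, and `build_create_request` and `create_knowledge_base` are hypothetical helper names.

```python
import json
import urllib.request

# HYPOTHETICAL placeholder: replace with the endpoint of your region.
ENDPOINT = "https://endpoint.example.invalid"
PROJECT_ID = "1ed40ceefc8d40f8b884edb6a84e7768"
APPLICATION_ID = "fb9731ab-7085-474f-b6c7-64473586f0f3"

# A subset of the example request body above.
EXAMPLE_BODY = {
    "name": "test_20250425",
    "language_id": "zh",
    "embedding_model": "embedding-zh",
    "rerank_model": "rerank-zh",
    "cache_enabled": True,
    "session_config": {
        "model_name": "embedding-zh_faq",
        "similarity_threshold": 0.9,
        "answer_select_policy": "first",
        "eviction": {"policy": "lru", "hit_count_threshold": 1, "ttl": 86400000},
    },
}


def build_create_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a knowledge base."""
    url = (f"{ENDPOINT}/v1/{PROJECT_ID}/applications/{APPLICATION_ID}"
           "/uni-search/knowledge-repo")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def create_knowledge_base(token: str, body: dict) -> str:
    """Send the request and return repo_id from the 200 response."""
    with urllib.request.urlopen(build_create_request(token, body)) as resp:
        return json.loads(resp.read())["repo_id"]
```

Calling `create_knowledge_base(token, EXAMPLE_BODY)` with a valid IAM token should return the new repo_id. On a 400 or 500 response, `urlopen` raises `urllib.error.HTTPError`, whose body contains error_code and error_msg.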