Creating a Knowledge Base
Function
Create a knowledge base.
URI
POST /v1/{project_id}/applications/{application_id}/uni-search/knowledge-repo
Path parameters
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
project_id |
Yes |
String |
Definition: Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. Constraints: N/A Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A |
application_id |
Yes |
String |
Definition: Application ID. For details about how to obtain the application ID, see Obtaining an Application ID. Constraints: Character string Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A |
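The URI template above takes two path parameters. A minimal sketch of filling them in is shown here; the helper name `build_create_repo_path` is hypothetical, and the IDs are the ones used in the example request on this page.

```python
from urllib.parse import quote

# URI template copied from the URI section of this page.
API_PATH = "/v1/{project_id}/applications/{application_id}/uni-search/knowledge-repo"


def build_create_repo_path(project_id: str, application_id: str) -> str:
    """Fill the two mandatory path parameters into the URI template.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    if not project_id or not application_id:
        raise ValueError("project_id and application_id are both mandatory")
    # Percent-encode defensively; valid IDs (letters, digits, '-', '_') pass through unchanged.
    return API_PATH.format(
        project_id=quote(project_id, safe=""),
        application_id=quote(application_id, safe=""),
    )


path = build_create_repo_path(
    "1ed40ceefc8d40f8b884edb6a84e7768",
    "fb9731ab-7085-474f-b6c7-64473586f0f3",
)
print(path)
```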
Request Parameters
Request header parameters
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
X-Auth-Token |
Yes |
String |
Definition: Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token. Constraints: N/A Value range: N/A Default value: N/A |
Request body parameters
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
Yes |
String |
Definition: Knowledge base name. Constraints: N/A Value range: The value can contain 1 to 64 characters. It can only contain letters, digits, hyphens (-), and underscores (_), and must start with a letter or digit. Default value: N/A |
detail |
No |
String |
Definition: Description of the knowledge base. Constraints: N/A Value range: The length cannot exceed 100 characters. Default value: N/A |
search_plan_category_ids |
No |
Array of strings |
Definition: Search planning categories. If no category is matched, the knowledge base is searched and an LLM then summarizes the results. If a category is matched, the LLM answers the question directly. Constraints: N/A Value range: The value can be:
Default value: N/A |
file_extract |
No |
FileExtractConf object |
Definition: Overall configuration of document parsing. Constraints: N/A Value range: N/A Default value: N/A |
cache_enabled |
No |
Boolean |
Definition: Whether to enable cache for the current knowledge base. Constraints: N/A Value range:
Default value: false |
answer_reference_enabled |
No |
Boolean |
Definition: Whether to enable reference source tracing in the current knowledge base. When enabled, the sources of the generated answers will be located. Constraints: N/A Value range: N/A Default value: false |
answer_image_reference_enabled |
No |
Boolean |
Definition: Whether to include both text and images. Constraints: Answer reference must be enabled (answer_reference_enabled). Value range:
Default value: false |
session_config |
No |
SessionConfig object |
Definition: Cache policy. Constraints: N/A Value range: N/A Default value: N/A |
embedding_model |
No |
String |
Definition: Name of the embedding model used by the current knowledge base. Embedding model: used to vectorize text content for vector search. Constraints: The embedding model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-). Default value: N/A |
rerank_model |
No |
String |
Definition: Name of the reranking model used by the current knowledge base. Reranking model: Reranks the top_k results returned by the embedding model through vector search. The purpose is to provide results that are most relevant to the query. Constraints: The reranking model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-). Default value: N/A |
pangu_nlp_model |
No |
String |
Definition: Name of the NLP foundation model used by the current knowledge base. NLP model: a generative model used by the chat API to generate answers based on the input text. Constraints: The NLP model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-). Default value: N/A |
search_plan_model |
No |
String |
Definition: Name of the search planning model. search_plan model: search planning model. It replans user queries, including query rewriting and query completion. Constraints: The search planning model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A |
language_id |
No |
String |
Definition: Language ID of the current knowledge base. Constraints: N/A Value range:
Constraints: N/A |
refs |
No |
String |
Definition: IDs of the referenced knowledge bases, which are separated by commas (,). Multiple knowledge bases can be used together to support Q&A. Constraints: N/A Value range: 0 to 1024 characters. Default value: N/A |
tags |
No |
Array of TagInfo objects |
Definition: Tag list. Constraints: N/A Value range: N/A Default value: N/A |
extend_config |
No |
KnowledgeRepoExtendConfig object |
Definition: Knowledge base extension configuration. Constraints: N/A Value range: N/A Default value: N/A |
prompt_info |
No |
KnowledgeRepoPromptInfo object |
Definition: Prompt information. Constraints: N/A Value range: N/A Default value: N/A |
table_rag_enabled |
No |
Boolean |
Definition: Whether to enable tableRAG. Constraints: N/A Value range: N/A Default value: N/A |
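Several of the body parameters above carry length and character-set rules that can be checked on the client before the request is sent. The following is a minimal sketch, assuming the rules are exactly as listed in the table; the function name `validate_repo_body` and the error strings are hypothetical, and the server remains the authority on what is accepted.

```python
import re

# name: 1 to 64 characters; letters, digits, '-', '_'; must start with a letter or digit.
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")
# Model names: 1 to 32 characters; letters, digits, '_' and '-'.
MODEL_RE = re.compile(r"[A-Za-z0-9_-]{1,32}")
# search_plan_model additionally has to start with a letter or digit.
PLAN_MODEL_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,31}")

MODEL_PATTERNS = {
    "embedding_model": MODEL_RE,
    "rerank_model": MODEL_RE,
    "pangu_nlp_model": MODEL_RE,
    "search_plan_model": PLAN_MODEL_RE,
}


def validate_repo_body(body: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    errors = []
    name = body.get("name")
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        errors.append("name: mandatory, 1-64 chars, must start with a letter or digit")
    if len(body.get("detail", "")) > 100:
        errors.append("detail: must not exceed 100 characters")
    if len(body.get("refs", "")) > 1024:
        errors.append("refs: must not exceed 1024 characters")
    for field, pattern in MODEL_PATTERNS.items():
        value = body.get(field)
        if value is not None and not pattern.fullmatch(value):
            errors.append(f"{field}: invalid model name")
    # answer_image_reference_enabled requires answer_reference_enabled.
    if body.get("answer_image_reference_enabled") and not body.get("answer_reference_enabled"):
        errors.append("answer_image_reference_enabled requires answer_reference_enabled")
    return errors


print(validate_repo_body({"name": "test_20250425", "embedding_model": "embedding-zh"}))
```

The check only covers rules that the table states explicitly; whether a model is actually configured on the platform still has to be confirmed through the ListModels API.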
FileExtractConf
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
parse_conf |
No |
ParseConf object |
Definition: Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page. Value range: N/A |
split_conf |
No |
SplitConf object |
Definition: Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern. Value range: N/A |
id |
No |
String |
Definition: Document parsing ID. Constraints: N/A Value range: 0 to 128 characters. Default value: N/A |
ParseConf
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
ocr_enabled |
No |
Boolean |
Definition: OCR enhancement. Constraints: N/A Value range: N/A Default value: false |
mllm_enabled |
No |
Boolean |
Definition: Multimodal enhancement. Constraints: N/A Value range: N/A Default value: false |
mllm_model |
No |
String |
Definition: Multimodal model name. Constraints: The multimodal model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API. Value range: The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit. Default value: N/A |
mllm_prompt |
No |
Map<String,String> |
Definition: Prompt of the multimodal model. Constraints: A default prompt is provided. You can also configure custom prompts. Value range: N/A Default value: N/A |
image_enabled |
No |
Boolean |
Definition: Image parsing. Constraints: N/A Value range: N/A Default value: false |
header_footer_enabled |
No |
Boolean |
Definition: Parse the header and footer. Constraints: N/A Value range: N/A Default value: false |
catalog_enabled |
No |
Boolean |
Definition: Parse contents page. Constraints: N/A Value range: N/A Default value: false |
image_conf |
No |
String |
Definition: Image parsing mode used when image_enabled is set to true. Constraints: When answers need to be returned with images, the IMAGE mode must be used to retain the original images. Value range:
Default value: TEXT |
footnote_enabled |
No |
Boolean |
Definition: Parse footnotes. Constraints: N/A Value range: N/A Default value: false |
SplitConf
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
split_mode |
No |
String |
Definition: Document segmentation mode. Value range: The value can be:
Constraints: N/A Default value: AUTO |
separator_ids |
No |
Array of strings |
Definition: List of separator IDs used in the automatic and length-based segmentation modes. A separator ID determines the end character of each chunk. Constraints: N/A Value range: Value mapping:
Default value: {"period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"} |
rule_regex_id |
No |
String |
Definition: ID of the user-defined parsing rule. Constraints: N/A Value range: N/A Default value: N/A |
chunk_size |
No |
Integer |
Definition: Maximum length of a document chunk. The document is segmented based on the maximum chunk length. Constraints: N/A Value range: 0-6000 Default value: 500 |
title_level |
No |
Integer |
Definition: Title hierarchy depth retained in a chunk. For example: If the depth is 3 and the current paragraph is 1.1.3, then the parent titles 1.1 and 1 are both retained. If the depth is 2 and the current paragraph is 1.1.3, then the parent title 1.1 is retained, and the parent title 1 is discarded. Constraints: N/A Value range: 1-10 Default value: 3 |
combine_title |
No |
Boolean |
Definition: Whether to retain the hierarchical title combination. Constraints: N/A Value range: N/A Default value: false |
merge_titles |
No |
Boolean |
Definition: Whether to enable cross-title merging. When paragraphs under different titles contain little text, they are automatically merged up to the specified chunk length so that each chunk carries more complete content. Constraints: N/A Value range: N/A Default value: false |
rule_regexs |
No |
Array of strings |
Definition: User-defined parsing rules. Constraints: N/A Value range: The list length ranges from 1 to 100. Default value: N/A |
merge_last_chunk |
No |
Boolean |
Definition: Whether to merge the most recent modified segments. Constraints: N/A Value range: N/A Default value: N/A |
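The title_level description in the table above can be read as "keep the last N levels of the title chain, counting the current paragraph's own title". The sketch below illustrates that reading with the two cases from the description; the function names are hypothetical and the service's actual implementation may differ.

```python
def title_chain(section_number: str) -> list:
    """Expand '1.1.3' into its full chain of titles: ['1', '1.1', '1.1.3']."""
    parts = section_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def retained_titles(section_number: str, title_level: int) -> list:
    """Return the titles kept in a chunk for a given title_level (1-10).

    Assumes the depth counts the current title plus its nearest parents.
    """
    if not 1 <= title_level <= 10:
        raise ValueError("title_level must be between 1 and 10")
    return title_chain(section_number)[-title_level:]


# Depth 3: parents 1.1 and 1 are both retained.
print(retained_titles("1.1.3", 3))
# Depth 2: parent 1.1 is retained, parent 1 is discarded.
print(retained_titles("1.1.3", 2))
```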
SessionConfig
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
similarity_threshold |
Yes |
Float |
Definition: Query-to-query similarity threshold for a cache hit. A higher threshold requires the new query to be more similar to the cached query. Constraints: N/A Value range: 0.1-1.0 Default value: 0.9 |
answer_select_policy |
Yes |
String |
Definition: Cache hit selection policy. Constraints: N/A Value range: The value can be:
Default value: N/A |
eviction |
Yes |
Eviction object |
Definition: Cache expiration policy. Constraints: N/A Value range: N/A Default value: N/A |
model_name |
Yes |
String |
Definition: Name of the query2query model used to calculate the similarity between the new query and the cached query when there is a hit. Constraints: N/A Value range: 1 to 64 characters. Default value: N/A |
Eviction
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
policy |
Yes |
String |
Definition: Cache expiration policy. Constraints: N/A Value range: The value can be:
Default value: N/A |
ttl |
No |
Long |
Definition: Cache expiration time, in milliseconds. Constraints: N/A Value range: 0-31536000000 Default value: N/A |
hit_count_threshold |
No |
Long |
Definition: Threshold of cache hits. Constraints: N/A Value range: 1-10000 Default value: N/A |
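Because ttl is expressed in milliseconds, it is easy to pass a value in the wrong unit. The sketch below builds a session_config object from a TTL given in hours and checks the numeric ranges listed above. The helper name is hypothetical, and the default policy strings ("first", "lru") are simply the values that appear in the example request on this page, not a complete list of allowed values.

```python
MS_PER_HOUR = 60 * 60 * 1000
MAX_TTL_MS = 31_536_000_000  # upper bound of ttl; equals 365 days in milliseconds


def make_session_config(model_name: str, ttl_hours: float,
                        similarity_threshold: float = 0.9,
                        answer_select_policy: str = "first",
                        eviction_policy: str = "lru",
                        hit_count_threshold: int = 1) -> dict:
    """Build a session_config dict and validate the documented numeric ranges."""
    ttl_ms = int(ttl_hours * MS_PER_HOUR)
    if not 1 <= len(model_name) <= 64:
        raise ValueError("model_name must contain 1 to 64 characters")
    if not 0.1 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be within 0.1-1.0")
    if not 0 <= ttl_ms <= MAX_TTL_MS:
        raise ValueError("ttl must be within 0-31536000000 ms")
    if not 1 <= hit_count_threshold <= 10000:
        raise ValueError("hit_count_threshold must be within 1-10000")
    return {
        "model_name": model_name,
        "similarity_threshold": similarity_threshold,
        "answer_select_policy": answer_select_policy,
        "eviction": {
            "policy": eviction_policy,
            "hit_count_threshold": hit_count_threshold,
            "ttl": ttl_ms,
        },
    }


# A 24-hour TTL yields the same ttl value as the example request.
config = make_session_config("embedding-zh_faq", ttl_hours=24)
print(config["eviction"]["ttl"])
```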
TagInfo
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
tag_key |
Yes |
String |
Definition: Knowledge base tag keyword. Constraints: N/A Value range: 1 to 128 characters. Default value: N/A |
tag_value |
Yes |
String |
Definition: Knowledge base tag information. Constraints: N/A Value range: 1 to 128 characters. Default value: N/A |
KnowledgeRepoExtendConfig
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
extend_context |
No |
Boolean |
Definition: Extend the context length to generate more comprehensive responses, for example:
Constraints: N/A Value range: N/A Default value: false |
effective_input_length |
No |
Integer |
Definition: Optimal context length, which varies with different models. Set the valid length of input tokens to ensure optimal output. In consideration of multi-turn dialogues, we recommend setting this length to 60% (rounded up) of the maximum context length supported by the model. Constraints: N/A Value range: 2-256 Default value: 32 |
top_p |
No |
Float |
Definition: An alternative to sampling with temperature, called nucleus sampling, where the model only takes into account the tokens with the probability mass determined by the top_p parameter. Constraints: N/A Value range: 0.1-1 Default value: 0.1 |
max_tokens |
No |
Integer |
Definition: Maximum number of tokens in the generated text. The total length of the input text plus the generated text cannot exceed the maximum length that the model can process. Constraints: N/A Value range: 1-262144 Default value: 2048 |
chat_temperature |
No |
Float |
Definition: Diversity of the non-RAG model's output. Constraints: N/A Value range: 0-1 Default value: N/A |
search_temperature |
No |
Float |
Definition: Diversity of the RAG model's output. Constraints: N/A Value range: 0-1 Default value: 0.6 |
presence_penalty |
No |
Float |
Definition: Text repetition penalty. Constraints: N/A Value range: -2 - 2 Default value: 0 |
use_system_prompt |
No |
Boolean |
Definition: Whether to use system prompts. Keep consistent with the standard prompt assembly solution used by Pangu RAG. Constraints: N/A Value range: N/A Default value: false |
system_prompt |
No |
String |
Definition: System prompt. Note:
Constraints: N/A Value range: 0-8192 Default value: N/A |
qa_question_prompt |
No |
String |
Definition: QA generation and question generation prompt. Constraints: N/A Value range: 0-8192 Default value: N/A |
qa_answer_prompt |
No |
String |
Definition: QA generation and answer generation prompt. Constraints: N/A Value range: 0-8192 Default value: N/A |
refuse_enable |
No |
Boolean |
Definition: Whether to reject certain questions. Constraints: N/A Value range: N/A Default value: false |
refuse_answer |
No |
String |
Definition: Rejection answer. Constraints: N/A Value range: 1-8192 Default value: N/A |
image_match_type |
No |
String |
Definition: Image description parameter. Constraints: N/A Value range: context_match, reference_match, or model_match. Default value: context_match |
custom_types |
No |
Map<String,Map<String,String>> |
Definition: Custom structure type. Constraints: N/A Value range: N/A Default value: N/A |
directory_enable |
No |
Boolean |
Definition: Whether to enable directory management. Constraints: N/A Value range: N/A Default value: false |
embedding_search_enable |
No |
Boolean |
Definition: Whether to enable vector search. Constraints: N/A Value range: N/A Default value: true |
keyword_search_enable |
No |
Boolean |
Definition: Whether to enable keyword-based search. Constraints: N/A Value range: N/A Default value: N/A |
keyword_top_k |
No |
Integer |
Definition: Top-k for keyword-based search. Constraints: N/A Value range: 0-100 Default value: 10 |
search_engine_type |
No |
String |
Definition: Search engine type. Constraints: N/A Value range: The value can be:
Default value: N/A |
search_engine_name |
No |
String |
Definition: Search engine name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A |
think_model_name |
No |
String |
Definition: Name of the deep thinking model. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A |
faq_top_k |
No |
Integer |
Definition: Top-k for Q&A and hybrid search where results are not directly from preset FAQs. Constraints: N/A Value range: 0-50 Default value: 2 |
faq_similarity_threshold |
No |
Float |
Definition: Threshold for Q&A and hybrid search where results are not directly from preset FAQs. Constraints: N/A Value range: 0-1 Default value: 0.8 |
extract_model_name |
No |
String |
Definition: Graph extraction model name. Constraints: N/A Value range: 0 to 64 characters. Default value: N/A |
optimize_model_name |
No |
String |
Definition: Name of the graph optimization model. Constraints: N/A Value range: The value cannot exceed 64 characters. Default value: N/A |
graph_search_enable |
No |
Boolean |
Definition: Whether to enable graph search. Constraints: N/A Value range: N/A Default value: false |
graph_reference_count |
No |
Integer |
Definition: Number of graph search reference documents. This parameter takes effect when graph search is enabled. Constraints: N/A Value range: 1-50 Default value: 10 |
graph_top_k |
No |
Integer |
Definition: Top-k for graph vector recall. Constraints: N/A Value range: 1-500 Default value: 50 |
graph_keyword_top_k |
No |
Integer |
Definition: Top-k for keyword-based graph search. Constraints: N/A Value range: 1-100 Default value: 20 |
graph_threshold |
No |
Float |
Definition: Graph re-ranking threshold. Constraints: N/A Value range: 0-200 Default value: 0.3 |
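The recommendation for effective_input_length (60% of the model's maximum context length, rounded up) can be computed as in the sketch below. The table does not state the unit of this parameter, so the sketch assumes that the maximum context length is given in the same unit as effective_input_length. The function name is hypothetical.

```python
def recommended_effective_input_length(max_context_length: int) -> int:
    """Return 60% of max_context_length, rounded up and clamped to the range 2-256.

    Assumption: max_context_length uses the same unit as effective_input_length.
    """
    if max_context_length <= 0:
        raise ValueError("max_context_length must be positive")
    # Integer ceiling of 60% avoids floating-point rounding surprises.
    recommended = (max_context_length * 60 + 99) // 100
    return max(2, min(256, recommended))


print(recommended_effective_input_length(32))   # 60% of 32 is 19.2, rounded up to 20
print(recommended_effective_input_length(128))  # 60% of 128 is 76.8, rounded up to 77
```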
KnowledgeRepoPromptInfo
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
prompt_id |
No |
String |
Definition: Prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A |
qa_question_prompt_id |
No |
String |
Definition: QA question generation prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A |
qa_answer_prompt_id |
No |
String |
Definition: QA answer generation prompt ID. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A |
mllm_prompt_id |
No |
String |
Definition: ID of the mllm prompt. Constraints: N/A Value range: The value can contain only 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. Default value: N/A |
table_rag_config |
No |
String |
Definition: Prompts related to tabular enhancement, including chat_prompt_with_sqlresults_id (Q&A prompt for tabular enhancement), nl2sql_prompt_id (prompt for generating SQL statements), and table_rag_prompt_id (tabular Q&A prompt). Constraints: N/A Value range: 1 to 512 characters. Default value: N/A |
Response Parameters
Status code: 200
Parameter |
Type |
Description |
---|---|---|
repo_id |
String |
Definition: Knowledge base ID. Value range: N/A |
Status code: 400
Parameter |
Type |
Description |
---|---|---|
error_code |
String |
Definition: Error code. Value range: N/A |
error_msg |
String |
Definition: Error message. Value range: N/A |
Status code: 500
Parameter |
Type |
Description |
---|---|---|
error_code |
String |
Definition: Error code. Value range: N/A |
error_msg |
String |
Definition: Error message. Value range: N/A |
Example Requests
Create a knowledge base.
POST /v1/1ed40ceefc8d40f8b884edb6a84e7768/applications/fb9731ab-7085-474f-b6c7-64473586f0f3/uni-search/knowledge-repo

{
  "name" : "test_20250425",
  "language_id" : "zh",
  "detail" : "",
  "tags" : [ ],
  "file_extract" : {
    "parse_conf" : {
      "ocr_enabled" : true,
      "image_enabled" : true,
      "header_footer_enabled" : false,
      "catalog_enabled" : false,
      "image_conf" : "TEXT"
    },
    "split_conf" : {
      "split_mode" : "CATALOG",
      "separator_ids" : [ "period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en" ],
      "chunk_size" : 500,
      "merge_titles" : true,
      "title_level" : 3,
      "combine_title" : true
    }
  },
  "extend_config" : {
    "extend_context" : false,
    "effective_input_length" : 3,
    "custom_types" : { },
    "image_match_type" : "context_match",
    "directory_enable" : false,
    "search_engine_name" : "bocha",
    "think_model_name" : "dp-r1"
  },
  "embedding_model" : "embedding-zh",
  "rerank_model" : "rerank-zh",
  "search_plan_model" : "search_ai_plan",
  "pangu_nlp_model" : "dp-r1",
  "cache_enabled" : true,
  "answer_reference_enabled" : false,
  "answer_image_reference_enabled" : false,
  "session_config" : {
    "model_name" : "embedding-zh_faq",
    "similarity_threshold" : 0.9,
    "answer_select_policy" : "first",
    "eviction" : {
      "policy" : "lru",
      "hit_count_threshold" : 1,
      "ttl" : 86400000
    }
  }
}
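For reference, a request like the one above could be assembled with the Python standard library as sketched below. The endpoint host `https://example.com`, the `<token>` placeholder, the helper name, and the Content-Type header are assumptions made for illustration; only the path, the POST method, the X-Auth-Token header, and the body fields come from this page. The sketch builds the request object without sending it.

```python
import json
import urllib.request


def build_create_repo_request(endpoint: str, project_id: str, application_id: str,
                              token: str, body: dict) -> urllib.request.Request:
    """Assemble (but do not send) the POST request for creating a knowledge base."""
    url = (f"{endpoint}/v1/{project_id}/applications/{application_id}"
           "/uni-search/knowledge-repo")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Auth-Token": token,               # mandatory header per this page
            "Content-Type": "application/json",  # assumed, not stated on this page
        },
    )


req = build_create_repo_request(
    "https://example.com",  # hypothetical endpoint
    "1ed40ceefc8d40f8b884edb6a84e7768",
    "fb9731ab-7085-474f-b6c7-64473586f0f3",
    "<token>",  # placeholder for an IAM user token
    {"name": "test_20250425", "language_id": "zh", "cache_enabled": False},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```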
Example Responses
Status code: 200
Response body for creating a knowledge base.
{ "repo_id" : "1235abc" }
Status Codes
Status Code |
Description |
---|---|
200 |
Response body for creating a knowledge base. |
400 |
Incorrect request body parameter. |
500 |
Internal error. |
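A client typically branches on the status codes above: 200 carries repo_id, while 400 and 500 carry error_code and error_msg. The following is a minimal sketch of that handling; the exception class, the function name, and the sample error values are hypothetical.

```python
import json


class CreateRepoError(Exception):
    """Raised when the API answers with a non-200 status code."""

    def __init__(self, status: int, error_code, error_msg):
        super().__init__(f"HTTP {status}: {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_create_repo_response(status: int, body_text: str) -> str:
    """Return repo_id on status 200; otherwise raise CreateRepoError."""
    payload = json.loads(body_text) if body_text else {}
    if status == 200:
        return payload["repo_id"]
    raise CreateRepoError(status, payload.get("error_code"), payload.get("error_msg"))


# Body taken from the example response on this page.
repo_id = parse_create_repo_response(200, '{ "repo_id" : "1235abc" }')
print(repo_id)
```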
Error Codes
See Error Codes.