Updated on 2025-08-13 GMT+08:00

Creating a Knowledge Base

Function

Create a knowledge base.

URI

POST /v1/{project_id}/applications/{application_id}/uni-search/knowledge-repo

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition:

Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter.

Default value:

N/A

application_id

Yes

String

Definition:

Application ID. For details about how to obtain the application ID, see Obtaining an Application ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter.

Default value:

N/A
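
The path parameters in Table 1 can be combined into the request path as in the following minimal sketch. The function name is hypothetical. It checks only the length and character set listed above. The leading-letter rule is deliberately left out, because the sample project ID used elsewhere in this document begins with a digit.

```python
import re

# Character set and length taken from the "Value range" rows of Table 1.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_knowledge_repo_path(project_id: str, application_id: str) -> str:
    """Return the request path for creating a knowledge base."""
    for label, value in (("project_id", project_id),
                         ("application_id", application_id)):
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"{label} must be 1-64 letters, digits, '-' or '_'")
    return (
        f"/v1/{project_id}/applications/{application_id}"
        "/uni-search/knowledge-repo"
    )
```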

Request Parameters

Table 2 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

Yes

String

Definition:

Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

Table 3 Request body parameters

Parameter

Mandatory

Type

Description

name

Yes

String

Definition:

Knowledge base name.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. It can only contain letters, digits, hyphens (-), and underscores (_), and must start with a letter or digit.

Default value:

N/A

detail

No

String

Definition:

Description of the knowledge base.

Constraints:

N/A

Value range:

The length cannot exceed 100 characters.

Default value:

N/A

search_plan_category_ids

No

Array of strings

Definition:

Search planning categories.

If no category is matched, the knowledge base is searched and an LLM then summarizes the results.

If a category is matched, the LLM answers the question directly.

Constraints:

N/A

Value range:

The value can be:

  • professional_knowledge-medical

  • weather

  • professional_knowledge

  • professional_knowledge-manufacturing

  • chitchat

  • language_tasks

  • professional_knowledge-finance

  • general_knowledge

  • professional_knowledge-government

  • identity: persona

Default value:

N/A

file_extract

No

FileExtractConf object

Definition:

Overall configuration of document parsing.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

cache_enabled

No

Boolean

Definition:

Whether to enable caching for the current knowledge base.

Constraints:

N/A

Value range:

  • false: The result is not cached.

  • true: Queries, references, and answers are cached. If a similar query is encountered subsequently, the cached result is directly returned, reducing the query latency.

Default value:

false

answer_reference_enabled

No

Boolean

Definition:

Whether to enable reference source tracing in the current knowledge base. When enabled, the sources of the generated answers will be located.

Constraints:

N/A

Value range:

N/A

Default value:

false

answer_image_reference_enabled

No

Boolean

Definition:

Whether to include both text and images.

Constraints:

Answer reference must be enabled (answer_reference_enabled).

Value range:

  • true: If a document contains images related to a question, the answer must include both text and images.

  • false: Images are not recalled.

Default value:

false

session_config

No

SessionConfig object

Definition:

Cache policy.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

embedding_model

No

String

Definition:

Name of the embedding model used by the current knowledge base.

Embedding model: used to vectorize text content for vector search.

Constraints:

The embedding model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API.

Value range:

The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-).

Default value:

N/A

rerank_model

No

String

Definition:

Name of the reranking model used by the current knowledge base.

Reranking model: Reranks the top_k results returned by the embedding model through vector search. The purpose is to provide results that are most relevant to the query.

Constraints:

The reranking model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API.

Value range:

The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-).

Default value:

N/A

pangu_nlp_model

No

String

Definition:

Name of the NLP foundation model used by the current knowledge base.

NLP model: a generative model used by the chat API to generate answers based on the input text.

Constraints:

The NLP model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API.

Value range:

The parameter value consists of 1 to 32 characters. It can only contain letters, digits, underscores (_), and hyphens (-).

Default value:

N/A

search_plan_model

No

String

Definition:

Name of the search planning model.

Search planning model: replans user queries, including query rewriting and query completion.

Constraints:

The search planning model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API.

Value range:

The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit.

Default value:

N/A

language_id

No

String

Definition:

Language ID of the current knowledge base.

Constraints:

N/A

Value range:

  • zh==CHINESE

  • en==ENGLISH

  • th==THAI

  • ar==ARABIC

  • es==SPANISH

  • pt==PORTUGUESE

Default value:

N/A

refs

No

String

Definition:

IDs of the referenced knowledge bases, which are separated by commas (,). Multiple knowledge bases can be used together to support Q&A.

Constraints:

N/A

Value range:

0 to 1024 characters.

Default value:

N/A

tags

No

Array of TagInfo objects

Definition:

Tag list.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

extend_config

No

KnowledgeRepoExtendConfig object

Definition:

Knowledge base extension configuration.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

prompt_info

No

KnowledgeRepoPromptInfo object

Definition:

Prompt information.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

table_rag_enabled

No

Boolean

Definition:

Whether to enable tableRAG.

Constraints:

N/A

Value range:

N/A

Default value:

N/A
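
Several rules in Table 3 can be checked on the client before the request is sent. The following is a minimal sketch, not the service's own validation: the function name is hypothetical, "letters" is assumed to mean ASCII letters, and only a subset of the documented rules is covered.

```python
import re

# ASCII letters are an assumption; the table only says "letters".
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")
_MODEL_RE = re.compile(r"[A-Za-z0-9_-]{1,32}")
_MODEL_KEYS = ("embedding_model", "rerank_model",
               "pangu_nlp_model", "search_plan_model")


def validate_create_body(body: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    name = body.get("name")
    if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
        problems.append("name: 1-64 letters, digits, '-' or '_', "
                        "starting with a letter or digit")

    if len(body.get("detail", "")) > 100:
        problems.append("detail: cannot exceed 100 characters")

    # Image references depend on answer references being enabled.
    if (body.get("answer_image_reference_enabled")
            and not body.get("answer_reference_enabled")):
        problems.append("answer_image_reference_enabled requires "
                        "answer_reference_enabled")

    if len(body.get("refs", "")) > 1024:
        problems.append("refs: cannot exceed 1024 characters")

    # Only length and character set are checked for model names here.
    for key in _MODEL_KEYS:
        value = body.get(key)
        if value is not None and not _MODEL_RE.fullmatch(value):
            problems.append(f"{key}: 1-32 letters, digits, '-' or '_'")

    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass.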

Table 4 FileExtractConf

Parameter

Mandatory

Type

Description

parse_conf

No

ParseConf object

Definition:

Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page.

Value range:

N/A

split_conf

No

SplitConf object

Definition:

Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern.

Value range:

N/A

id

No

String

Definition:

Document parsing ID.

Constraints:

N/A

Value range:

0 to 128 characters.

Default value:

N/A

Table 5 ParseConf

Parameter

Mandatory

Type

Description

ocr_enabled

No

Boolean

Definition:

OCR enhancement.

Constraints:

N/A

Value range:

N/A

Default value:

false

mllm_enabled

No

Boolean

Definition:

Multimodal enhancement.

Constraints:

N/A

Value range:

N/A

Default value:

false

mllm_model

No

String

Definition:

Multimodal model name.

Constraints:

The multimodal model must have already been configured on the platform. You can check the models configured on the platform using the ListModels API.

Value range:

The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit.

Default value:

N/A

mllm_prompt

No

Map<String,String>

Definition:

Prompt of the multimodal model.

Constraints:

A default prompt is provided. You can also configure custom prompts.

Value range:

N/A

Default value:

N/A

image_enabled

No

Boolean

Definition:

Image parsing.

Constraints:

N/A

Value range:

N/A

Default value:

false

header_footer_enabled

No

Boolean

Definition:

Parse the header and footer.

Constraints:

N/A

Value range:

N/A

Default value:

false

catalog_enabled

No

Boolean

Definition:

Parse contents page.

Constraints:

N/A

Value range:

N/A

Default value:

false

image_conf

No

String

Definition:

Image parsing mode when image_enabled is set to true.

Constraints:

When answers need to be returned with images, the IMAGE mode must be used to retain the original images.

Value range:

  • TEXT: extracts text from images.

  • IMAGE: retains the original images.

  • IMAGE_TEXT: parses text and retains the original images.

Default value:

TEXT

footnote_enabled

No

Boolean

Definition:

Parse footnotes.

Constraints:

N/A

Value range:

N/A

Default value:

false

Table 6 SplitConf

Parameter

Mandatory

Type

Description

split_mode

No

String

Definition:

Document segmentation mode.

Value range:

The value can be:

  • AUTO: The system automatically identifies the document format and matches the appropriate segmentation and parsing mode.

  • LENGTH: Segments a document by length. For example, a document is segmented into paragraphs every 500 characters.

  • CATALOG: Automatic parsing under hierarchical segmentation. The system automatically identifies the hierarchical structure of an article and segments the article based on the hierarchical structure. For example, section 1.1.2 is a segment, and section 1.1.3 is a segment.

  • RULE: Rule-based parsing under hierarchical segmentation. You can customize the matching rules of hierarchical titles and match and split chapters based on custom rules.

Constraints:

N/A

Default value:

AUTO

separator_ids

No

Array of strings

Definition:

List of separator IDs used in the automatic segmentation and length segmentation modes.

Separator ID: determines the end character of each chunk.

Constraints:

N/A

Value range:

Value mapping:

  • period_zh: Chinese period (。)

  • period_en: English period (.)

  • exclamation_mark_zh: Chinese exclamation mark (！)

  • exclamation_mark_en: English exclamation mark (!)

  • question_mark_zh: Chinese question mark (？)

  • question_mark_en: English question mark (?)

  • question_mark_ar: Arabic question mark (؟)

  • comma_zh: Chinese comma (，)

  • comma_en: English comma (,)

  • space_en: Space

Default value:

{"period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"}

rule_regex_id

No

String

Definition:

ID of the user-defined parsing rule.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

chunk_size

No

Integer

Definition:

Maximum length of a document chunk. The document is segmented based on the maximum chunk length.

Constraints:

N/A

Value range:

0-6000

Default value:

500

title_level

No

Integer

Definition:

Title hierarchy depth retained in a chunk.

For example:

If the depth is 3 and the current paragraph is 1.1.3, then the parent titles 1.1 and 1 are both retained.

If the depth is 2 and the current paragraph is 1.1.3, then the parent title 1.1 is retained, and the parent title 1 is discarded.

Constraints:

N/A

Value range:

1-10

Default value:

3

combine_title

No

Boolean

Definition:

Whether to retain the hierarchical title combination.

Constraints:

N/A

Value range:

N/A

Default value:

false

merge_titles

No

Boolean

Definition:

Whether to enable cross-title merging. When paragraphs under different titles contain little text, they are automatically merged up to the specified segment length so that each chunk carries more complete content.

Constraints:

N/A

Value range:

N/A

Default value:

false

rule_regexs

No

Array of strings

Definition:

User-defined parsing rules.

Constraints:

N/A

Value range:

The list length ranges from 1 to 100.

Default value:

N/A

merge_last_chunk

No

Boolean

Definition:

Whether to merge the most recent modified segments.

Constraints:

N/A

Value range:

N/A

Default value:

N/A
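
Two of the SplitConf settings above can be illustrated with a small sketch. It is not the service's actual segmentation algorithm, which this page does not describe. The function names are hypothetical, the full-width characters for the Chinese separator IDs are assumed, and the greedy strategy (end each chunk at the last separator within chunk_size characters) is only one plausible reading of separator_ids and chunk_size. The second function follows the title_level example: the current title counts as one level, and the nearest parent titles fill the remaining depth.

```python
# Separator IDs from Table 6; the full-width Chinese characters are assumed.
SEPARATORS = {
    "period_zh": "。", "period_en": ".",
    "exclamation_mark_zh": "！", "exclamation_mark_en": "!",
    "question_mark_zh": "？", "question_mark_en": "?",
    "question_mark_ar": "؟",
    "comma_zh": "，", "comma_en": ",",
    "space_en": " ",
}

DEFAULT_SEPARATOR_IDS = (
    "period_zh", "period_en", "exclamation_mark_zh",
    "exclamation_mark_en", "question_mark_zh", "question_mark_en",
)


def split_by_length(text, chunk_size=500, separator_ids=DEFAULT_SEPARATOR_IDS):
    """End each chunk at the last separator within chunk_size characters,
    or cut at chunk_size when the window contains no separator."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    end_chars = {SEPARATORS[sep_id] for sep_id in separator_ids}
    chunks, start = [], 0
    while start < len(text):
        window = text[start:start + chunk_size]
        cut = len(window)
        if start + chunk_size < len(text):  # not the final window
            for i in range(len(window) - 1, -1, -1):
                if window[i] in end_chars:
                    cut = i + 1
                    break
        chunks.append(window[:cut])
        start += cut
    return chunks


def retained_titles(section_number, title_level=3):
    """Return the titles kept for a section, outermost first."""
    if not 1 <= title_level <= 10:
        raise ValueError("title_level must be between 1 and 10")
    parts = section_number.split(".")
    chain = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return chain[-title_level:]
```

For section 1.1.3, a depth of 3 keeps 1, 1.1, and 1.1.3, while a depth of 2 keeps only 1.1 and 1.1.3.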

Table 7 SessionConfig

Parameter

Mandatory

Type

Description

similarity_threshold

Yes

Float

Definition:

Query-to-query similarity threshold for a cache hit. A higher threshold requires the new query to be more similar to the cached query.

Constraints:

N/A

Value range:

0.1-1.0

Default value:

0.9

answer_select_policy

Yes

String

Definition:

Cache hit selection policy.

Constraints:

N/A

Value range:

The value can be:

  • FIRST: Select the answer with the highest score from the matched results.

  • RANDOM: Randomly select an answer from the matched results.

Default value:

N/A

eviction

Yes

Eviction object

Definition:

Cache expiration policy.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

model_name

Yes

String

Definition:

Name of the query2query model used to calculate the similarity between the new query and the cached query when there is a hit.

Constraints:

N/A

Value range:

1 to 64 characters.

Default value:

N/A
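
The interaction between similarity_threshold and answer_select_policy can be sketched as follows. This is an illustration under assumptions, not the service's implementation: the function name and the (score, answer) tuple shape are hypothetical, and a score equal to the threshold is assumed to count as a hit.

```python
import random


def select_cached_answer(matches, similarity_threshold=0.9,
                         policy="FIRST", rng=None):
    """Pick a cached answer from (score, answer) pairs, or None on a miss."""
    # Keep only candidates similar enough to count as a cache hit.
    hits = [m for m in matches if m[0] >= similarity_threshold]
    if not hits:
        return None
    if policy == "FIRST":
        # Highest-scoring match wins.
        return max(hits, key=lambda m: m[0])[1]
    if policy == "RANDOM":
        # Any match above the threshold may be returned.
        return (rng or random).choice(hits)[1]
    raise ValueError(f"unknown answer_select_policy: {policy}")
```

Passing a seeded random.Random instance as rng makes the RANDOM policy reproducible in tests.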

Table 8 Eviction

Parameter

Mandatory

Type

Description

policy

Yes

String

Definition:

Cache expiration policy.

Constraints:

N/A

Value range:

The value can be:

  • LRU (Least Recently Used): If now − accessTime > TTL, the item is cleared.

  • FIFO (First In First Out): If now − createTime > TTL, the item is cleared.

  • LFU (Least Frequently Used): If hit_count < threshold, the item is cleared.

Default value:

N/A

ttl

No

Long

Definition:

Cache expiration time, in milliseconds.

Constraints:

N/A

Value range:

0-31536000000

Default value:

N/A

hit_count_threshold

No

Long

Definition:

Threshold of cache hits.

Constraints:

N/A

Value range:

1-10000

Default value:

N/A
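
The three eviction conditions in Table 8 translate directly into code. The following sketch uses hypothetical function and argument names, with all times in milliseconds to match ttl. The policy name is upper-cased locally so that both spellings appearing on this page ("LRU" in the table, "lru" in the example request) are accepted by the sketch. Whether the service itself is case-insensitive is not stated here.

```python
from typing import Optional


def should_evict(
    policy: str,
    *,
    now_ms: int,
    create_time_ms: int,
    access_time_ms: int,
    hit_count: int,
    ttl_ms: Optional[int] = None,
    hit_count_threshold: Optional[int] = None,
) -> bool:
    """Return True if a cache item meets the clearing condition of the policy."""
    policy = policy.upper()
    if policy in ("LRU", "FIFO"):
        if ttl_ms is None:
            raise ValueError("ttl is needed for LRU and FIFO")
        # LRU measures from the last access, FIFO from creation.
        reference = access_time_ms if policy == "LRU" else create_time_ms
        return now_ms - reference > ttl_ms
    if policy == "LFU":
        if hit_count_threshold is None:
            raise ValueError("hit_count_threshold is needed for LFU")
        return hit_count < hit_count_threshold
    raise ValueError(f"unknown policy: {policy}")
```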

Table 9 TagInfo

Parameter

Mandatory

Type

Description

tag_key

Yes

String

Definition:

Knowledge base tag keyword.

Constraints:

N/A

Value range:

1 to 128 characters.

Default value:

N/A

tag_value

Yes

String

Definition:

Knowledge base tag information.

Constraints:

N/A

Value range:

1 to 128 characters.

Default value:

N/A

Table 10 KnowledgeRepoExtendConfig

Parameter

Mandatory

Type

Description

extend_context

No

Boolean

Definition:

Extend the context length to generate more comprehensive responses, for example:

  1. small-to-big table transformation

  2. Association of text (1), (2), and (3)

  3. Document summarization.

Constraints:

N/A

Value range:

N/A

Default value:

false

effective_input_length

No

Integer

Definition:

Optimal context length, which varies with different models. Set the valid length of input tokens to ensure optimal output.

In consideration of multi-turn dialogues, we recommend setting this length to 60% (rounded up) of the maximum context length supported by the model.

Constraints:

N/A

Value range:

2-256

Default value:

32

top_p

No

Float

Definition:

An alternative to sampling with temperature, called nucleus sampling, where the model only takes into account the tokens with the probability mass determined by the top_p parameter.

Constraints:

N/A

Value range:

0.1-1

Default value:

0.1

max_tokens

No

Integer

Definition:

Maximum number of tokens in the generated text.

The total length of the input text plus the generated text cannot exceed the maximum length that the model can process.

Constraints:

N/A

Value range:

1-262144

Default value:

2048

chat_temperature

No

Float

Definition:

Diversity of the non-RAG model's output.

Constraints:

N/A

Value range:

0-1

Default value:

N/A

search_temperature

No

Float

Definition:

Diversity of the RAG model's output.

Constraints:

N/A

Value range:

0-1

Default value:

0.6

presence_penalty

No

Float

Definition:

Text repetition penalty.

Constraints:

N/A

Value range:

-2 - 2

Default value:

0

use_system_prompt

No

Boolean

Definition:

Whether to use a system prompt, consistent with the standard prompt assembly solution used by Pangu RAG.

Constraints:

N/A

Value range:

N/A

Default value:

false

system_prompt

No

String

Definition:

System prompt. Note:

  1. This parameter is mandatory when use_system_prompt is set to true.

  2. The query does not need to be included in the system prompt.

Constraints:

N/A

Value range:

0-8192

Default value:

N/A

qa_question_prompt

No

String

Definition:

Prompt for generating questions during QA generation.

Constraints:

N/A

Value range:

0-8192

Default value:

N/A

qa_answer_prompt

No

String

Definition:

Prompt for generating answers during QA generation.

Constraints:

N/A

Value range:

0-8192

Default value:

N/A

refuse_enable

No

Boolean

Definition:

Whether to reject certain questions.

Constraints:

N/A

Value range:

N/A

Default value:

false

refuse_answer

No

String

Definition:

Rejection answer.

Constraints:

N/A

Value range:

1-8192

Default value:

N/A

image_match_type

No

String

Definition:

Image description parameter. The options are context_match, reference_match, and model_match. The default value is context_match.

Constraints:

N/A

Value range:

  • context_match

  • reference_match

  • model_match

Default value:

context_match

custom_types

No

Map<String,Map<String,String>>

Definition:

Custom structure type.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

directory_enable

No

Boolean

Definition:

Whether to enable directory management.

Constraints:

N/A

Value range:

N/A

Default value:

false

embedding_search_enable

No

Boolean

Definition:

Whether to enable vector search.

Constraints:

N/A

Value range:

N/A

Default value:

true

keyword_search_enable

No

Boolean

Definition:

Whether to enable keyword-based search.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

keyword_top_k

No

Integer

Definition:

Top-k for keyword-based search. The value ranges from 0 to 100. The default value is 10.

Constraints:

N/A

Value range:

0-100

Default value:

10

search_engine_type

No

String

Definition:

Search engine type.

Constraints:

N/A

Value range:

The value can be:

  • search_engine: web search engine.

  • ai_engine: enhanced web search service.

Default value:

N/A

search_engine_name

No

String

Definition:

Search engine name.

Constraints:

N/A

Value range:

0 to 64 characters.

Default value:

N/A

think_model_name

No

String

Definition:

Name of the deep thinking model.

Constraints:

N/A

Value range:

0 to 64 characters.

Default value:

N/A

faq_top_k

No

Integer

Definition:

Top-k for Q&A and hybrid search where results are not directly from preset FAQs.

Constraints:

N/A

Value range:

0-50

Default value:

2

faq_similarity_threshold

No

Float

Definition:

Threshold for Q&A and hybrid search where results are not directly from preset FAQs.

Constraints:

N/A

Value range:

0-1

Default value:

0.8

extract_model_name

No

String

Definition:

Graph extraction model name.

Constraints:

N/A

Value range:

0 to 64 characters.

Default value:

N/A

optimize_model_name

No

String

Definition:

Name of the graph optimization model.

Constraints:

N/A

Value range:

The value cannot exceed 64 characters.

Default value:

N/A

graph_search_enable

No

Boolean

Definition:

Whether to enable graph search.

Constraints:

N/A

Value range:

N/A

Default value:

false

graph_reference_count

No

Integer

Definition:

Number of graph search reference documents. This parameter takes effect when graph search is enabled.

Constraints:

N/A

Value range:

1-50

Default value:

10

graph_top_k

No

Integer

Definition:

Top-k for graph vector recall.

Constraints:

N/A

Value range:

1-500

Default value:

50

graph_keyword_top_k

No

Integer

Definition:

Top-k for keyword-based graph search.

Constraints:

N/A

Value range:

1-100

Default value:

20

graph_threshold

No

Float

Definition:

Graph re-ranking threshold.

Constraints:

N/A

Value range:

0-200

Default value:

0.3
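
Three of the KnowledgeRepoExtendConfig rules above can be made concrete with a short sketch. All function names are hypothetical. The first applies the 60% recommendation for effective_input_length, in whatever unit the parameter uses (the table does not state it). The second shows the general idea of nucleus sampling behind top_p on a toy distribution, not how any particular model implements it. The third checks the dependency between use_system_prompt and system_prompt.

```python
def recommended_effective_input_length(max_context_length: int) -> int:
    """60% of the model's maximum context length, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-max_context_length * 3 // 5)


def nucleus(probabilities: dict, top_p: float) -> list:
    """Return the smallest set of most likely tokens whose cumulative
    probability reaches top_p."""
    kept, total = [], 0.0
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    for token, probability in ranked:
        kept.append(token)
        total += probability
        if total >= top_p:
            break
    return kept


def check_system_prompt(extend_config: dict) -> None:
    """Raise if use_system_prompt is true but system_prompt is missing."""
    if (extend_config.get("use_system_prompt", False)
            and not extend_config.get("system_prompt")):
        raise ValueError(
            "system_prompt is mandatory when use_system_prompt is true")
```

With a low top_p such as the default 0.1, only the most likely tokens remain candidates, so output is less varied.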

Table 11 KnowledgeRepoPromptInfo

Parameter

Mandatory

Type

Description

prompt_id

No

String

Definition:

Prompt ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed.

Default value:

N/A

qa_question_prompt_id

No

String

Definition:

QA question generation prompt ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed.

Default value:

N/A

qa_answer_prompt_id

No

String

Definition:

QA answer generation prompt ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed.

Default value:

N/A

mllm_prompt_id

No

String

Definition:

ID of the mllm prompt.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed.

Default value:

N/A

table_rag_config

No

String

Definition:

Prompts related to tabular enhancement, including:

  • chat_prompt_with_sqlresults_id: Q&A prompt for tabular enhancement

  • nl2sql_prompt_id: prompt for generating SQL statements

  • table_rag_prompt_id: tabular Q&A prompt

Constraints:

N/A

Value range:

1 to 512 characters.

Default value:

N/A

Response Parameters

Status code: 200

Table 12 Response body parameters

Parameter

Type

Description

repo_id

String

Definition:

Knowledge base ID.

Value range:

N/A

Status code: 400

Table 13 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code.

Value range:

N/A

error_msg

String

Definition:

Error message.

Value range:

N/A

Status code: 500

Table 14 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code.

Value range:

N/A

error_msg

String

Definition:

Error message.

Value range:

N/A

Example Requests

Create a knowledge base.

POST /v1/1ed40ceefc8d40f8b884edb6a84e7768/applications/fb9731ab-7085-474f-b6c7-64473586f0f3/uni-search/knowledge-repo

{
  "name" : "test_20250425",
  "language_id" : "zh",
  "detail" : "",
  "tags" : [ ],
  "file_extract" : {
    "parse_conf" : {
      "ocr_enabled" : true,
      "image_enabled" : true,
      "header_footer_enabled" : false,
      "catalog_enabled" : false,
      "image_conf" : "TEXT"
    },
    "split_conf" : {
      "split_mode" : "CATALOG",
      "separator_ids" : [ "period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en" ],
      "chunk_size" : 500,
      "merge_titles" : true,
      "title_level" : 3,
      "combine_title" : true
    }
  },
  "extend_config" : {
    "extend_context" : false,
    "effective_input_length" : 3,
    "custom_types" : { },
    "image_match_type" : "context_match",
    "directory_enable" : false,
    "search_engine_name" : "bocha",
    "think_model_name" : "dp-r1"
  },
  "embedding_model" : "embedding-zh",
  "rerank_model" : "rerank-zh",
  "search_plan_model" : "search_ai_plan",
  "pangu_nlp_model" : "dp-r1",
  "cache_enabled" : true,
  "answer_reference_enabled" : false,
  "answer_image_reference_enabled" : false,
  "session_config" : {
    "model_name" : "embedding-zh_faq",
    "similarity_threshold" : 0.9,
    "answer_select_policy" : "first",
    "eviction" : {
      "policy" : "lru",
      "hit_count_threshold" : 1,
      "ttl" : 86400000
    }
  }
}
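
A request like the one above could be assembled with the Python standard library as in the following sketch. The endpoint host is a placeholder, the function name is hypothetical, and the Content-Type header is an assumption (Table 2 lists only X-Auth-Token). The sketch builds the request object without sending it.

```python
import json
import urllib.request

# Placeholder host; the real endpoint depends on the region and service.
ENDPOINT = "https://example.com"


def build_create_request(project_id, application_id, token, body):
    """Build (but do not send) the POST request that creates a knowledge base."""
    url = (
        f"{ENDPOINT}/v1/{project_id}/applications/{application_id}"
        "/uni-search/knowledge-repo"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Auth-Token": token,
            # Assumed; not listed among the documented request headers.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (not executed here):
# with urllib.request.urlopen(request) as response:
#     repo_id = json.load(response)["repo_id"]
```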

Example Responses

Status code: 200

Response body for creating a knowledge base.

{
  "repo_id" : "1235abc"
}

Status Codes

Status Code

Description

200

Response body for creating a knowledge base.

400

Incorrect request body parameter.

500

Internal error.
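
A caller might turn the status codes above into a return value or an exception as in the following sketch. The class and function names are hypothetical, and any status other than 200 is assumed to carry the error_code and error_msg fields documented for 400 and 500.

```python
import json


class KnowledgeRepoError(Exception):
    """Raised when the API returns a non-200 status."""

    def __init__(self, status, error_code, error_msg):
        super().__init__(f"HTTP {status}: {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_create_response(status: int, raw_body: str) -> str:
    """Return repo_id on success; raise KnowledgeRepoError otherwise."""
    payload = json.loads(raw_body)
    if status == 200:
        return payload["repo_id"]
    raise KnowledgeRepoError(
        status, payload.get("error_code"), payload.get("error_msg"))
```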

Error Codes

See Error Codes.