Updated on 2025-08-13 GMT+08:00

Querying Documents

Function

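Query documents in the specified knowledge base. The results can be filtered by the query parameters in Table 2 (for example, file name, file type, or tags) and are returned in pages.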

URI

GET /v1/{project_id}/applications/{application_id}/uni-search/{repo_id}/files/search

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition:

Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter.

Default value:

N/A

application_id

Yes

String

Definition:

Application ID. For details about how to obtain the application ID, see Obtaining an Application ID.

Constraints:

Character string

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter.

Default value:

N/A

repo_id

Yes

String

Definition:

Knowledge base ID.

How to obtain:

Log in to the KooSearch experience platform. In the navigation tree on the left, choose Knowledge Bases to view knowledge base IDs. Each knowledge base has a unique ID stored in the vector database.

Constraints:

N/A

Value range:

Length: 1 to 64 characters. The value can contain only digits, letters, hyphens (-), and underscores (_).

Default value:

N/A
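The path parameters can be checked against their documented value ranges before the URI is assembled. The sketch below is a hypothetical helper (`build_search_path` is not part of any SDK); its regular expressions are transcribed from the Value range entries in Table 1.

```python
import re

# project_id / application_id: 1-64 characters (digits, letters, -, _), starting with a letter.
_PROJECT_OR_APP_ID = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")
# repo_id: 1-64 characters (digits, letters, -, _); no leading-letter rule is documented.
_REPO_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_search_path(project_id: str, application_id: str, repo_id: str) -> str:
    """Return the request path for the Querying Documents API (hypothetical helper)."""
    checks = [
        ("project_id", project_id, _PROJECT_OR_APP_ID),
        ("application_id", application_id, _PROJECT_OR_APP_ID),
        ("repo_id", repo_id, _REPO_ID),
    ]
    for name, value, pattern in checks:
        if not pattern.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return (
        f"/v1/{project_id}/applications/{application_id}"
        f"/uni-search/{repo_id}/files/search"
    )
```

With the IDs from the example request, this returns the same path that appears there. The server remains the authority on which IDs it accepts.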

Table 2 Query Parameters

Parameter

Mandatory

Type

Description

file_name

No

String

Definition:

File name.

Constraints:

N/A

Value range:

The value is a string of fewer than 1024 characters. It cannot contain any of the characters \ / : * ? " < > | and cannot end with a period (.).

Default value:

N/A

file_type

No

String

Definition:

File type.

Constraints:

N/A

Value range:

Enter 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

Default value:

N/A

category

No

String

Definition:

Document directory, corresponding to a leaf node in the directory tree. Only one value can be specified.

The recommended format is "leaf node directory name (directory ID)", for example, patent (3166-1).

Constraints:

N/A

Value range:

N/A

Default value:

N/A

tags

No

Array of strings

Definition:

Document tags. Only documents that match all of the specified tags are returned (the intersection of the tag matches).

Constraints:

N/A

Value range:

N/A

Default value:

N/A

file_status

No

String

Definition:

File status.

Constraints:

N/A

Value range:

Enter 1 to 128 characters. Only letters, commas (,), hyphens (-), and underscores (_) are allowed.

Default value:

N/A

ids

No

Array of strings

Definition:

IDs of the files to be queried.

Constraints:

N/A

Value range:

Each file ID can contain a maximum of 64 characters.

Default value:

N/A

chat_id

No

String

Definition:

Chat ID.

Constraints:

N/A

Value range:

The maximum length is 64 characters.

Default value:

N/A

page_num

No

Integer

Definition:

Page number of the current request, that is, the page from which data retrieval starts. Pages are numbered from 1, so the default value 1 returns the first page.

Constraints:

N/A

Value range:

1-65535

Default value:

1

page_size

No

Integer

Definition:

Number of records returned on each page, that is, per request. With the default value 10, each page contains up to 10 records.

Constraints:

N/A

Value range:

1-65535

Default value:

10

create_user

No

String

Definition:

Name of the creator, that is, the user who uploaded the file.

Constraints:

N/A

Value range:

The user name contains 1 to 64 characters.

Default value:

N/A
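Before sending a request, the optional query parameters can be collected and checked against the value ranges in Table 2. The sketch below is a hypothetical helper (`build_query` is not part of any SDK) that validates file_name, page_num, and page_size, omits parameters left as None, and sends list parameters such as tags and ids as repeated keys. That array encoding is an assumption; this page does not specify how arrays are serialized in the query string.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode

_FORBIDDEN = set('\\/:*?"<>|')  # characters that file_name cannot contain


def build_query(
    file_name: Optional[str] = None,
    tags: Optional[List[str]] = None,
    ids: Optional[List[str]] = None,
    page_num: int = 1,
    page_size: int = 10,
) -> str:
    """Return an encoded query string for the Querying Documents API."""
    if file_name is not None:
        if len(file_name) >= 1024 or file_name.endswith("."):
            raise ValueError("file_name is too long or ends with a period")
        if any(ch in _FORBIDDEN for ch in file_name):
            raise ValueError("file_name contains a forbidden character")
    for name, value in (("page_num", page_num), ("page_size", page_size)):
        if not 1 <= value <= 65535:
            raise ValueError(f"{name} must be between 1 and 65535")
    params = {
        "file_name": file_name,
        "tags": tags,
        "ids": ids,
        "page_num": page_num,
        "page_size": page_size,
    }
    # Drop unset parameters; quote_via=quote encodes spaces as %20 rather than "+".
    present = {k: v for k, v in params.items() if v is not None}
    return urlencode(present, doseq=True, quote_via=quote)
```

Percent-encoding is applied to UTF-8 bytes, so non-ASCII file names are encoded the same way as in the example request.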

Request Parameters

Table 3 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

Yes

String

Definition:

Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token.

Constraints:

N/A

Value range:

N/A

Default value:

N/A
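A request combines the URI, the query string, and the X-Auth-Token header. The sketch below prepares one with Python's standard library. The endpoint host is a placeholder (the real endpoint depends on your region and is not given on this page), `build_request` is a hypothetical helper, and the token is assumed to have been obtained as described in Obtaining an IAM User Token. The function only builds the request; it does not send it.

```python
from urllib.request import Request

# Placeholder endpoint: replace it with the KooSearch endpoint for your region.
ENDPOINT = "https://koosearch.example.com"


def build_request(path: str, query: str, token: str) -> Request:
    """Prepare (but do not send) a GET request carrying the IAM token."""
    url = f"{ENDPOINT}{path}?{query}" if query else f"{ENDPOINT}{path}"
    return Request(url, headers={"X-Auth-Token": token}, method="GET")
```

The prepared request can be sent with `urllib.request.urlopen`. urllib stores header names in capitalized form (`req.get_header("X-auth-token")`); because HTTP header names are case-insensitive, this does not change how the service reads the token.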

Response Parameters

Status code: 200

Table 4 Response body parameters

Parameter

Type

Description

total

Integer

Definition:

Total number of returned documents.

Value range:

N/A

page_num

Integer

Definition:

Page number.

Value range:

N/A

page_size

Integer

Definition:

Number of records per page.

Value range:

N/A

files

Array of FileInfo objects

Definition:

File list.

Value range:

N/A
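Because results are paged, a client that needs every match can increment page_num until all records have been read. The sketch below assumes that total is the number of matching documents across all pages. The table describes it only as the total number of returned documents, so verify this against your own data; if total were a per-page count, the loop would stop after the first full page. `fetch_page` is a hypothetical, caller-supplied stand-in for the actual HTTP call.

```python
import math
from typing import Callable, Dict, Iterator, List

# Hypothetical signature: fetch_page(page_num, page_size) returns the parsed
# JSON body of one response (total, page_num, page_size, files).
FetchPage = Callable[[int, int], Dict]

MAX_PAGE = 65535  # upper bound of page_num and page_size in Table 2


def iter_files(fetch_page: FetchPage, page_size: int = 10) -> Iterator[Dict]:
    """Yield every FileInfo object across all pages."""
    if not 1 <= page_size <= MAX_PAGE:
        raise ValueError("page_size must be between 1 and 65535")
    page_num = 1  # pages are numbered from 1
    while True:
        body = fetch_page(page_num, page_size)
        files: List[Dict] = body.get("files", [])
        yield from files
        # Assumes total counts all matching documents, not just this page.
        last_page = math.ceil(body.get("total", 0) / page_size)
        if not files or page_num >= last_page or page_num >= MAX_PAGE:
            break
        page_num += 1
```

The loop also stops on an empty page, so a mismatch between total and the actual data cannot cause it to run indefinitely.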

Table 5 FileInfo

Parameter

Type

Description

id

String

Definition:

File ID.

Value range:

N/A

name

String

Definition:

File name.

Value range:

N/A

repo_id

String

Definition:

Knowledge base ID.

Value range:

N/A

type

String

Definition:

File type.

Value range:

N/A

status

String

Definition:

File status.

Value range:

  • SUCCESS: Processing succeeded.

  • ERROR: Execution failed.

  • POST_PROCESSING: The main processing of the file is complete, but follow-up operations are still required.

  • CREATE: Waiting for background processing; the file is about to enter the PENDING state.

  • PENDING: Initial state; the file has not been processed.

  • RUNNING: The task is being executed.

  • INBOUND: Execution is complete, and the data is being loaded into the database.

  • IMPORT_EXCEPTION: An import exception occurred.

  • FILE_ENCODING_ERROR: A file encoding error occurred.

chat_id

String

Definition:

Chat ID.

Value range:

N/A

category

String

Definition:

Document directory, corresponding to a leaf node in the directory tree. Only one value is returned.

The recommended format is "leaf node directory name (directory ID)", for example, patent (3166-1).

Value range:

N/A

tags

Array of strings

Definition:

Document tags. You can use tags to automatically group documents for filtering.

Value range:

N/A

Precautions:

  1. Treat tags as case-insensitive. For example, Approved and approved are regarded as the same tag.

  2. A document can have more than one tag.

  3. The recommended format is Tag name:Tag value. If you can ensure that tag values do not conflict, you can specify only the tag value. For example:

     • The product model described in Refrigerator User Guide is ProductModel:BCD-551WLCTDAFA5U1.

     • The author of Someone to Talk To is Liu Zhenyun.

     • The professional domain of Template for Disclosure of Design Patents is legal domain.

size

Long

Definition:

File size, in bytes.

Value range:

N/A

process

Integer

Definition:

File parsing progress, as a percentage (100 indicates that parsing is complete).

Value range:

N/A

fail_count

Integer

Definition:

Number of data records that failed to be uploaded.

Value range:

N/A

fail_records_expire_time

String

Definition:

Timestamp when the upload fails.

Value range:

N/A

create_user

String

Definition:

Creator, that is, the user who uploaded the file. This field may be absent from the response.

Value range:

N/A

create_time

String

Definition:

Creation time, that is, the time when the file was uploaded, as a millisecond timestamp, for example, 1692848139119.

Value range:

N/A

update_time

String

Definition:

Update time. This field may be absent from the response.

Value range:

N/A

upload_desc

String

Definition:

Upload description. This field may be absent from the response.

Value range:

N/A

has_html

Boolean

Definition:

Whether HTML preview is supported.

Value range:

N/A

file_extract_conf

FileExtractConf object

Definition:

File extraction configuration item.

Value range:

N/A

project_id

String

Definition:

Project ID.

Value range:

N/A

application_id

String

Definition:

Application ID.

Value range:

N/A

file_path

String

Definition:

Document path.

Value range:

N/A
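Two FileInfo fields usually need interpretation on the client: status, when deciding whether to keep polling a newly uploaded file, and create_time, which both examples on this page show as a 13-digit string. The sketch below assumes that create_time (and update_time, when present) is a Unix timestamp in milliseconds, and it groups the status values based on their descriptions. Both are assumptions rather than documented guarantees; `is_final` and `millis_to_utc` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed grouping of the status values listed in Table 5. This is an
# interpretation of the descriptions, not an official state machine.
SUCCEEDED = {"SUCCESS"}
FAILED = {"ERROR", "IMPORT_EXCEPTION", "FILE_ENCODING_ERROR"}
IN_PROGRESS = {"CREATE", "PENDING", "RUNNING", "POST_PROCESSING", "INBOUND"}

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_final(status: str) -> bool:
    """True if no further processing is expected for the file (assumed)."""
    if status not in SUCCEEDED | FAILED | IN_PROGRESS:
        raise ValueError(f"unknown status: {status}")
    return status not in IN_PROGRESS


def millis_to_utc(value: Optional[str]) -> Optional[datetime]:
    """Convert a millisecond timestamp string (e.g. create_time) to a UTC datetime."""
    if value is None:  # keys such as update_time may be absent
        return None
    # Integer milliseconds avoid floating-point rounding.
    return _EPOCH + timedelta(milliseconds=int(value))
```

Under this assumption, the create_time value in the example response, "1745827605498", corresponds to 2025-04-28 08:06:45.498 UTC.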

Table 6 FileExtractConf

Parameter

Type

Description

parse_conf

ParseConf object

Definition:

Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page.

Value range:

N/A

split_conf

SplitConf object

Definition:

Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern.

Value range:

N/A

id

String

Definition:

Document parsing ID.

Constraints:

N/A

Value range:

0 to 128 characters.

Default value:

N/A

Table 7 ParseConf

Parameter

Type

Description

ocr_enabled

Boolean

Definition:

Whether to enable OCR enhancement.

Constraints:

N/A

Value range:

N/A

Default value:

false

mllm_enabled

Boolean

Definition:

Whether to enable multimodal enhancement.

Constraints:

N/A

Value range:

N/A

Default value:

false

mllm_model

String

Definition:

Multimodal model name.

Constraints:

The model specified by mllm_model must already be configured on the platform. You can check the models configured on the platform using the ListModels API.

Value range:

The value can contain 1 to 32 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter or digit.

Default value:

N/A

mllm_prompt

Map<String,String>

Definition:

Prompts for the multimodal model.

Constraints:

A default prompt is provided. You can also configure custom prompts.

Value range:

N/A

Default value:

N/A

image_enabled

Boolean

Definition:

Whether to parse images.

Constraints:

N/A

Value range:

N/A

Default value:

false

header_footer_enabled

Boolean

Definition:

Whether to parse headers and footers.

Constraints:

N/A

Value range:

N/A

Default value:

false

catalog_enabled

Boolean

Definition:

Whether to parse the contents page.

Constraints:

N/A

Value range:

N/A

Default value:

false

image_conf

String

Definition:

Image parsing mode used when image_enabled is set to true.

Constraints:

When answers need to be returned with images, the IMAGE mode must be used to retain the original images.

Value range:

  • TEXT: extracts text from images.

  • IMAGE: retains the original images.

  • IMAGE_TEXT: extracts text from images and retains the original images.

Default value:

TEXT

footnote_enabled

Boolean

Definition:

Whether to parse footnotes.

Constraints:

N/A

Value range:

N/A

Default value:

false
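The defaults in Table 7 can be mirrored in a small client-side structure, for example to fill in fields that a response omits. The `ParseConf` dataclass below is a hypothetical illustration, not an SDK class. It applies the documented defaults and rejects image_conf values other than TEXT, IMAGE, and IMAGE_TEXT.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional

IMAGE_MODES = {"TEXT", "IMAGE", "IMAGE_TEXT"}


@dataclass
class ParseConf:
    """Client-side mirror of Table 7 with its documented defaults (hypothetical)."""

    ocr_enabled: bool = False
    mllm_enabled: bool = False
    mllm_model: Optional[str] = None
    mllm_prompt: Optional[Dict[str, str]] = None
    image_enabled: bool = False
    header_footer_enabled: bool = False
    catalog_enabled: bool = False
    image_conf: str = "TEXT"
    footnote_enabled: bool = False

    def __post_init__(self) -> None:
        if self.image_conf not in IMAGE_MODES:
            raise ValueError(f"image_conf must be one of {sorted(IMAGE_MODES)}")

    def to_dict(self) -> Dict:
        """Serialize, dropping optional fields that are unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, `ParseConf(**body["files"][0]["file_extract_conf"]["parse_conf"])` applied to the example response fills in missing fields, such as footnote_enabled, with their documented defaults. Keys not listed in Table 7 raise a TypeError, which surfaces unexpected fields early.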

Table 8 SplitConf

Parameter

Type

Description

split_mode

String

Definition:

Document segmentation mode.

Value range:

The value can be:

  • AUTO: The system automatically identifies the document format and matches the appropriate segmentation and parsing mode.

  • LENGTH: Segments a document by length. For example, a document is segmented into paragraphs every 500 characters.

  • CATALOG: Automatic parsing under hierarchical segmentation. The system automatically identifies the hierarchical structure of an article and segments the article accordingly. For example, sections 1.1.2 and 1.1.3 become separate segments.

  • RULE: Rule-based parsing under hierarchical segmentation. You can customize the matching rules of hierarchical titles and match and split chapters based on custom rules.

Constraints:

N/A

Default value:

AUTO

separator_ids

Array of strings

Definition:

List of separator IDs used in the AUTO and LENGTH segmentation modes.

Each separator ID specifies a character at which a chunk can end.

Constraints:

N/A

Value range:

Value mapping:

  • period_zh: Chinese period (。)

  • period_en: English period (.)

  • exclamation_mark_zh: Chinese exclamation mark (！)

  • exclamation_mark_en: English exclamation mark (!)

  • question_mark_zh: Chinese question mark (？)

  • question_mark_en: English question mark (?)

  • question_mark_ar: Arabic question mark (؟)

  • comma_zh: Chinese comma (，)

  • comma_en: English comma (,)

  • space_en: space

Default value:

["period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"]

rule_regex_id

String

Definition:

ID of the user-defined parsing rule.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

chunk_size

Integer

Definition:

Maximum length of a document chunk. The document is segmented based on the maximum chunk length.

Constraints:

N/A

Value range:

0-6000

Default value:

500

title_level

Integer

Definition:

Title hierarchy depth retained in a chunk.

For example:

If the depth is 3 and the current paragraph is 1.1.3, then the parent titles 1.1 and 1 are both retained.

If the depth is 2 and the current paragraph is 1.1.3, then the parent title 1.1 is retained, and the parent title 1 is discarded.

Constraints:

N/A

Value range:

1-10

Default value:

3

combine_title

Boolean

Definition:

Whether to retain the hierarchical title combination.

Constraints:

N/A

Value range:

N/A

Default value:

false

merge_titles

Boolean

Definition:

Cross-title merge. When the paragraphs under different titles contain little text, they are automatically merged up to the specified segment length, producing more complete chunks.

Constraints:

N/A

Value range:

N/A

Default value:

false

rule_regexs

Array of strings

Definition:

User-defined parsing rules.

Constraints:

N/A

Value range:

The list length ranges from 1 to 100.

Default value:

N/A

merge_last_chunk

Boolean

Definition:

Whether to merge the last chunk.

Constraints:

N/A

Value range:

N/A

Default value:

N/A
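The title_level examples and the merge_titles description can be illustrated with two small functions. Both are interpretations written for illustration (`retained_titles` and `merge_short_sections` are hypothetical names), not the service's segmentation algorithm. The first treats title_level as the number of title levels kept, including the current section's own title, which matches both examples given for title_level. The second greedily joins consecutive short sections until adding the next one would push the merged text past chunk_size characters.

```python
from typing import List


def retained_titles(section: str, title_level: int) -> List[str]:
    """Titles kept for a chunk in the given section, nearest first."""
    if not 1 <= title_level <= 10:
        raise ValueError("title_level must be between 1 and 10")
    parts = section.split(".")
    # For "1.1.3" the chain is ["1.1.3", "1.1", "1"].
    chain = [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return chain[:title_level]


def merge_short_sections(sections: List[str], chunk_size: int = 500) -> List[str]:
    """Greedily merge consecutive sections while the merged text fits in chunk_size."""
    if not 0 <= chunk_size <= 6000:
        raise ValueError("chunk_size must be between 0 and 6000")
    merged: List[str] = []
    for text in sections:
        # The +1 accounts for the newline inserted between merged sections.
        if merged and len(merged[-1]) + 1 + len(text) <= chunk_size:
            merged[-1] = merged[-1] + "\n" + text
        else:
            merged.append(text)
    return merged
```

A section that is already longer than chunk_size is left as it is here. In practice, such a section would be split by the segmentation mode rather than by merging.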

Status code: 400

Table 9 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code.

Value range:

N/A

error_msg

String

Definition:

Error message.

Value range:

N/A

Status code: 500

Table 10 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code.

Value range:

N/A

error_msg

String

Definition:

Error message.

Value range:

N/A

Example Requests

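Query the file named 视频直播 Live 最佳实践.pdf ("Live stream best practices") in the specified knowledge base. The file name is percent-encoded in the query string, and the first page is requested with up to 100 records per page.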

GET /v1/b25446daeb1a41a7953c5deba2b2677a/applications/cefb2a59-2f9e-4268-b56b-eab15dc0b9d6/uni-search/0e7a261c-a543-4f98-915d-7a83ac96595c/files/search?file_name=%E8%A7%86%E9%A2%91%E7%9B%B4%E6%92%AD%20Live%20%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.pdf&page_size=100&page_num=1
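The example request can be decoded with Python's standard library to check exactly what it asks for. The file_name value is the UTF-8, percent-encoded form of a Chinese file name. This sketch only parses the example string and makes no request.

```python
from urllib.parse import parse_qs, urlsplit

EXAMPLE = (
    "/v1/b25446daeb1a41a7953c5deba2b2677a/applications/"
    "cefb2a59-2f9e-4268-b56b-eab15dc0b9d6/uni-search/"
    "0e7a261c-a543-4f98-915d-7a83ac96595c/files/search"
    "?file_name=%E8%A7%86%E9%A2%91%E7%9B%B4%E6%92%AD%20Live%20"
    "%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.pdf&page_size=100&page_num=1"
)

parts = urlsplit(EXAMPLE)
# Path layout: v1/{project_id}/applications/{application_id}/uni-search/{repo_id}/files/search
segments = parts.path.strip("/").split("/")
project_id, application_id, repo_id = segments[1], segments[3], segments[5]

# parse_qs decodes percent-escapes as UTF-8 and returns each value as a list.
query = parse_qs(parts.query)
```

After parsing, `query["file_name"]` holds `["视频直播 Live 最佳实践.pdf"]`, and the page parameters are the strings `"100"` and `"1"`.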

Example Responses

Status code: 200

Document list response body.

{
  "total" : 1,
  "files" : [ {
    "id" : "45938a076274b530fa0c600af354ea4b",
    "name" : "Live stream best practices.pdf",
    "type" : "pdf",
    "status" : "SUCCESS",
    "size" : 396838,
    "process" : 100,
    "create_user" : "ei_css_011",
    "create_time" : "1745827605498",
    "has_html" : false,
    "file_extract_conf" : {
      "id" : "ff4b453f-5126-4768-a8ab-efbbc71e9a04",
      "parse_conf" : {
        "ocr_enabled" : true,
        "image_enabled" : true,
        "image_conf" : "TEXT",
        "header_footer_enabled" : false,
        "catalog_enabled" : false
      },
      "split_conf" : {
        "split_mode" : "AUTO"
      }
    }
  } ],
  "page_num" : 1,
  "page_size" : 100
}
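A client that reads this response should tolerate the keys that Table 5 marks as possibly absent (create_user, update_time, upload_desc). The sketch below parses an abbreviated copy of the example response with the standard json module and produces one summary line per file. The summary format and the `summarize` helper are arbitrary illustrations.

```python
import json
from typing import Dict

# Abbreviated copy of the example response above.
RAW = """
{
  "total": 1,
  "files": [{
    "id": "45938a076274b530fa0c600af354ea4b",
    "name": "Live stream best practices.pdf",
    "status": "SUCCESS",
    "size": 396838,
    "process": 100,
    "file_extract_conf": {"split_conf": {"split_mode": "AUTO"}}
  }],
  "page_num": 1,
  "page_size": 100
}
"""


def summarize(file_info: Dict) -> str:
    """One-line summary of a FileInfo object; optional keys are read with .get()."""
    kib = file_info["size"] / 1024  # size is reported in bytes
    split = file_info.get("file_extract_conf", {}).get("split_conf", {})
    updated = file_info.get("update_time", "n/a")  # this key may be absent
    return (
        f"{file_info['name']}: {file_info['status']}, {kib:.1f} KiB, "
        f"split mode {split.get('split_mode', 'n/a')}, updated {updated}"
    )


body = json.loads(RAW)
summaries = [summarize(f) for f in body["files"]]
```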

Status Codes

Status Code

Description

200

Document list response body.

400

Incorrect request body parameter.

500

Internal error.

Error Codes

See Error Codes.