Searching for Structured Data Files

Function

Searches a knowledge base for structured data files by file name.

URI

GET /v1/koosearch/repos/{repo_id}/structured-data/search

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
repo_id	Yes	String	Knowledge base ID. The value is a string of 1 to 64 characters and can contain only digits, letters, hyphens (-), and underscores (_). How to obtain: Log in to the KooSearch experience platform. In the navigation tree on the left, choose Knowledge Bases to view knowledge base IDs. Each knowledge base has a unique ID stored in the vector database.

**Table 2** Query Parameters
Parameter	Mandatory	Type	Description
file_name	Yes	String	Structured file name.
file_status	Yes	String	File status. Options: SUCCESS: The upload is successful. ERROR: The upload failed. PENDING: The task is queued. RUNNING: The file is being parsed. IMPORT_EXCEPTION: Import exception. FILE_ENCODING_ERROR: Document parsing error.
page_num	No	Integer	Page number to return.
page_size	No	Integer	Number of records per page, for example, 5 or 10.
ids	No	Array of strings	List of file IDs for precise query.

Request Parameters

**Table 3** Request header parameters
Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	Parameter description: Token used for API authentication. For how to obtain the token, see section 3.2 "Authentication." Constraints: N/A.
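
The path, query, and header parameters above can be combined into a request as in the following minimal sketch. It only builds the request and does not send it. The function name `build_search_request` and the sample values are hypothetical, and encoding `ids` as repeated query parameters is an assumption, since this page does not state how array values are serialized.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_search_request(endpoint, repo_id, token, file_name, file_status,
                         page_num=None, page_size=None, ids=None):
    """Build (but do not send) a GET request for the structured-data search API."""
    # file_name and file_status are listed as mandatory query parameters.
    params = [("file_name", file_name), ("file_status", file_status)]
    if page_num is not None:
        params.append(("page_num", page_num))
    if page_size is not None:
        params.append(("page_size", page_size))
    # Assumption: array values are sent as repeated "ids" parameters.
    for file_id in ids or []:
        params.append(("ids", file_id))
    url = (
        f"https://{endpoint}/v1/koosearch/repos/{quote(repo_id, safe='')}"
        f"/structured-data/search?{urlencode(params)}"
    )
    # The token goes in the X-Auth-Token request header.
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


# Hypothetical values, for illustration only.
req = build_search_request("example.com", "repo-1", "my-token",
                           "sales.csv", "SUCCESS", page_num=1, page_size=10)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is omitted here so the sketch runs without network access.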

Response Parameters

Status code: 200

**Table 4** Response body parameters
Parameter	Type	Description
total	Integer	Total number of returned documents.
page_num	Integer	Page number.
page_size	Integer	Number of records on each page.
files	Array of FileInfo objects	File list.
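
The paging fields can be used to work out how many requests are needed to read all results. The sketch below assumes that `total` counts all matching files rather than only those on the current page, and that page numbering starts at 1 (as in the example request); neither is stated explicitly on this page. The helper names are hypothetical.

```python
import math


def page_count(total, page_size):
    """Number of pages needed to cover `total` records at `page_size` per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total / page_size)


def page_numbers(total, page_size, first_page=1):
    """Page numbers to request, assuming numbering starts at `first_page`."""
    return list(range(first_page, first_page + page_count(total, page_size)))


print(page_numbers(23, 10))
```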

**Table 5** FileInfo
Parameter	Type	Description
id	String	File ID.
task_id	String	Task ID.
name	String	File name.
repo_id	String	Knowledge base ID.
project_id	String	Specifies the project ID.
application_id	String	Application ID.
status	String	File status. SUCCESS: The upload is successful. ERROR: Execution failed. PENDING: Initial state, not processed. RUNNING: The task is being executed. IMPORT_EXCEPTION: Import exception. FILE_ENCODING_ERROR: Encoding error.
type	String	File type.
size	Long	File size, in bytes.
category	String	Document directory, which corresponds to the leaf nodes in the directory tree. This parameter has only one value. The recommended format is "leaf node directory name (directory ID)", for example, patent (3166-1).
create_user	String	Creator, that is, the user who uploads the file. This key value may not exist.
create_time	String	Creation time, that is, the time when the file is uploaded. Example: 1692848139119
update_time	String	Update time. This key value may not exist.
file_path	String	File address. This key value may not exist.
upload_desc	String	Upload description. This key value may not exist.
file_extract_conf	FileExtractConf object	File extraction configuration item.
tags	Array of strings	Document tags. You can use tags to automatically group documents for filtering. Notes: It is recommended that the tag name be case insensitive. For example, Approved and approved are the same tag. A document can have more than one tag. The recommended format is Tag name: Tag value. If tag values do not conflict, you can directly use the tag values. For example: The product model described in Refrigerator User Guide is ProductModel:BCD-551WLCTDAFA5U1. The author of Someone to Talk To is Liu Zhenyun. The professional domain of the Template for Disclosure of Design Patents is legal domain.
fail_count	Integer	Number of data records that fail to be uploaded.
fail_records_expire_time	String	Timestamp when the upload fails.
chat_id	String	Chat ID.
process	Integer	Document parsing progress.
has_html	Boolean	Whether to display in HTML format.
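
A client reading a FileInfo object has to allow for keys that may not exist and for `create_time` being a string. The following is a minimal sketch under two assumptions: the example value 1692848139119 is taken to be milliseconds since the Unix epoch, and the sample record is invented for illustration. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_create_time(value):
    """Convert a create_time string to a UTC datetime (assumes epoch milliseconds)."""
    millis = int(value)
    seconds, remainder = divmod(millis, 1000)
    return (datetime.fromtimestamp(seconds, tz=timezone.utc)
            + timedelta(milliseconds=remainder))


def summarize_file(info):
    """Pick out commonly used FileInfo fields, tolerating absent optional keys."""
    return {
        "id": info["id"],
        "name": info["name"],
        "status": info["status"],
        "created": parse_create_time(info["create_time"]),
        # These keys "may not exist", so use .get() instead of indexing.
        "update_time": info.get("update_time"),
        "file_path": info.get("file_path"),
    }


# Invented sample record, for illustration only.
sample = {"id": "f1", "name": "sales.csv", "status": "SUCCESS",
          "create_time": "1692848139119"}
print(summarize_file(sample)["created"].isoformat())
```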

**Table 6** FileExtractConf
Parameter	Type	Description
parse_conf	ParseConf object	Parameter description: Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page. Constraints: N/A.
split_conf	SplitConf object	Parameter description: Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern. Constraints: N/A.
id	String	Parameter description: Document parsing ID. Constraints: N/A.

**Table 7** ParseConf
Parameter	Type	Description
ocr_enabled	Boolean	Parameter description: Whether the current knowledge base uses OCR enhancement. Pure Word documents do not need to be parsed using OCR. PDF and PPTX files require OCR for intelligent document recognition, such as table parsing and text extraction. Constraints: N/A. Default value: false
image_enabled	Boolean	Parameter description: Whether the current knowledge base needs to parse images. true: Parse images. The parsing mode is configured in image_conf. false: Skip images in the document. Constraints: N/A. Default value: false
header_footer_enabled	Boolean	Parameter description: Whether to parse the header and footer of the file in the current knowledge base. true: The parsing result contains the header and footer. false: The parsing result does not contain the header and footer. (If the header and footer do not contain key text information, you are advised to set this parameter to false to avoid interference.) Constraints: N/A. Default value: false
catalog_enabled	Boolean	Parameter description: Whether to parse the contents page of the file in the current knowledge base. false: The parsing result does not contain the contents page. (If the contents page holds no information that needs to be retained, you are advised to keep the default value false. A contents page generally contains a large number of keywords, which may affect search results.) true: The parsing result contains the contents page. Constraints: N/A. Default value: false
image_conf	String	Parameter description: Image parsing mode when image parsing is enabled (image_enabled is set to true). TEXT: Extracts text from an image and does not retain the image. IMAGE: The original image is retained. Constraints: If you want to return an answer with text and images, you must use the IMAGE mode and retain the original image. Default value: TEXT

**Table 8** SplitConf
Parameter	Type	Description
split_mode	String	Parameter description: Mode for splitting a document. Options: AUTO: The system automatically identifies the document format and matches the appropriate splitting and parsing mode. LENGTH: Split by length. For example, every 500 characters form one segment. CATALOG: Automatic parsing in hierarchical segmentation. The system automatically identifies the hierarchical structure of an article and segments the article based on that structure. For example, section 1.1.2 is one segment and section 1.1.3 is another. RULE: Rule-based parsing in hierarchical segmentation. You can customize the matching rules for hierarchical titles, and chapters are matched and split based on those rules. Constraints: N/A. Default value: AUTO
separator_ids	Array of strings	Parameter description: ID list of segment identifiers in automatic segmentation and length segmentation modes. A segment identifier determines the end character when a slice is segmented. Options: period_zh: Chinese period. period_en: English period. exclamation_mark_zh: Chinese exclamation mark (!). exclamation_mark_en: English exclamation mark (!). question_mark_zh: Chinese question mark (?). question_mark_en: English question mark (?). comma_zh: Chinese comma (,). comma_en: English comma (,). space_en: Space. Constraints: N/A. Default value: ["period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"]
rule_regex_id	String	Parameter description: ID of the selected user-defined parsing rule. Constraints: N/A.
chunk_size	Integer	Parameter description: Maximum length of a document segment. A document is segmented based on the maximum length. Constraints: N/A. Default value: 500
title_level	Integer	Parameter description: Depth of the title level reserved for a segment. For example: If the depth is 3, the current paragraph is 1.1.3, and the parent titles 1.1 and 1 are retained. If the depth is 2, the current paragraph is 1.1.3, the parent title 1.1 is retained, and the parent title 1 is discarded. Constraints: N/A. Default value: 3
combine_title	Boolean	Parameter description: Whether to retain the hierarchical title combination. The options are as follows: false: Only the last-level title is retained. true: The combination of titles at all levels, from the first level to the last level, is retained. For example, if 1.1 is the usage description and 1.1.1 is how to open the refrigerator, both titles are retained for a segment under 1.1.1. Constraints: N/A. Default value: false
merge_titles	Boolean	Parameter description: Whether to merge titles. The options are as follows: true: If paragraphs under different titles contain little text, they are automatically merged up to the specified segment length to generate more comprehensive results. For example, if two adjacent sub-paragraphs are each shorter than 200 characters and the expected segment length is 500, the two paragraphs are combined into one. false: Paragraphs with different titles are not merged. Constraints: N/A. Default value: true
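
The two title_level examples in the table can be read as "keep the last N titles of the path, counting the current title". The sketch below illustrates that reading only; it is not the service's implementation, and the function name `retained_titles` is hypothetical.

```python
def retained_titles(title_path, title_level):
    """Keep the last `title_level` titles of a segment's title path.

    `title_path` lists the titles from the top level down to the current one,
    for example ["1", "1.1", "1.1.3"].
    """
    if title_level <= 0:
        return []
    return title_path[-title_level:]


path = ["1", "1.1", "1.1.3"]
print(retained_titles(path, 3))  # parent titles 1.1 and 1 are retained
print(retained_titles(path, 2))  # parent title 1.1 is retained, 1 is discarded
```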

Status code: 400

**Table 9** Response body parameters
Parameter	Type	Description
error_code	String	Error code.
error_msg	String	Error description.

Status code: 500

**Table 10** Response body parameters
Parameter	Type	Description
error_code	String	Error code.
error_msg	String	Error description.
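
Since the 400 and 500 responses share the same body fields, a client can handle them in one place. The following is a minimal sketch that assumes the response body is always JSON; the class name `KooSearchError`, the function name `parse_response`, and the sample error code are hypothetical.

```python
import json


class KooSearchError(Exception):
    """Carries the error_code and error_msg of a non-200 response."""

    def __init__(self, status, error_code, error_msg):
        super().__init__(f"HTTP {status}: {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_response(status, body):
    """Return the decoded body for status 200, otherwise raise KooSearchError."""
    data = json.loads(body)  # assumption: the body is JSON for every status
    if status == 200:
        return data
    raise KooSearchError(status, data.get("error_code"), data.get("error_msg"))


# Invented error body, for illustration only.
try:
    parse_response(400, '{"error_code": "EXAMPLE.0001", "error_msg": "bad parameter"}')
except KooSearchError as exc:
    print(exc)
```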

Example Requests

GET https://{endpoint}/v1/koosearch/repos/36b6d979-7f98-4fda-b8b5-d7d0cc95d296/structured-data/search?page_num=1&page_size=1

Example Responses

None

Status Codes

Status Code	Description
200	Document list response body.
400	Incorrect request parameter.
500	Internal error

Error Codes

See Error Codes.

