Searching for Structured Data Files
Function
Searches a knowledge base for structured data files that match the specified file name.
URI
GET /v1/koosearch/repos/{repo_id}/structured-data/search
Path parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
repo_id | Yes | String | Knowledge base ID. The value is a string of 1 to 64 characters and can contain only digits, letters, hyphens (-), and underscores (_). How to obtain: Log in to the KooSearch experience platform. In the navigation tree on the left, choose Knowledge Bases to view knowledge base IDs. Each knowledge base has a unique ID stored in the vector database. |
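The repo_id constraint above can be checked on the client before a request is sent. The following is a minimal sketch, assuming "letters" means ASCII letters; the helper name is_valid_repo_id is hypothetical and not part of the KooSearch API.

```python
import re

# Pattern derived from the stated constraint: 1 to 64 characters, limited to
# digits, letters, hyphens, and underscores.
# Assumption: "letters" means ASCII letters (A-Z, a-z).
REPO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_repo_id(repo_id: str) -> bool:
    """Return True if repo_id satisfies the documented format constraint."""
    return REPO_ID_PATTERN.fullmatch(repo_id) is not None


print(is_valid_repo_id("36b6d979-7f98-4fda-b8b5-d7d0cc95d296"))  # True
print(is_valid_repo_id("bad id!"))                               # False
```

This check only validates the format. Whether the ID refers to an existing knowledge base can only be confirmed by the service.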
Query parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
file_name | Yes | String | Structured file name. |
file_status | Yes | String | File status. Options: SUCCESS: The upload is successful. ERROR: The upload failed. PENDING: The task is queued. RUNNING: The file is being parsed. IMPORT_EXCEPTION: An import exception occurred. FILE_ENCODING_ERROR: Document parsing error. |
page_num | No | Integer | Page number to request. |
page_size | No | Integer | Number of records per page in the response, for example, 5 or 10. |
ids | No | Array of strings | List of file IDs for exact-match queries. |
Request Parameters
Request header parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
X-Auth-Token | Yes | String | Parameter description: Token used for API authentication. For how to obtain the token, see section 3.2 "Authentication." Constraints: N/A. |
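To show how the path, query, and header parameters fit together, the sketch below assembles a request object without sending it. It uses only the Python standard library. The function name build_search_request, the endpoint koosearch.example.com, and the token value are hypothetical placeholders. The HTTP method is left as a parameter that defaults to GET, as given in the URI section.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Status values listed in the query parameter table.
ALLOWED_STATUSES = {
    "SUCCESS", "ERROR", "PENDING", "RUNNING",
    "IMPORT_EXCEPTION", "FILE_ENCODING_ERROR",
}


def build_search_request(
    endpoint: str,
    repo_id: str,
    token: str,
    file_name: str,
    file_status: str,
    page_num: Optional[int] = None,
    page_size: Optional[int] = None,
    method: str = "GET",
) -> Request:
    """Assemble (but do not send) a structured-data search request."""
    if file_status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported file_status: {file_status}")
    # Mandatory query parameters first, optional ones only when provided.
    params = {"file_name": file_name, "file_status": file_status}
    if page_num is not None:
        params["page_num"] = page_num
    if page_size is not None:
        params["page_size"] = page_size
    url = (
        f"https://{endpoint}/v1/koosearch/repos/{repo_id}"
        f"/structured-data/search?{urlencode(params)}"
    )
    return Request(url, headers={"X-Auth-Token": token}, method=method)


# Hypothetical endpoint and token, for illustration only.
req = build_search_request(
    "koosearch.example.com", "kb-1", "my-token",
    "sales.csv", "SUCCESS", page_num=1, page_size=10,
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. That step is omitted here because it needs a real endpoint and a valid token.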
Response Parameters
Status code: 200
Response body parameters
Parameter | Type | Description |
---|---|---|
total | Integer | Total number of returned documents. |
page_num | Integer | Page number. |
page_size | Integer | Number of records on each page. |
files | Array of FileInfo objects | File list. |
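The total and page_size fields can be used to work out how many pages a client needs to fetch. This is a minimal sketch under two assumptions that the table does not confirm: total counts all matching files across pages, and page_num starts at 1 (as in the example request). The helper names are hypothetical.

```python
from typing import List


def total_pages(total: int, page_size: int) -> int:
    """Number of pages needed to cover `total` records (ceiling division)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total + page_size - 1) // page_size


def page_numbers(total: int, page_size: int) -> List[int]:
    """Page numbers to request. Assumption: page_num is 1-based."""
    return list(range(1, total_pages(total, page_size) + 1))


print(total_pages(23, 10))   # 3
print(page_numbers(23, 10))  # [1, 2, 3]
```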
FileInfo
Parameter | Type | Description |
---|---|---|
id | String | File ID. |
task_id | String | Task ID. |
name | String | File name. |
repo_id | String | Knowledge base ID. |
project_id | String | Project ID. |
application_id | String | Application ID. |
status | String | File status, for example, SUCCESS or ERROR (execution failed). |
type | String | File type. |
size | Long | File size, in bytes. |
category | String | Document directory, which corresponds to a leaf node in the directory tree. This parameter has only one value. The recommended format is "leaf node directory name (directory ID)", for example, patent (3166-1). |
create_user | String | Creator, that is, the user who uploaded the file. This key may be absent. |
create_time | String | Creation time, that is, the time when the file was uploaded. Example: 1692848139119 |
update_time | String | Update time. This key may be absent. |
file_path | String | File address. This key may be absent. |
upload_desc | String | Upload description. This key may be absent. |
file_extract_conf | FileExtractConf object | File extraction configuration item. |
tags | Array of strings | Document tags. You can use tags to automatically group documents for filtering. |
fail_count | Integer | Number of data records that failed to be uploaded. |
fail_records_expire_time | String | Timestamp when the upload fails. |
chat_id | String | Chat ID. |
process | Integer | Document parsing progress. |
has_html | Boolean | Whether to display in HTML format. |
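Several FileInfo keys may be absent, and create_time arrives as a string. The sketch below reads one FileInfo object defensively. It assumes create_time is a Unix timestamp in milliseconds, which the example value 1692848139119 suggests but the table does not state. The sample record and the helper name summarize_file are made up for illustration.

```python
from datetime import datetime, timezone


def summarize_file(info: dict) -> dict:
    """Extract a few FileInfo fields, tolerating optional keys."""
    # Assumption: create_time is milliseconds since the Unix epoch.
    created_ms = int(info["create_time"])
    created = datetime.fromtimestamp(created_ms // 1000, tz=timezone.utc)
    created = created.replace(microsecond=(created_ms % 1000) * 1000)
    return {
        "id": info["id"],
        "name": info["name"],
        "status": info["status"],
        "created": created.isoformat(),
        # Keys documented as possibly absent: fall back to None.
        "create_user": info.get("create_user"),
        "update_time": info.get("update_time"),
        "file_path": info.get("file_path"),
    }


# Hypothetical record, not taken from a real response.
sample = {
    "id": "file-001",
    "name": "sales.csv",
    "status": "SUCCESS",
    "create_time": "1692848139119",
}
print(summarize_file(sample)["created"])  # 2023-08-24T03:35:39.119000+00:00
```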
FileExtractConf
Parameter | Type | Description |
---|---|---|
parse_conf | ParseConf object | Parameter description: Document parsing configuration, including whether to use OCR enhancement, whether to parse images, whether to extract text during image parsing, whether to parse the header and footer, and whether to parse the contents page. Constraints: N/A. |
split_conf | SplitConf object | Parameter description: Split configuration, including the segmentation mode, level parsing mode, title level depth, title saving mode, segment length, and title matching pattern. Constraints: N/A. |
id | String | Parameter description: Document parsing ID. Constraints: N/A. |
ParseConf
Parameter | Type | Description |
---|---|---|
ocr_enabled | Boolean | Parameter description: Whether the current knowledge base uses OCR enhancement. Default value: false |
image_enabled | Boolean | Parameter description: Whether the current knowledge base parses images. false: Images in the document are skipped (default behavior). true: Images are parsed. The parsing mode is configured in image_conf. Constraints: N/A. Default value: false |
header_footer_enabled | Boolean | Parameter description: Whether to parse the header and footer of files in the current knowledge base. true: The parsing result contains the header and footer. false: The parsing result does not contain the header and footer. If the header and footer do not contain key text information, you are advised to set this parameter to false to avoid interference. Constraints: N/A. Default value: false |
catalog_enabled | Boolean | Parameter description: Whether to parse the contents page of files in the current knowledge base. false: The parsing result does not contain the contents page. A contents page generally contains a large number of keywords, which may affect search results, so false is recommended unless the contents page holds information that must be retained. true: The parsing result contains the contents page. Constraints: N/A. Default value: false |
image_conf | String | Parameter description: Image parsing mode when image parsing is enabled (image_enabled is set to true). Default value: TEXT |
SplitConf
Parameter | Type | Description |
---|---|---|
split_mode | String | Parameter description: Mode for splitting a document. Options: Four modes are available. Constraints: N/A. Default value: AUTO |
separator_ids | Array of strings | Parameter description: ID list of segment identifiers used in automatic segmentation and length segmentation modes. A segment identifier determines the character at which a slice ends. Options: period_zh: Chinese period (。). period_en: English period (.). exclamation_mark_zh: Chinese exclamation mark (！). exclamation_mark_en: English exclamation mark (!). question_mark_zh: Chinese question mark (？). question_mark_en: English question mark (?). comma_zh: Chinese comma (，). comma_en: English comma (,). space_en: space. Constraints: N/A. Default value: ["period_zh", "period_en", "exclamation_mark_zh", "exclamation_mark_en", "question_mark_zh", "question_mark_en"] |
rule_regex_id | String | Parameter description: ID of the selected user-defined parsing rule. Constraints: N/A. |
chunk_size | Integer | Parameter description: Maximum length of a document segment. A document is segmented based on the maximum length. Constraints: N/A. Default value: 500 |
title_level | Integer | Parameter description: Depth of the title levels retained for a segment. For example, if the depth is 3 and the current paragraph is 1.1.3, the parent titles 1.1 and 1 are retained. If the depth is 2 and the current paragraph is 1.1.3, the parent title 1.1 is retained and the parent title 1 is discarded. Constraints: N/A. Default value: 3 |
combine_title | Boolean | Parameter description: Whether to retain the hierarchical title combination. false: Only the last-level title is retained. true: The combination of multiple title levels is saved, from the first level to the last level. For example, if title 1.1 is "Usage description" and title 1.1.1 is "How to open the refrigerator", both titles are saved together. Constraints: N/A. Default value: false |
merge_titles | Boolean | Parameter description: Whether to merge titles. true: If paragraphs under different titles contain little text, they are automatically merged up to the specified segment length to generate more comprehensive results. For example, if two adjacent sub-paragraphs are each shorter than 200 characters and the expected segment length is 500, they are combined into one paragraph. false: Paragraphs with different titles are not merged. Constraints: N/A. Default value: true |
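The interaction between title_level and combine_title can be modeled in a few lines. This is an illustrative sketch of the behavior described in the table, not the service's implementation. The function names and the " / " joiner are hypothetical.

```python
from typing import List


def retained_titles(title_path: List[str], title_level: int) -> List[str]:
    """Keep the current title plus its nearest parents, up to title_level.

    title_path is ordered from the top-level title to the current title.
    """
    if title_level <= 0:
        return []
    return title_path[-title_level:]


def segment_title(
    title_path: List[str],
    title_level: int = 3,
    combine_title: bool = False,
    joiner: str = " / ",  # hypothetical separator, chosen for readability
) -> str:
    """Title stored with a segment under the given settings."""
    kept = retained_titles(title_path, title_level)
    if not kept:
        return ""
    # combine_title=True keeps every retained level; False keeps only the last.
    return joiner.join(kept) if combine_title else kept[-1]


path = ["1", "1.1", "1.1.3"]
print(retained_titles(path, 3))                  # ['1', '1.1', '1.1.3']
print(retained_titles(path, 2))                  # ['1.1', '1.1.3']
print(segment_title(path, combine_title=True))   # 1 / 1.1 / 1.1.3
print(segment_title(path, combine_title=False))  # 1.1.3
```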
Status code: 400
Parameter | Type | Description |
---|---|---|
error_code | String | Error code. |
error_msg | String | Error description. |
Status code: 500
Parameter | Type | Description |
---|---|---|
error_code | String | Error code. |
error_msg | String | Error description. |
Example Requests
POST https://{endpoint}/v1/koosearch/repos/36b6d979-7f98-4fda-b8b5-d7d0cc95d296/structured-data/search?page_num=1&page_size=1
Example Responses
None
Status Codes
Status Code | Description |
---|---|
200 | Document list response body. |
400 | Incorrect request body parameter. |
500 | Internal error. |
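For the 400 and 500 responses, a client typically reads error_code and error_msg from the body. The sketch below assumes the body is JSON containing those two fields, as the response tables suggest. The helper name describe_failure and the sample error code EXAMPLE.0001 are hypothetical.

```python
import json


def describe_failure(status_code: int, body: str) -> str:
    """Build a readable message from an error response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    # Guard against valid JSON that is not an object (for example, a list).
    if not isinstance(payload, dict):
        payload = {}
    code = payload.get("error_code", "unknown")
    msg = payload.get("error_msg", "")
    return f"HTTP {status_code}: {code} {msg}".strip()


# Hypothetical error body, for illustration only.
body = '{"error_code": "EXAMPLE.0001", "error_msg": "Incorrect request body parameter"}'
print(describe_failure(400, body))
print(describe_failure(500, "not json"))  # HTTP 500: unknown
```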
Error Codes
See Error Codes.