Updated on 2025-08-13 GMT+08:00

Uploading a document from the local host

Function

Uploads a document from the local host and creates a document parsing task. The response returns a task ID that you can use to query the parsing status and result.

URI

POST /v1/koosearch/doc-search/files

Request Parameters

Table 1 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

Yes

String

Parameter description:

Token used for API authentication. For how to obtain the token, see section 3.2 "Authentication."

Constraints:

N/A

Table 2 FormData parameters

Parameter

Mandatory

Type

Description

file

Yes

File

Parameter description:

Document to be uploaded for parsing.

Constraints:

N/A

Values:

File

Default value:

N/A

language

No

String

Parameter description:

Language of the document. The options are CHINESE, ENGLISH, ARABIC, and THAI. This parameter can be left empty for Chinese and English documents.

Constraints:

N/A

Values:

CHINESE, ENGLISH, ARABIC, THAI

Default value:

N/A

mode

No

Integer

Parameter description:

Document parsing and splitting mode. The value can be 1 (hierarchical parsing), 2 (rule-based parsing), 3 (length-based parsing), or 4 (automatic parsing).

Constraints:

If split_mode is also specified, split_mode takes precedence over mode.

Values:

1, 2, 3, 4

Default value:

N/A

ocr

No

Boolean

Parameter description:

Whether to use OCR for parsing.

Constraints:

If ocr_enabled is also specified, ocr_enabled takes precedence over ocr.

Values:

true: OCR is used for parsing.

false: OCR is not used for parsing.

Default value:

false

ocr_enabled

No

Boolean

Parameter description:

Whether to use OCR for parsing.

Constraints:

N/A

Values:

true: OCR is used for parsing.

false: OCR is not used for parsing.

Default value:

false

image_enabled

No

Boolean

Parameter description:

Whether to parse images.

Constraints:

N/A

Values:

true: Parse images.

false: Do not parse images.

Default value:

false

image_conf

No

String

Parameter description:

Image parsing mode.

Constraints:

This parameter does not take effect when image_enabled is set to false.

Values:

TEXT: Extracts the text in images.

IMAGE: Retains the original images.

BASE64: Returns the image data encoded using Base64.

Default value:

IMAGE

header_footer_enabled

No

Boolean

Parameter description:

Whether to parse headers and footers.

Constraints:

N/A

Values:

true: Parse headers and footers.

false: Do not parse headers and footers.

Default value:

false

catalog_enabled

No

Boolean

Parameter description:

Whether to parse the table of contents.

Constraints:

N/A

Values:

true: Parse the table of contents.

false: Do not parse the table of contents.

Default value:

false

separators

No

Array of strings

Parameter description:

Paragraph separators used to split text. The value is an array of strings, each of which is one separator.

Constraints:

50 character limit.

Values:

N/A

Default value:

[". ", ".", "? ", "! ", "!", "?", "\n"]

rule_regexs

No

Array of strings

Parameter description:

Regular expressions used to match titles in rule-based splitting. The value is an array of strings, each of which is one expression.

Constraints:

The length cannot exceed 10 characters.

Values:

N/A

Default value:

N/A

split_mode

No

String

Parameter description:

Text splitting mode. The options are LENGTH (split by text length), CATALOG (split by table of contents), RULE (split by user-defined rules), and AUTO (automatic splitting).

Constraints:

N/A

Values:

LENGTH, CATALOG, RULE, AUTO

Default value:

N/A

chunk_size

No

Integer

Parameter description:

Maximum length of a chunk.

Constraints:

N/A

Values:

1-

Default value:

N/A

title_level

No

Integer

Parameter description:

Maximum title level (depth of the title hierarchy).

Constraints:

N/A

Values:

1-

Default value:

N/A

combine_title

No

Boolean

Parameter description:

Whether to combine titles. If titles are combined, all title levels are joined, for example, Title 1 Title 2 Title 3. If titles are not combined, only the last-level title is kept, for example, Title 3.

Constraints:

N/A

Values:

true: Merge titles.

false: Do not merge titles.

Default value:

true

merge_titles

No

Boolean

Parameter description:

Whether to merge different titles.

Constraints:

N/A

Values:

true: Merge different titles.

false: Do not merge different titles.

Default value:

true

reference_enabled

No

Boolean

Parameter description:

Whether to parse references.

Constraints:

N/A

Values:

true: Parse references.

false: Do not parse references.

Default value:

false
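The FormData parameters interact in three ways: split_mode overrides mode, ocr_enabled overrides ocr, and image_conf is ignored unless image_enabled is true. These rules can be applied on the client side before sending the request. The following Python sketch builds the non-file FormData fields as strings. It is illustrative only. The function name build_form_fields is hypothetical. The sketch also assumes that booleans are sent as the strings "true"/"false" and that the array parameters (separators, rule_regexs) are sent as JSON-encoded strings; this page specifies neither encoding.

```python
import json

LANGUAGES = {"CHINESE", "ENGLISH", "ARABIC", "THAI"}
SPLIT_MODES = {"LENGTH", "CATALOG", "RULE", "AUTO"}
IMAGE_CONFS = {"TEXT", "IMAGE", "BASE64"}
# Boolean parameters with no precedence rules, accepted through **flags.
BOOL_FLAGS = {"header_footer_enabled", "catalog_enabled", "combine_title",
              "merge_titles", "reference_enabled"}


def _bool(value) -> str:
    # Assumption: booleans are sent as the lowercase strings "true"/"false".
    return "true" if value else "false"


def build_form_fields(language=None, mode=None, split_mode=None, ocr=None,
                      ocr_enabled=None, image_enabled=None, image_conf=None,
                      separators=None, rule_regexs=None, chunk_size=None,
                      title_level=None, **flags) -> dict:
    """Return the non-file FormData fields as strings; unset fields are omitted."""
    fields = {}
    if language is not None:
        if language not in LANGUAGES:
            raise ValueError(f"unsupported language: {language}")
        fields["language"] = language

    # split_mode has higher priority than mode, so only the winner is sent.
    if split_mode is not None:
        if split_mode not in SPLIT_MODES:
            raise ValueError(f"unsupported split_mode: {split_mode}")
        fields["split_mode"] = split_mode
    elif mode is not None:
        if mode not in (1, 2, 3, 4):
            raise ValueError(f"unsupported mode: {mode}")
        fields["mode"] = str(mode)

    # ocr_enabled has higher priority than ocr, so only the winner is sent.
    if ocr_enabled is not None:
        fields["ocr_enabled"] = _bool(ocr_enabled)
    elif ocr is not None:
        fields["ocr"] = _bool(ocr)

    if image_enabled is not None:
        fields["image_enabled"] = _bool(image_enabled)
    if image_conf is not None:
        if image_conf not in IMAGE_CONFS:
            raise ValueError(f"unsupported image_conf: {image_conf}")
        # image_conf has no effect unless image_enabled is true, so drop it otherwise.
        if image_enabled:
            fields["image_conf"] = image_conf

    # Assumption: array parameters are sent as JSON-encoded strings.
    if separators is not None:
        fields["separators"] = json.dumps(separators)
    if rule_regexs is not None:
        fields["rule_regexs"] = json.dumps(rule_regexs)

    # The documented value range for both integers starts at 1.
    for name, value in (("chunk_size", chunk_size), ("title_level", title_level)):
        if value is not None:
            if value < 1:
                raise ValueError(f"{name} must be at least 1")
            fields[name] = str(value)

    for name, value in flags.items():
        if name not in BOOL_FLAGS:
            raise TypeError(f"unknown parameter: {name}")
        fields[name] = _bool(value)
    return fields
```

Sending only the higher-priority field of each pair is a design choice. The service already resolves these conflicts as documented above; omitting the lower-priority field simply keeps the request unambiguous.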

Response Parameters

Status code: 200

Table 3 Response body parameters

Parameter

Type

Description

task_id

String

ID of a document parsing task. You can use this ID to query the document parsing status and result.
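The following minimal Python sketch reads task_id from a 200 response body. It assumes the body is the JSON object shown under Example Responses, and the function name parse_task_id is hypothetical. The API that uses task_id to query parsing status is documented separately and is not shown here.

```python
import json


def parse_task_id(body: str) -> str:
    """Return the task_id from a 200 response body (hypothetical helper)."""
    payload = json.loads(body)
    task_id = payload.get("task_id") if isinstance(payload, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response body does not contain a task_id")
    return task_id


# Usage with the example response body from this page.
example = '{"task_id" : "00c7591f88af4f3fb2f3d7c7191865e6"}'
print(parse_task_id(example))
```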

Status code: 400

Table 4 Response body parameters

Parameter

Type

Description

error_code

String

Error code

error_msg

String

Error description

Status code: 401

Table 5 Response body parameters

Parameter

Type

Description

error_code

String

Error code

error_msg

String

Error description

Status code: 500

Table 6 Response body parameters

Parameter

Type

Description

error_code

String

Error code

error_msg

String

Error description
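The 400, 401, and 500 responses share the same error_code/error_msg body, so one handler can cover all three. The following Python sketch uses the hypothetical names UploadError and raise_for_error. It turns any non-200 response into an exception and falls back to the raw body as the message if the body is not a JSON object.

```python
import json


class UploadError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"HTTP {status} {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def raise_for_error(status: int, body: str) -> None:
    """Return normally for 200; raise UploadError for any other status."""
    if status == 200:
        return
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        # Not the documented JSON object: keep the raw body as the message.
        raise UploadError(status, "", body)
    raise UploadError(status,
                      str(payload.get("error_code", "")),
                      str(payload.get("error_msg", "")))
```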

Example Requests

None
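This page provides no official example request. As a sketch only, the following Python code uses the standard library module urllib.request to assemble a multipart/form-data POST to this URI. Several parts are assumptions:

- The endpoint https://example.com is a placeholder. The real, region-specific endpoint is not given on this page.
- The function name build_upload_request is hypothetical.
- The file part's Content-Type of application/octet-stream is assumed.

```python
import urllib.request
import uuid

# Path from this page; the endpoint host passed in is a placeholder.
PATH = "/v1/koosearch/doc-search/files"


def build_upload_request(endpoint: str, token: str, filename: str,
                         content: bytes, fields=None) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    parts = []
    # One part per optional FormData field, e.g. {"ocr_enabled": "true"}.
    for name, value in (fields or {}).items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The mandatory "file" part. Filenames containing quotes are not escaped here.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + content + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=endpoint.rstrip("/") + PATH,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-Auth-Token": token,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage with placeholder values; in practice, read the bytes from a local file.
req = build_upload_request("https://example.com", "<your-token>",
                           "report.pdf", b"example bytes", {"ocr_enabled": "true"})
```

To send the request, call urllib.request.urlopen(req) and read the JSON body. urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, and the error body can be read from that exception.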

Example Responses

Status code: 200

File content parsing task creation result

{
  "task_id" : "00c7591f88af4f3fb2f3d7c7191865e6"
}

Status Codes

Status Code

Description

200

File content parsing task creation result

400

Invalid request parameters

401

Authentication failed

500

Internal error

Error Codes

See Error Codes.