Updated on 2025-08-13 GMT+08:00

Obtain documents from OBS.

Function

This API creates a document parsing task for a document stored in OBS.

URI

POST /v1/koosearch/doc-search/documents

Request Parameters

Table 1 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

Yes

String

Parameter description:

Token used for API authentication. For how to obtain the token, see section 3.2 "Authentication."

Constraints:

N/A.

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

file_path

Yes

String

Parameter description:

Path of the document uploaded to OBS. Only one file can be specified per request.

Constraints:

N/A

Values:

Less than 1024 characters

Default value:

N/A

language

No

String

Parameter description:

Language of the document. This parameter can be left blank for Chinese and English documents.

Constraints:

N/A

Values:

CHINESE, ENGLISH, ARABIC, and THAI

Default value:

N/A

mode

No

Integer

Parameter description:

Document parsing and splitting mode.

Constraints:

If split_mode is set in split_conf, it takes precedence over this parameter.

Values:

1 (hierarchical parsing), 2 (rule-based parsing), 3 (length-based parsing), and 4 (automatic parsing)

Default value:

N/A

ocr

No

Boolean

Parameter description:

Whether to use OCR for parsing.

Constraints:

If ocr_enabled is set in parse_conf, it takes precedence over this parameter.

Values:

true: OCR is used for parsing.

false: OCR is not used for parsing.

Default value:

false

parse_conf

No

DocParseConfig object

Document parsing configuration. For details, see Table 3.

split_conf

No

DocSplitConfig object

Text splitting configuration. For details, see Table 4.
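The precedence rules in Table 2 can be illustrated with a small client-side helper. This is only a sketch: the function name effective_settings is hypothetical, and the mapping from integer mode values to split_mode strings is an assumption inferred from the option descriptions on this page, not something the API states.

```python
# Hypothetical mapping, inferred from the option descriptions on this page;
# the API does not state it explicitly.
MODE_TO_SPLIT_MODE = {1: "CATALOG", 2: "RULE", 3: "LENGTH", 4: "AUTO"}


def effective_settings(body: dict) -> dict:
    """Return the splitting mode and OCR flag expected to win for a request body."""
    split_conf = body.get("split_conf") or {}
    parse_conf = body.get("parse_conf") or {}

    # split_conf.split_mode has a higher priority than the top-level mode.
    split_mode = split_conf.get("split_mode")
    if split_mode is None and body.get("mode") is not None:
        split_mode = MODE_TO_SPLIT_MODE.get(body["mode"])

    # parse_conf.ocr_enabled has a higher priority than the top-level ocr.
    ocr = parse_conf.get("ocr_enabled")
    if ocr is None:
        ocr = body.get("ocr", False)

    return {"split_mode": split_mode, "ocr": ocr}
```

To avoid relying on precedence at all, a request can set only split_conf.split_mode and parse_conf.ocr_enabled and omit the top-level mode and ocr fields.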

Table 3 DocParseConfig

Parameter

Mandatory

Type

Description

ocr_enabled

No

Boolean

Whether to use OCR for parsing. Constraint: N/A. Value range: true or false. Default value: false

image_conf

No

String

Parsing mode of images. Constraint: This parameter does not take effect when image_enabled is set to false. Value range: TEXT: extracts image text. IMAGE: retains the original image. BASE64: returns the image data encoded in Base64. Default value: IMAGE

image_enabled

No

Boolean

Whether to parse images. Constraint: N/A. Value range: true or false. Default value: false

header_footer_enabled

No

Boolean

Whether to parse headers and footers. Constraint: N/A. Value range: true or false. Default value: false

catalog_enabled

No

Boolean

Whether to parse the table of contents (catalog). Constraint: N/A. Value range: true or false. Default value: false

Table 4 DocSplitConfig

Parameter

Mandatory

Type

Description

split_mode

No

String

Text splitting mode. Value range: LENGTH: split by text length. CATALOG: split by catalog. RULE: split by defined rules. AUTO: auto splitting. Default value: N/A

separators

No

Array of strings

List of paragraph separators. Each string in the array is one separator. Constraint: The length cannot exceed 50 characters. Value range: N/A. Default value: [" ", ".", "? ", "! ", "!", "?", "\n"]

chunk_size

No

Integer

Maximum length of a chunk. Constraint: N/A. Value range: 1 Default value: N/A

title_level

No

Integer

Maximum depth of a title. Constraint: N/A. Value range: 1. Default value: N/A

combine_title

No

Boolean

Whether to merge multi-level titles. Merged format: Title 1 Title 2 Title 3. Unmerged format: Title 3. Constraint: N/A. Value range: true or false. Default value: true

rule_regexs

No

Array of strings

Expressions used to match titles when split_mode is RULE. Each string in the array is one expression. Constraint: The length cannot exceed 10 characters. Value range: N/A. Default value: N/A

merge_titles

No

Boolean

Whether to merge across titles. Constraint: N/A. Value range: true or false. Default value: true
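The request parameters in Tables 1 to 4 could be assembled as in the following sketch, which uses only the Python standard library. Several values are assumptions: the host in ENDPOINT is a placeholder, the obs:// path format is hypothetical, the JSON Content-Type header is assumed, and build_request is an illustrative helper rather than part of any SDK.

```python
import json
import urllib.request

# Placeholder host: replace with the KooSearch endpoint for your region.
ENDPOINT = "https://koosearch.example.com"
URI = "/v1/koosearch/doc-search/documents"


def build_request(token: str, file_path: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request for one OBS document."""
    # file_path is mandatory and must be less than 1024 characters (Table 2).
    if not file_path or len(file_path) >= 1024:
        raise ValueError("file_path is mandatory and must be shorter than 1024 characters")
    body = {"file_path": file_path, **options}
    return urllib.request.Request(
        ENDPOINT + URI,
        data=json.dumps(body).encode("utf-8"),
        # Content-Type is an assumption; X-Auth-Token comes from Table 1.
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "<token>",
    "obs://example-bucket/docs/manual.pdf",  # hypothetical OBS path
    language="ENGLISH",
    parse_conf={"ocr_enabled": True, "image_enabled": True, "image_conf": "TEXT"},
    split_conf={"split_mode": "LENGTH", "chunk_size": 500, "separators": ["\n", ". "]},
)
# urllib.request.urlopen(request) would submit the task; it is left out here
# because the host above is a placeholder.
```

The nested parse_conf and split_conf objects are used here instead of the top-level ocr and mode fields because, according to Table 2, the nested parameters have a higher priority.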

Response Parameters

Status code: 200

Table 5 Response body parameters

Parameter

Type

Description

task_id

String

ID of a document parsing task. You can use this ID to query the document parsing status and result.

Status code: 400

Table 6 Response body parameters

Parameter

Type

Description

error_code

String

Error code

error_msg

String

Error description

Status code: 401

Table 7 Response body parameters

Parameter

Type

Description

error_code

String

Error code

error_msg

String

Error description

Status code: 500

Table 8 Response body parameters

Parameter

Type

Description

error_code

String

Error code

error_msg

String

Error description

Example Requests

None

Example Responses

Status code: 200

Result of creating a file content parsing task

{
  "task_id" : "00c7591f88af4f3fb2f3d7c7191865e6"
}
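A client might handle the response bodies described in Tables 5 to 8 as in the sketch below. The DocParseError class and the parse_response function are hypothetical names introduced for illustration. The sketch assumes that every non-200 response carries the error_code and error_msg fields and falls back to empty strings when they are missing.

```python
import json


class DocParseError(Exception):
    """Hypothetical exception carrying the error_code and error_msg fields."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"HTTP {status}: {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_response(status: int, payload: str) -> str:
    """Return task_id for a 200 response; raise DocParseError otherwise."""
    data = json.loads(payload) if payload else {}
    if status == 200:
        # The task ID is later used to query the parsing status and result.
        return data["task_id"]
    raise DocParseError(status, data.get("error_code", ""), data.get("error_msg", ""))


task_id = parse_response(200, '{"task_id": "00c7591f88af4f3fb2f3d7c7191865e6"}')
```

Raising a single exception type for 400, 401, and 500 keeps the calling code short; callers that need to distinguish authentication failures can inspect the status attribute.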

Status Codes

Status Code

Description

200

Result of creating a file content parsing task

400

Invalid request parameters

401

Authentication exception

500

Internal error

Error Codes

See Error Codes.