Updated on 2025-12-02 GMT+08:00

Updating Document Parsing

Function

This API uploads a local document and creates a document parsing task.

URI

POST /v1/{project_id}/applications/{app_id}/doc-search/files

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition:

Specifies the project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter.

Default value:

N/A

app_id

Yes

String

Definition:

Application ID. For details about how to obtain the application ID, see Obtaining an Application ID.

Constraints:

N/A

Value range:

The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter.

Default value:

N/A
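The path parameters above can be checked and assembled into a request URL before a call is made. The following is a minimal sketch, not part of the official API: the function name `build_files_url` and the `endpoint` argument (scheme, host, and port) are assumptions, and the pattern is derived only from the "Value range" text in Table 1. The IDs in the example request in this document do not all start with a letter, so treat this client-side check as optional.

```python
import re

# Pattern derived from the "Value range" text: 1 to 64 characters, only digits,
# letters, hyphens, and underscores, starting with a letter.
_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def build_files_url(endpoint: str, project_id: str, app_id: str) -> str:
    """Return the full URL for POST /v1/{project_id}/applications/{app_id}/doc-search/files.

    `endpoint` is an assumption; the reference only specifies the path.
    """
    for name, value in (("project_id", project_id), ("app_id", app_id)):
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(
                f"{name} does not match the documented value range: {value!r}"
            )
    return (
        f"{endpoint.rstrip('/')}/v1/{project_id}"
        f"/applications/{app_id}/doc-search/files"
    )
```

For example, `build_files_url("https://example.com", "proj-1", "app_1")` yields a URL ending in `/v1/proj-1/applications/app_1/doc-search/files`.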

Request Parameters

Table 2 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

Yes

String

Definition:

Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

Table 3 FormData parameters

Parameter

Mandatory

Type

Description

file

Yes

File

Definition:

Document to be uploaded and parsed.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

language

No

String

Definition:

Document language. The options are zh (Chinese), en (English), ar (Arabic), th (Thai), pt (Portuguese), and es (Spanish). This parameter is optional for Chinese and English documents.

Constraints:

N/A

Value range:

  • zh: Chinese

  • en: English

  • th: Thai

  • es: Spanish

  • ar: Arabic

  • pt: Portuguese

Default value:

N/A

mode

No

Integer

Definition:

Splitting mode.

Constraints:

N/A

Value range:

  • 1: directory parsing

  • 2: rule parsing

  • 3: length parsing

  • 4: automatic parsing

Default value:

N/A

ocr

No

Boolean

Definition:

Whether to use OCR for document parsing.

Constraints:

N/A

Value range:

  • true: Use OCR parsing.

  • false: Do not use OCR parsing.

Default value:

N/A

priority

No

Integer

Definition:

Job priority. A larger value indicates a higher priority. The default value is 0.

Constraints:

N/A

Value range:

N/A

Default value:

0

ocr_enabled

No

Boolean

Definition:

Whether to use OCR for document parsing.

Constraints:

N/A

Value range:

  • true: Use OCR parsing.

  • false: Do not use OCR parsing.

Default value:

N/A

mllm_enabled

No

Boolean

Definition:

Whether to use multi-modal parsing.

Constraints:

N/A

Value range:

  • true: Use multi-modal parsing.

  • false: Do not use multi-modal parsing.

Default value:

N/A

image_enabled

No

Boolean

Definition:

Whether to parse images.

Constraints:

N/A

Value range:

  • true: Parse images.

  • false: Do not parse images.

Default value:

N/A

image_conf

No

String

Definition:

Image parsing method.

Constraints:

N/A

Value range:

Enumerated value

  • TEXT: Extract text from the image.

  • IMAGE: Retain the original image.

  • BASE64: Return image data encoded using Base64.

Default value:

N/A

header_footer_enabled

No

Boolean

Definition:

Whether to parse footers and headers.

Constraints:

N/A

Value range:

  • true: Parse footers and headers.

  • false: Do not parse footers and headers.

Default value:

N/A

catalog_enabled

No

Boolean

Definition:

Whether to parse the table of contents.

Constraints:

N/A

Value range:

  • true: Parse the table of contents.

  • false: Do not parse the table of contents.

Default value:

N/A

separators

No

Array of strings

Definition:

Paragraph separators, which are used to split the text into sentences.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

rule_regexs

No

Array of strings

Definition:

Regular expressions used to match titles when the document is split by rule.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

split_mode

No

String

Definition:

Document splitting mode.

Constraints:

N/A

Value range:

Enumerated value

  • LENGTH: Split by word count.

  • CATALOG: Split by table of contents.

  • RULE: Split by rule.

  • AUTO: Automatically select the splitting mode.

Default value:

N/A

chunk_size

No

Integer

Definition:

Maximum chunk length.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

title_level

No

Integer

Definition:

Maximum title depth.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

combine_title

No

Boolean

Definition:

Whether to merge titles. Merged format: title 1 title 2 title 3. Non-merged format: title 3.

Constraints:

N/A

Value range:

  • true: Merge titles.

  • false: Do not merge titles.

Default value:

N/A

merge_titles

No

Boolean

Definition:

Whether to merge across titles.

Constraints:

N/A

Value range:

  • true: Merge across titles.

  • false: Do not merge across titles.

Default value:

N/A

overlap

No

Float

Definition:

Chunk overlap ratio.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

reference_enabled

No

Boolean

Definition:

Whether to parse reference documents.

Constraints:

N/A

Value range:

  • true: Parse reference documents.

  • false: Do not parse reference documents.

Default value:

N/A

footnote_enabled

No

Boolean

Definition:

Whether to parse footnotes.

Constraints:

N/A

Value range:

  • true: Parse footnotes.

  • false: Do not parse footnotes.

Default value:

N/A

mllm_model

No

String

Definition:

Name of the multi-modal model used for parsing. Multiple multi-modal models can be configured and are matched by name.

Constraints:

N/A

Value range:

N/A

Default value:

N/A

mllm_prompt

No

String

Definition:

Prompt for multi-modal parsing, given as a map from language code to prompt text. Example: {"en":"Please parse this image"}.

Constraints:

N/A

Value range:

N/A

Default value:

N/A
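The enumerated values in Table 3 can be validated on the client before a request is built. The following is a minimal sketch under stated assumptions: the helper name `build_form_fields` is hypothetical, and the reference does not say how booleans, arrays, or the mllm_prompt map should be encoded in form data, so the encodings chosen here (lowercase true/false, one field per array item, JSON text for maps) are guesses that may need adjusting.

```python
import json
from typing import Any, List, Tuple

# Allowed values copied from the "Value range" entries in Table 3.
LANGUAGES = {"zh", "en", "ar", "th", "pt", "es"}
MODES = {1, 2, 3, 4}
IMAGE_CONFS = {"TEXT", "IMAGE", "BASE64"}
SPLIT_MODES = {"LENGTH", "CATALOG", "RULE", "AUTO"}


def build_form_fields(**options: Any) -> List[Tuple[str, str]]:
    """Validate enumerated options and flatten them into (name, value) pairs.

    The encodings for booleans, arrays, and maps are assumptions; the
    reference does not specify them.
    """
    allowed = {
        "language": LANGUAGES,
        "mode": MODES,
        "image_conf": IMAGE_CONFS,
        "split_mode": SPLIT_MODES,
    }
    fields: List[Tuple[str, str]] = []
    for name, value in options.items():
        if name in allowed and value not in allowed[name]:
            raise ValueError(
                f"{name}={value!r} is not one of {sorted(allowed[name])}"
            )
        if isinstance(value, bool):
            # Assumed encoding: lowercase true/false.
            fields.append((name, "true" if value else "false"))
        elif isinstance(value, (list, tuple)):
            # Assumed encoding: repeat the field once per array item.
            fields.extend((name, str(item)) for item in value)
        elif isinstance(value, dict):
            # Assumed encoding: JSON text, as in the mllm_prompt example.
            fields.append((name, json.dumps(value, ensure_ascii=False)))
        else:
            fields.append((name, str(value)))
    return fields
```

A call such as `build_form_fields(split_mode="LENGTH", chunk_size=500, ocr_enabled=True)` returns a list of string pairs that an HTTP client can send alongside the file part, while an unknown value such as `image_conf="PNG"` raises `ValueError`.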

Response Parameters

Status code: 200

Table 4 Response body parameters

Parameter

Type

Description

task_id

String

Definition:

ID of a document parsing task. You can use this ID to query the document parsing progress and result.

Value range:

N/A

Status code: 400

Table 5 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code

Value range:

N/A

error_msg

String

Definition:

Error description

Value range:

N/A

Status code: 401

Table 6 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code

Value range:

N/A

error_msg

String

Definition:

Error description

Value range:

N/A

Status code: 500

Table 7 Response body parameters

Parameter

Type

Description

error_code

String

Definition:

Error code

Value range:

N/A

error_msg

String

Definition:

Error description

Value range:

N/A
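The response bodies in Tables 4 to 7 can be handled with one small helper: return task_id on status code 200 and surface error_code and error_msg otherwise. This is a minimal sketch; the names `DocParseError` and `extract_task_id` are hypothetical, and it assumes the body is JSON as shown in the example response.

```python
import json


class DocParseError(Exception):
    """Raised for non-200 responses; carries error_code and error_msg."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"HTTP {status}: {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def extract_task_id(status: int, body: str) -> str:
    """Return task_id for a 200 response, otherwise raise DocParseError."""
    payload = json.loads(body) if body else {}
    if status == 200:
        return payload["task_id"]
    # 400, 401, and 500 share the same error body shape in this reference.
    raise DocParseError(
        status,
        payload.get("error_code", ""),
        payload.get("error_msg", ""),
    )
```

The returned task_id is the value to use when querying the parsing progress and result.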

Example Requests

POST http://100.85.216.4:31628/v1/ee51ecd9-bc3c-4e98-b7df-ba6647350af2/applications/01d3c218-4d37-489a-98ff-69d69ea44bb1/doc-search/files

{
  "file" : "/D:/Documents/Identifying existing management practices in the control of Striga asiatica within rice–maize systems in mid-west Madagascar.pdf",
  "mode" : "4"
}
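The example above lists the fields as a JSON-like object, but Table 3 describes them as FormData parameters. Assuming the service expects multipart/form-data, a request could be assembled as in the following sketch, which uses only the Python standard library. The helper names `encode_multipart` and `build_upload_request` are hypothetical, the part Content-Type of application/octet-stream is an assumption, and file names containing double quotes are not escaped.

```python
import urllib.request
import uuid
from typing import Iterable, Tuple


def encode_multipart(
    fields: Iterable[Tuple[str, str]],
    file_name: str,
    file_bytes: bytes,
) -> Tuple[bytes, str]:
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        parts.append(value.encode("utf-8") + b"\r\n")
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\n'.encode()
    )
    # Assumed part type; the reference does not name one.
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(
    url: str,
    token: str,
    file_name: str,
    file_bytes: bytes,
    fields: Iterable[Tuple[str, str]] = (),
) -> urllib.request.Request:
    """Build (but do not send) the POST request with the X-Auth-Token header."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Auth-Token": token, "Content-Type": content_type},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access. For the example above, `fields` would be `[("mode", "4")]`.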

Example Responses

Status code: 200

File content parsing task creation result

{
  "task_id" : "00c7591f88af4f3fb2f3d7c7191865e6"
}

Status Codes

Status Code    Description

200            File content parsing task creation result
400            Request parameter error
401            Authentication exception
500            Internal error

Error Codes

See Error Codes.