Create a Document Parsing Task
Function
This API creates a parsing task for a document stored in OBS. The returned task ID is used to query the parsing progress and result.
URI
POST /v1/{project_id}/applications/{app_id}/doc-search/documents
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Definition: Specifies the project ID. For details about how to obtain the project ID, see Obtaining a Project ID. Constraints: N/A. Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A |
| app_id | Yes | String | Definition: Application ID. For details about how to obtain the application ID, see Obtaining an Application ID. Constraints: String. Value range: The value can contain 1 to 64 characters. Only digits, letters, hyphens (-), and underscores (_) are allowed. The value must start with a letter. Default value: N/A |
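The following is a minimal sketch of how a client might assemble the request URI from the two path parameters. The function name `build_documents_uri` and the `endpoint` argument are illustrative and not part of the API. The check covers only the documented length and character set; the start-with-a-letter rule is left unchecked because the sample IDs under Example Requests begin with a digit.

```python
import re

# Length and character rule from the path parameter table:
# 1 to 64 characters, digits, letters, hyphens, and underscores only.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_documents_uri(endpoint: str, project_id: str, app_id: str) -> str:
    """Return the full URI for POST .../doc-search/documents.

    `endpoint` (scheme, host, and port) is an assumption of this sketch.
    """
    for name, value in (("project_id", project_id), ("app_id", app_id)):
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"{name} must be 1-64 characters of [A-Za-z0-9_-]")
    return (
        f"{endpoint.rstrip('/')}/v1/{project_id}"
        f"/applications/{app_id}/doc-search/documents"
    )
```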
Request Parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| X-Auth-Token | Yes | String | Definition: Token used for API authentication. For details about how to obtain the token, see Obtaining an IAM User Token. Constraints: N/A. Value range: N/A. Default value: N/A |
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
file_path |
Yes |
String |
Definition: Path of the document uploaded to OBS. Only one document can be uploaded each time. Constraints: N/A Value range: N/A Default value: N/A |
|
language |
No |
String |
Definition: Document language. The options are zh (Chinese), en (English), ar (Arabic), th (Thai), pt (Portuguese), and es (Spanish). This parameter is optional for Chinese and English documents. Constraints: N/A Value range: Options:
Default value: N/A |
|
mode |
No |
Integer |
Definition: Document splitting mode. Constraints: N/A Value range: 1-4 Default value: N/A |
|
ocr |
No |
Boolean |
Definition: Whether to use OCR for document parsing. Constraints: N/A Value range:
Default value: N/A |
|
priority |
No |
Integer |
Definition: Task priority. A larger value indicates a higher priority. Constraints: N/A Value range: N/A Default value: N/A |
|
parse_conf |
No |
DocParseConfig object |
Definition: Document parsing configuration. Constraints: N/A Value range: N/A Default value: N/A |
|
split_conf |
No |
DocSplitConfig object |
Definition: Text splitting configuration. Constraints: N/A Value range: N/A Default value: N/A |
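As an illustration of the body parameters above, the sketch below builds a request body and applies the two value ranges this page states (the `language` options and `mode` 1-4). The helper name `build_request_body` is hypothetical, and whether the service itself rejects out-of-range values is not stated here; the client-side checks are a convenience only.

```python
from typing import Any, Dict, Optional

# Language options listed in the request body table.
LANGUAGES = {"zh", "en", "ar", "th", "pt", "es"}


def build_request_body(
    file_path: str,
    language: Optional[str] = None,
    mode: Optional[int] = None,
    ocr: Optional[bool] = None,
    priority: Optional[int] = None,
    parse_conf: Optional[Dict[str, Any]] = None,
    split_conf: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body; optional fields left as None are omitted."""
    if not file_path:
        raise ValueError("file_path is mandatory")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if mode is not None and not 1 <= mode <= 4:
        raise ValueError("mode must be in the range 1-4")

    optional = {
        "language": language,
        "mode": mode,
        "ocr": ocr,
        "priority": priority,
        "parse_conf": parse_conf,
        "split_conf": split_conf,
    }
    body: Dict[str, Any] = {"file_path": file_path}
    # Keep False and 0, drop only parameters that were not supplied.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```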
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
ocr_enabled |
No |
Boolean |
Definition: Whether to use OCR for document parsing. Constraints: N/A Value range:
Default value: N/A |
|
mllm_enabled |
No |
Boolean |
Definition: Whether to use multi-modal parsing. Constraints: N/A Value range:
Default value: N/A |
|
image_conf |
No |
String |
Definition: Image parsing mode. TEXT: Extracts text from the image. IMAGE: Retains the original image. IMAGE_TEXT: Extracts the text and retains the original image. Constraints: N/A Value range: Options:
Default value: TEXT |
|
image_enabled |
No |
Boolean |
Definition: Whether to parse images. Constraints: N/A Value range:
Default value: N/A |
|
header_footer_enabled |
No |
Boolean |
Definition: Whether to parse page headers and footers. Constraints: N/A Value range:
Default value: N/A |
|
catalog_enabled |
No |
Boolean |
Definition: Whether to parse directories. Constraints: N/A Value range:
Default value: N/A |
|
footnote_enabled |
No |
Boolean |
Definition: Whether to parse footnotes. Constraints: N/A Value range:
Default value: N/A |
|
mllm_model |
No |
String |
Definition: Whether to use multi-modal parsing. Multiple multi-modal models can be configured and matched by name. Constraints: N/A Value range: 1 to 32 characters Default value: N/A |
|
mllm_prompt |
No |
Map<String,String> |
Definition: Multi-modal prompt. Constraints: N/A Value range: N/A Default value: N/A |
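A `parse_conf` object could be assembled along the following lines. This is a sketch only: `build_parse_conf` is a hypothetical helper, and the checks mirror just the constraints stated in the table (the three `image_conf` modes and the 1 to 32 character limit on `mllm_model`). The keys expected inside `mllm_prompt` are not documented here, so the map is passed through unchanged.

```python
from typing import Any, Dict, Optional

IMAGE_MODES = {"TEXT", "IMAGE", "IMAGE_TEXT"}
BOOLEAN_FLAGS = {
    "ocr_enabled", "mllm_enabled", "image_enabled",
    "header_footer_enabled", "catalog_enabled", "footnote_enabled",
}


def build_parse_conf(
    image_conf: Optional[str] = None,
    mllm_model: Optional[str] = None,
    mllm_prompt: Optional[Dict[str, str]] = None,
    **flags: bool,
) -> Dict[str, Any]:
    """Build a DocParseConfig-shaped dict from the documented fields."""
    conf: Dict[str, Any] = {}
    for name, value in flags.items():
        if name not in BOOLEAN_FLAGS:
            raise ValueError(f"unknown flag: {name}")
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a Boolean")
        conf[name] = value
    if image_conf is not None:
        if image_conf not in IMAGE_MODES:
            raise ValueError(f"image_conf must be one of {sorted(IMAGE_MODES)}")
        conf["image_conf"] = image_conf
    if mllm_model is not None:
        if not 1 <= len(mllm_model) <= 32:
            raise ValueError("mllm_model must contain 1 to 32 characters")
        conf["mllm_model"] = mllm_model
    if mllm_prompt is not None:
        conf["mllm_prompt"] = dict(mllm_prompt)
    return conf
```

Omitting `image_conf` leaves the documented default (TEXT) in effect.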
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
split_mode |
No |
String |
Definition: Text splitting mode. Value range:
|
|
separators |
No |
Array of strings |
Definition: Paragraph separator. Value range: N/A |
|
chunk_size |
No |
Integer |
Definition: Maximum chunk length. Value range: N/A |
|
overlap |
No |
Float |
Definition: Chunk overlap ratio. Value range: N/A |
|
title_level |
No |
Integer |
Definition: Maximum title depth. Value range: N/A |
|
combine_title |
No |
Boolean |
Definition: Whether to merge titles. Merged format: title 1 title 2 title 3. Non-merged format: title 3. Value range: N/A |
|
rule_regexs |
No |
Array of strings |
Title matching expression in the rule splitting scenario. |
|
merge_titles |
No |
Boolean |
Whether to merge paragraphs. When enabled, small paragraphs will be merged. |
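To make `combine_title`, `chunk_size`, and `overlap` more concrete, here is a small local illustration. It is not the service's algorithm: the page does not say whether chunk length is measured in characters or tokens, nor how the overlap ratio is applied, so the sliding-window logic below (character-based, overlap as a fraction of `chunk_size`) is an assumption. Only the two title formats come from the table.

```python
from typing import List


def format_title(title_path: List[str], combine_title: bool) -> str:
    """Merged format joins all levels; non-merged keeps only the last level."""
    return " ".join(title_path) if combine_title else title_path[-1]


def chunk_text(text: str, chunk_size: int, overlap: float) -> List[str]:
    """Assumed behavior: fixed-size character windows that share
    int(chunk_size * overlap) characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, chunk_size - int(chunk_size * overlap))
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reaches the end of the text
        start += step
    return chunks
```

Under these assumptions, a `chunk_size` of 4 with an `overlap` of 0.5 makes each chunk repeat the last two characters of the one before it.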
Response Parameters
Status code: 200
| Parameter | Type | Description |
|---|---|---|
| task_id | String | Definition: ID of a document parsing task. You can use this ID to query the document parsing progress and result. Value range: N/A |
Status code: 400
| Parameter | Type | Description |
|---|---|---|
| error_code | String | Definition: Error code. Value range: N/A |
| error_msg | String | Definition: Error description. Value range: N/A |
Status code: 401
| Parameter | Type | Description |
|---|---|---|
| error_code | String | Definition: Error code. Value range: N/A |
| error_msg | String | Definition: Error description. Value range: N/A |
Status code: 500
| Parameter | Type | Description |
|---|---|---|
| error_code | String | Definition: Error code. Value range: N/A |
| error_msg | String | Definition: Error description. Value range: N/A |
Example Requests
POST http://100.95.151.220:80/v1/03365e016aa44313b0f55b67cbe4a12a/applications/9b6293e5-2b03-44c3-9184-15b8fd3fae93/doc-search/documents
{
"file_path" : "kos-docs/haier/excel/Trouble Ticket Specifications.xlsx"
}
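The same request could be prepared in Python with the standard library, roughly as sketched below. The `make_request` helper and the token placeholder are illustrative, and the `Content-Type: application/json` header is an assumption (this page lists only `X-Auth-Token`). The request is built but not sent; passing it to `urllib.request.urlopen` would submit it.

```python
import json
import urllib.request


def make_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the IAM token and a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Auth-Token": token,
            # Assumed: the body is sent as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_request(
    "http://100.95.151.220:80/v1/03365e016aa44313b0f55b67cbe4a12a"
    "/applications/9b6293e5-2b03-44c3-9184-15b8fd3fae93/doc-search/documents",
    token="<your-IAM-user-token>",  # placeholder, not a real token
    body={"file_path": "kos-docs/haier/excel/Trouble Ticket Specifications.xlsx"},
)
```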
Example Responses
Status code: 200
File content parsing task creation result
{
"task_id" : "00c7591f88af4f3fb2f3d7c7191865e6"
}
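A client would typically pull `task_id` out of this response and keep it for later progress queries. A minimal sketch, with `extract_task_id` as a hypothetical helper name:

```python
import json


def extract_task_id(response_text: str) -> str:
    """Return task_id from a 200 response body, or raise if it is missing."""
    payload = json.loads(response_text)
    task_id = payload.get("task_id") if isinstance(payload, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id")
    return task_id


task_id = extract_task_id('{ "task_id" : "00c7591f88af4f3fb2f3d7c7191865e6" }')
```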
Status Codes
| Status Code | Description |
|---|---|
| 200 | File content parsing task creation result |
| 400 | Request parameter error |
| 401 | Authentication exception |
| 500 | Internal error |
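One way a client might turn these status codes into errors is sketched below. `DocParseError` and `check_status` are hypothetical names; the descriptions are copied from the table, and `error_code` and `error_msg` are read from the error body when it is valid JSON.

```python
import json

# Descriptions from the status code table.
STATUS_DESCRIPTIONS = {
    400: "Request parameter error",
    401: "Authentication exception",
    500: "Internal error",
}


class DocParseError(Exception):
    """Raised for any non-200 response from the document parsing API."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg
        description = STATUS_DESCRIPTIONS.get(status, "Unexpected status")
        super().__init__(f"{status} {description}: [{error_code}] {error_msg}")


def check_status(status: int, body_text: str) -> None:
    """Return silently on 200; otherwise raise DocParseError."""
    if status == 200:
        return
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise DocParseError(
        status,
        str(payload.get("error_code", "")),
        str(payload.get("error_msg", "")),
    )
```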
Error Codes
See Error Codes.