Obtain documents from OBS.
Function
This document parsing API obtains a document from OBS and creates a parsing task for it.
URI
POST /v1/koosearch/doc-search/documents
Request Parameters
Request header parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
X-Auth-Token | Yes | String | Token used for API authentication. For how to obtain the token, see section 3.2 "Authentication." Constraints: N/A. |
Request body parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
file_path | Yes | String | Path of the document uploaded to OBS. Only one file can be uploaded at a time. Constraints: N/A. Value range: fewer than 1024 characters. Default value: N/A. |
language | No | String | Language of the document. This parameter can be left blank for Chinese and English documents. Constraints: N/A. Value range: CHINESE, ENGLISH, ARABIC, and THAI. Default value: N/A. |
mode | No | Integer | Document parsing and splitting mode. Constraints: The split_mode parameter takes priority over this parameter. Value range: 1 (hierarchical parsing), 2 (rule-based parsing), 3 (length-based parsing), and 4 (automatic parsing). Default value: N/A. |
ocr | No | Boolean | Whether to use OCR for parsing. Constraints: The ocr_enabled parameter takes priority over this parameter. Value range: true (OCR is used for parsing) or false (OCR is not used for parsing). Default value: false. |
parse_conf | No | DocParseConfig object | Document parsing configuration. |
split_conf | No | DocSplitConfig object | Text splitting configuration. |
DocParseConfig
Parameter | Mandatory | Type | Description |
---|---|---|---|
ocr_enabled | No | Boolean | Whether to use OCR for parsing. Constraints: N/A. Value range: true or false. Default value: false. |
image_conf | No | String | Parsing mode of images. Constraints: This parameter does not take effect when image_enabled is set to false. Value range: TEXT (extracts the image text), IMAGE (retains the original image), or BASE64 (returns the image data encoded in Base64). Default value: IMAGE. |
image_enabled | No | Boolean | Whether to parse images. Constraints: N/A. Value range: true or false. Default value: false. |
header_footer_enabled | No | Boolean | Whether to parse the header and footer. Constraints: N/A. Value range: true or false. Default value: false. |
catalog_enabled | No | Boolean | Whether to parse the catalog (table of contents). Constraints: N/A. Value range: true or false. Default value: false. |
DocSplitConfig
Parameter | Mandatory | Type | Description |
---|---|---|---|
split_mode | No | String | Text splitting mode. Value range: LENGTH (split by text length), CATALOG (split by catalog), RULE (split by defined rules), or AUTO (automatic splitting). Default value: N/A. |
separators | No | Array of strings | List of paragraph separators. Each string is one separator. Constraints: The length cannot exceed 50 characters. Value range: N/A. Default value: [" ", ".", "? ", "! ", "!", "?", "\n"] |
chunk_size | No | Integer | Maximum length of a chunk. Constraints: N/A. Value range: 1. Default value: N/A. |
title_level | No | Integer | Maximum depth of a title. Constraints: N/A. Value range: 1. Default value: N/A. |
combine_title | No | Boolean | Whether to merge titles. Merging format: Title 1 Title 2 Title 3; non-merging format: Title 3. Constraints: N/A. Value range: true or false. Default value: true. |
rule_regexs | No | Array of strings | Title matching expressions used for rule-based splitting. Each string is one expression. Constraints: The length cannot exceed 10 characters. Value range: N/A. Default value: N/A. |
merge_titles | No | Boolean | Whether to merge across titles. Constraints: N/A. Value range: true or false. Default value: true. |
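The documented constraints can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the function name `validate_body` is hypothetical, and it only covers the constraints that the tables state explicitly (mandatory `file_path` shorter than 1024 characters, and the enumerated values of `language`, `mode`, `image_conf`, and `split_mode`).

```python
from typing import Any

# Enumerated values copied from the parameter tables.
LANGUAGES = ("CHINESE", "ENGLISH", "ARABIC", "THAI")
MODES = (1, 2, 3, 4)
IMAGE_CONFS = ("TEXT", "IMAGE", "BASE64")
SPLIT_MODES = ("LENGTH", "CATALOG", "RULE", "AUTO")


def validate_body(body: dict[str, Any]) -> list[str]:
    """Return the problems found in a request body (empty list if none).

    Hypothetical client-side helper; the service performs its own checks.
    """
    problems: list[str] = []

    file_path = body.get("file_path")
    if not isinstance(file_path, str) or not file_path:
        problems.append("file_path is mandatory and must be a string")
    elif len(file_path) >= 1024:
        problems.append("file_path must be shorter than 1024 characters")

    if "language" in body and body["language"] not in LANGUAGES:
        problems.append("language must be one of " + ", ".join(LANGUAGES))

    if "mode" in body and body["mode"] not in MODES:
        problems.append("mode must be 1, 2, 3, or 4")

    parse_conf = body.get("parse_conf", {})
    if "image_conf" in parse_conf and parse_conf["image_conf"] not in IMAGE_CONFS:
        problems.append("image_conf must be one of " + ", ".join(IMAGE_CONFS))

    split_conf = body.get("split_conf", {})
    if "split_mode" in split_conf and split_conf["split_mode"] not in SPLIT_MODES:
        problems.append("split_mode must be one of " + ", ".join(SPLIT_MODES))

    return problems
```

An empty list means the body passed these checks; it does not guarantee that the service accepts the request.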
Response Parameters
Status code: 200
Parameter | Type | Description |
---|---|---|
task_id | String | ID of a document parsing task. You can use this ID to query the document parsing status and result. |
Status code: 400
Parameter | Type | Description |
---|---|---|
error_code | String | Error code |
error_msg | String | Error description |
Status code: 401
Parameter | Type | Description |
---|---|---|
error_code | String | Error code |
error_msg | String | Error description |
Status code: 500
Parameter | Type | Description |
---|---|---|
error_code | String | Error code |
error_msg | String | Error description |
Example Requests
None
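No official example request is provided. As an illustration only, a request might be assembled from the documented URI, the X-Auth-Token header, and the body parameters as sketched below. The endpoint host, the file path, the token placeholder, and the `Content-Type: application/json` header are assumptions, not values confirmed by this page; the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint host; replace it with the real service endpoint.
ENDPOINT = "https://koosearch.example.com"
URI = "/v1/koosearch/doc-search/documents"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Assemble (but do not send) the POST request that creates a parsing task."""
    return urllib.request.Request(
        url=ENDPOINT + URI,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Auth-Token": token,
            # Assumption: the body is sent as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "file_path": "example-bucket/docs/manual.pdf",  # hypothetical OBS path
    "language": "ENGLISH",
    "parse_conf": {"ocr_enabled": True, "image_enabled": False},
    "split_conf": {"split_mode": "LENGTH", "chunk_size": 500},
}
request = build_request("<your-token>", body)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Because `split_conf.split_mode` and `parse_conf.ocr_enabled` take priority over `mode` and `ocr`, this sketch sets only the nested parameters.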
Example Responses
Status code: 200
Result of creating a file content parsing task
{ "task_id" : "00c7591f88af4f3fb2f3d7c7191865e6" }
Status Codes
Status Code |
Description |
---|---|
200 |
Result of creating a file content parsing task |
400 |
Invalid request parameters |
401 |
Authentication exception |
500 |
Internal error |
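A client could map these status codes to its own error handling roughly as follows. This is a sketch under stated assumptions: `ParseTaskError` and `task_id_from_response` are hypothetical names, and the error bodies are assumed to carry the `error_code` and `error_msg` fields listed under Response Parameters.

```python
import json

# Descriptions copied from the status code table.
STATUS_MEANING = {
    400: "Invalid request parameters",
    401: "Authentication exception",
    500: "Internal error",
}


class ParseTaskError(Exception):
    """Hypothetical exception raised for any non-200 response."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"HTTP {status}: {error_code} {error_msg}".strip())
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def task_id_from_response(status: int, raw_body: str) -> str:
    """Return task_id for a 200 response; raise ParseTaskError otherwise."""
    payload = json.loads(raw_body) if raw_body else {}
    if status == 200:
        return payload["task_id"]
    raise ParseTaskError(
        status,
        payload.get("error_code", ""),
        # Fall back to the table's description if the body has no message.
        payload.get("error_msg", STATUS_MEANING.get(status, "")),
    )
```

The returned `task_id` is the value to keep for querying the parsing status and result later.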
Error Codes
See Error Codes.