General Table OCR

Function

General Table OCR recognizes the text in a table and returns the recognition result in JSON format. The returned result includes two types of image area (words_region): text area (text) and table area (table). It also includes the table structure (row, column) and text information. For details about the constraints on using this API, see Constraints. For details about how to use this API, see Introduction to OCR.

Figure 1 General Table OCR example

Prerequisites

Before using General Table OCR, you need to apply for the service and complete authentication. For details, see Subscribing to OCR and Authentication.

URI

POST https://{endpoint}/v2/{project_id}/ocr/general-table

Table 1 Path parameters

Parameter

Mandatory

Description

endpoint

Yes

Domain name or IP address of the server bearing the REST service endpoint. The endpoint varies depending on services in different regions. For more details, see Endpoints.

For example, the endpoint of OCR in the CN North-Beijing4 region is ocr.cn-north-4.myhuaweicloud.com.

project_id

Yes

Project ID, which can be obtained from Obtaining a Project ID.

Request Parameters

Table 2 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

Yes

String

User token

During API authentication using a token, the token is added to requests to obtain permissions for calling the API. The value of X-Subject-Token in the response header is the obtained token.

Content-Type

Yes

String

MIME type of the request body. The value is application/json.

Table 3 Request body parameters

Parameter

Mandatory

Type

Description

image

No. Set either this parameter or url.

String

Base64-encoded string of the image. The size cannot exceed 10 MB. The shorter edge must be at least 15 pixels and the longer edge at most 8,192 pixels. The JPEG, JPG, PNG, BMP, and TIFF formats are supported.

url

No. Set either this parameter or image.

String

Image URL. Currently, the following URLs are supported:

  • Public network: HTTP/HTTPS URL
  • URL provided by OBS. You need to be authorized to use OBS data, including service authorization, temporary authorization, and anonymous public authorization. For details, see Configuring Access Permissions of OBS.
NOTE:
  • The API response time depends on the image download time. If the image download takes a long time, the API call will fail.
  • Ensure that the storage service where the images to be detected reside is stable and reliable. OBS is recommended for storing image data.

return_text_location

No

Boolean

Whether to return coordinates of text blocks and cells. Possible values are as follows:

  • true: Return coordinates of text blocks and cells.
  • false: Do not return coordinates of text blocks and cells.

If this parameter is not specified, the default value false is used.

return_confidence

No

Boolean

Whether to return the confidence. Possible values are as follows:

  • true: Return the confidence.
  • false: Do not return the confidence.

If this parameter is not specified, the default value false is used.

return_excel

No

Boolean

Whether to return the Base64-encoded field for converting a table into a Microsoft Excel file. Possible values are as follows:

  • true: Return the excel field, which contains the recognition result as a Base64-encoded XLSX file.
  • false: Do not return the excel field.

If this parameter is not specified, the default value false is used.

You can use the Python function base64.b64decode to decode the returned Excel code and save it as an .xlsx file.
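The decoding step described above can be sketched as follows. This is a minimal illustration, not part of the official SDK: the helper name save_excel is hypothetical, and the sketch assumes the excel field sits directly under result, as it does in the response examples on this page.

```python
import base64
from pathlib import Path


def save_excel(result: dict, out_path: str) -> bool:
    """Decode the Base64-encoded `excel` field and write it to out_path.

    Returns False when the field is absent, for example when
    return_excel was not set to true in the request.
    """
    encoded = result.get("excel")
    if not encoded:
        return False
    # base64.b64decode turns the Base64 text back into the raw XLSX bytes.
    Path(out_path).write_bytes(base64.b64decode(encoded))
    return True
```

A typical call would be save_excel(response.json()["result"], "table.xlsx") after a request sent with "return_excel": true.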

Response Parameters

Response parameters and status codes vary with the recognition result, as described below.

Status code: 200

Table 4 Response body parameter

Parameter

Type

Description

result

GeneralTableResult object

Calling result of a successful API call

This parameter is not included when the API fails to be called.

Table 5 GeneralTableResult

Parameter

Type

Description

words_region_count

Integer

Number of text areas

words_region_list

Array of WordsRegionList objects

List of recognition results in text areas. The output sequence is from left to right and from top to bottom.

Table 6 WordsRegionList

Parameter

Type

Description

type

String

Type of the recognition area. Possible values are as follows:

  • text: text recognition area
  • table: table recognition area

words_block_count

Integer

Number of text blocks recognized in a sub-area

words_block_list

Array of GeneralTableWordsBlockList objects

List of text blocks recognized in a sub-area. The output sequence is from left to right and from top to bottom.

Table 7 GeneralTableWordsBlockList

Parameter

Type

Description

words

String

Recognized text. When the input parameter return_text_location is set to false, one text value is returned for each cell, and text from different lines is joined with newline characters.

words_list

Array of objects

List of the character blocks in a cell. The output sequence is from left to right and from top to bottom. This parameter is available only when the input parameter return_text_location is set to true.

rows

Array of integers

Rows occupied by the text, given as a list of integer indexes starting from 0. This parameter is valid only in table recognition areas, that is, only when type is table. Multiple consecutive values indicate that the cell spans multiple rows (rows and columns are divided based on the smallest cells in the table). For example, rows: [0, 1, 2] indicates that the cell spans three rows.

columns

Array of integers

Columns occupied by the text, given as a list of integer indexes starting from 0. This parameter is valid only in table recognition areas, that is, only when type is table. Multiple consecutive values indicate that the cell spans multiple columns (rows and columns are divided based on the smallest cells in the table). For example, columns: [0, 1, 2] indicates that the cell spans three columns.

location

Array of objects

Text block location information, in list format, indicating the X and Y coordinates of the four vertices in a text block. The coordinate origin is the upper left corner of the image, the X axis is horizontal, and the Y axis is vertical.

cell_location

Array of objects

Cell position information, in list format, indicating the X and Y coordinates of the four vertices in a cell. The coordinate origin is the upper left corner of the image, the X axis is horizontal, and the Y axis is vertical.

excel

String

Base64-encoded Excel file generated from the table image. The text and tables in the image are written into the Excel file by position. Decode the returned value using base64.b64decode and save it as an .xlsx file.

confidence

Number

Confidence information of a related field. The value ranges from 0 to 1.

A higher confidence level indicates a higher reliability and accuracy of the corresponding field identified.

The confidence is not equal to the accuracy, and is calculated through related algorithms.
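As an illustration of how the rows and columns lists in Table 7 can be used, the sketch below expands one table recognition area into a rectangular grid, copying the text of a merged cell into every position it spans. The function name table_to_grid is hypothetical, and joining the words_list entries with newline characters is an assumption made to mirror how words is described.

```python
def table_to_grid(region: dict) -> list:
    """Expand a region with type == "table" into a list of rows of strings."""
    blocks = region.get("words_block_list", [])
    # Indexes start from 0, so the grid size is the largest index plus one.
    n_rows = max((max(b["rows"]) for b in blocks), default=-1) + 1
    n_cols = max((max(b["columns"]) for b in blocks), default=-1) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]

    for block in blocks:
        if "words" in block:
            text = block["words"]
        else:
            # With return_text_location=true the text arrives in words_list.
            # Joining with "\n" is an assumption of this sketch.
            text = "\n".join(w["words"] for w in block.get("words_list", []))
        # A merged cell lists every row and column it spans.
        for r in block["rows"]:
            for c in block["columns"]:
                grid[r][c] = text
    return grid
```

Repeating the text across a merged cell keeps the grid rectangular; callers that need the original spans can read rows and columns directly instead.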

Status code: 400

Table 8 Response body parameter

Parameter

Type

Description

error_code

String

Error code of a failed API call. For details, see Error Codes.

If error code ModelArts.4204 is displayed, refer to Why Is a Message Stating "ModelArts.4204" Displayed When the OCR API Is Called?

This parameter is not included when the API is successfully called.

error_msg

String

Error message returned when the API fails to be called

This parameter is not included when the API is successfully called.
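Because result appears only on success and error_code and error_msg appear only on failure, a caller can separate the two cases before reading any fields. The following is a minimal sketch; the helper name extract_result and the choice of RuntimeError are assumptions of this example, not part of the API.

```python
def extract_result(status_code: int, body: dict) -> dict:
    """Return the `result` object, or raise with the reported error details."""
    if status_code == 200 and "result" in body:
        return body["result"]
    # On failure the body carries error_code and error_msg instead of result.
    code = body.get("error_code", "unknown")
    msg = body.get("error_msg", "")
    raise RuntimeError(f"OCR call failed (HTTP {status_code}): {code}: {msg}")
```

With the requests library this could be called as extract_result(response.status_code, response.json()).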

Request Example

  • The endpoint is the request URL for calling an API. Endpoints vary depending on services and regions. For details, see Endpoints.

    For example, General Table OCR is deployed in the CN North-Beijing4 region. The endpoint is ocr.cn-north-4.myhuaweicloud.com. The request URL is https://ocr.cn-north-4.myhuaweicloud.com/v2/{project_id}/ocr/general-table. project_id is the project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

  • For details about how to obtain a token, see Making an API Request.
  • Request example (Method 1: Use the image Base64 string.)
    POST https://{endpoint}/v2/{project_id}/ocr/general-table
     Request Header:   
     Content-Type: application/json   
     X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...      
     Request Body:
     {   
        "image":"/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAAg...",
        "return_text_location": true,
        "return_excel": true,
        "return_confidence":true
      }
  • Request example (Method 2: Use the image URL.)
    POST https://{endpoint}/v2/{project_id}/ocr/general-table
     Request Header:   
     Content-Type: application/json   
     X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...      
     Request Body:
     {
         "url":"https://BucketName.obs.xxxx.com/ObjectName",
         "return_confidence":false
      }
  • Sample code for a Python 3 request (For code in other languages, refer to the following sample or use the OCR SDK.)
    # encoding:utf-8
    
    import requests
    import base64
    
    url = "https://{endpoint}/v2/{project_id}/ocr/general-table"
    token = "Actual token value obtained by the user"
    headers = {'Content-Type': 'application/json', 'X-Auth-Token': token}
    
    imagepath = r'./data/general-table-demo.png'
    with open(imagepath, "rb") as bin_data:
        image_data = bin_data.read()
    image_base64 = base64.b64encode(image_data).decode("utf-8")  # Base64-encode the image.
    payload = {"image": image_base64}  # Set either image or url.
    
    response = requests.post(url, headers=headers, json=payload)
    print(response.text)
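The two request methods above differ only in whether the body carries image or url. The sketch below wraps that choice in one helper together with the optional Boolean switches from Table 3. The name build_payload and its keyword arguments are hypothetical, and rejecting the case where both sources are given is a local check assumed for this example rather than documented server behavior.

```python
import base64


def build_payload(image_path=None, image_url=None, *,
                  return_text_location=False,
                  return_confidence=False,
                  return_excel=False):
    """Build the JSON request body from a local file or an image URL."""
    # Local check (an assumption of this sketch): require exactly one source.
    if (image_path is None) == (image_url is None):
        raise ValueError("Set exactly one of image_path or image_url")

    if image_path is not None:
        with open(image_path, "rb") as f:
            payload = {"image": base64.b64encode(f.read()).decode("utf-8")}
    else:
        payload = {"url": image_url}

    # Optional switches; the service treats each as false when omitted.
    payload["return_text_location"] = return_text_location
    payload["return_confidence"] = return_confidence
    payload["return_excel"] = return_excel
    return payload
```

The returned dictionary can be passed as the json argument of requests.post, in the same way as payload in the sample above.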

Example Response

Successful response example

When the input parameter return_text_location is set to false, the coordinate information is not returned.
{
    "result": {
        "words_region_count": 2,
        "words_region_list": [
            {
               "type": "text",
               "words_block_count": 1,
               "words_block_list": [  
                    {
                       "words": "Words recognized from the text recognition area",
                       "confidence": 0.9991
                    }
               ]
            },
            {
               "type": "table",
               "words_block_count": 2,
               "words_block_list": [
                   {
                        "words": "Words recognized from the table recognition area",
                        "confidence": 0.9942,
                        "rows":[0],
                        "columns":[0]
                    },
                    {
                        "words": "Words recognized from the table recognition area",
                        "confidence": 0.9140,
                        "rows":[0],
                        "columns":[1,2]
                    }
                ]
            }
        ],
        "excel": "/1a/AEASABIAAD/4RFZRXhpZgAATU0AKgAAAAg..."
    }
}
When the input parameter return_text_location is set to true, the text block coordinates and cell coordinates are returned. The text in each cell is returned in words_list, a list of objects.
{
    "result": {
        "words_region_count": 2,
        "words_region_list": [
            {
               "type": "text",
               "words_block_count": 1,
               "words_block_list": [  
                    {
                       "words": "Words recognized from the text recognition area",
                       "location": [[13,476],
                                   [91, 476],
                                   [91, 560],
                                   [13, 560]],
                       "confidence": 0.9991
                    }
               ]
            },
            {
               "type": "table",
               "words_block_count": 1,
               "words_block_list": [
                   {
                       "rows": [0], 
                       "columns": [0], 
                       "cell_location": [[1042, 525],
                                        [1843, 525],
                                        [1843, 664],
                                        [1042, 664]],
 
                       "words_list": [{
                           "words":"Words recognized from the cell",
                           "confidence": 0.9942,
                           "location":  [[1053, 575],
                                        [1223, 575],
                                        [1223, 633],
                                        [1053, 633]]
                       },
                       {
                           "words":"Words recognized from the cell",
                           "confidence": 0.9140,
                           "location":  [[1678, 587],
                                        [1774, 587],
                                        [1774, 645],
                                        [1678, 645]]
                       }]
                    }
                ]
            }
        ],
        "excel": "/1a/AEASABIAAD/4RFZRXhpZgAATU0AKgAAAAg..."
    }
}
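For a response like the one above, the four-vertex location lists can be reduced to axis-aligned boxes for cropping or drawing. The sketch below is one way to do this; the function names bounding_box and iter_text_boxes are hypothetical, and the code assumes each vertex is an [x, y] pair with the origin at the upper left corner of the image, as described for location.

```python
def bounding_box(vertices):
    """Return (x, y, width, height) enclosing a list of [x, y] vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def iter_text_boxes(result):
    """Yield (words, box) for every located piece of text in `result`."""
    for region in result.get("words_region_list", []):
        for block in region.get("words_block_list", []):
            # Text areas carry words and location on the block itself;
            # table cells nest them inside words_list.
            items = block.get("words_list") or [block]
            for item in items:
                if "location" in item:
                    yield item["words"], bounding_box(item["location"])
```

Blocks without a location, such as those returned when return_text_location is false, are skipped.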

Status code: 400

Failure response example

{
    "error_code": "AIS.0103",
    "error_msg": "The image size does not meet the requirements."
}

Status Codes

Status Code

Description

200

Success response

400

Failure response

For details about status codes, see Status Codes.

Error Codes

For details about error codes, see Error Codes.