General Text
Function
This API detects and extracts text from images and converts the text and coordinates into JSON format. It can be used in various scenarios, such as scanned files, electronic documents, books, receipts, and forms.
English and Chinese are supported, but support for traditional Chinese characters is limited. For the notes and constraints on using this API, see Notes and Constraints. For how to use this API, see Introduction to OCR.
Notes and Constraints
- Only images in PNG, JPG, JPEG, BMP, GIF, TIFF, WebP, PCX, ICO, PSD, or PDF format can be recognized.
- Each side of the image must be between 15 and 8,192 pixels long.
- The area to be recognized must occupy more than 80% of the image. When scanning a table, ensure that all text and its surrounding area are included in the image.
- An image can be rotated to any angle.
- Light-colored text watermarks can be automatically filtered out.
- Text in images with complex backgrounds (such as outdoor scenery) or distorted text cannot be recognized.
- Supported languages: Chinese, English, some traditional Chinese, Malay, Ukrainian, Hindi, Russian, Vietnamese, Indonesian, Thai, Arabic, German, Latin, French, Italian, Spanish, Portuguese, Romanian, Polish, Amharic, Japanese, Korean, Turkish, Norwegian, Danish, Swedish, Khmer, and Hebrew.
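The format and size constraints above can be checked locally before an image is uploaded. The following is a minimal sketch, not part of the OCR SDK: the function name check_image is hypothetical, the format is judged by file extension only (a simplification), and the caller is assumed to already know the image width and height in pixels.

```python
from pathlib import PurePath

# Limits copied from the constraints listed above.
SUPPORTED_EXTENSIONS = {
    "png", "jpg", "jpeg", "bmp", "gif", "tiff", "webp", "pcx", "ico", "psd", "pdf",
}
MIN_SIDE_PX = 15
MAX_SIDE_PX = 8192


def check_image(filename, width, height):
    """Return a list of constraint violations; an empty list means the image passes."""
    problems = []
    # Judge the format by file extension only (hypothetical simplification).
    extension = PurePath(filename).suffix.lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(f"{name} {side}px is outside {MIN_SIDE_PX}-{MAX_SIDE_PX}px")
    return problems
```

A call such as check_image("receipt.png", 1200, 800) returns an empty list, while an oversized or unsupported file yields one message per violated constraint.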
Calling Method
For details, see Calling APIs.
Prerequisites
Before using this API, subscribe to the service and complete authentication. For details, see Subscribing to an OCR Service and Authentication.
To enable the service, log in to the OCR console and click Subscribe. You only need to subscribe once. If you call this API without a subscription, error "ModelArts.4204" is returned. Make sure that you subscribe to the service in the same region where you want to call this API.
URI
POST /v2/{project_id}/ocr/general-text
Parameter | Mandatory | Description |
---|---|---|
endpoint | Yes | Endpoint, which is the request address for calling an API. The endpoint varies depending on services in different regions. For more details, see Endpoints. |
project_id | Yes | Project ID, which can be obtained from Obtaining a Project ID. |
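As an illustration of how the two URI parameters combine into a request address, here is a minimal sketch. The helper name build_request_url is hypothetical, and the project ID used in the usage note is a placeholder, not a real value. The helper requires Python 3.9 or later because it uses str.removeprefix.

```python
def build_request_url(endpoint, project_id):
    """Combine an endpoint host and a project ID into the General Text request URL."""
    # Accept the endpoint with or without a scheme or trailing slash.
    host = endpoint.strip().removeprefix("https://").rstrip("/")
    return f"https://{host}/v2/{project_id}/ocr/general-text"
```

For example, build_request_url("ocr.ap-southeast-1.myhuaweicloud.com", "my-project-id") produces https://ocr.ap-southeast-1.myhuaweicloud.com/v2/my-project-id/ocr/general-text.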
Request Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
X-Auth-Token | Yes | String | User token, used to obtain the permission to call APIs. The token is the value of X-Subject-Token in the response header obtained in Authentication. |
Content-Type | Yes | String | MIME type of the request body. The value is application/json. |
Enterprise-Project-Id | No | String | Enterprise project ID. OCR uses Enterprise Project Management Service (EPS) to split fees for resources used by different user groups and users. To obtain the enterprise project ID, go to the Enterprise Project Management console, click the enterprise project name, and obtain the enterprise project ID on the enterprise project details page. For details about how to create an enterprise project, see Optical Character Recognition User Guide. NOTE: After an enterprise project is created, parameter transfer involves the following scenarios: |
Parameter | Mandatory | Type | Description |
---|---|---|---|
image | No | String | Set either this parameter or url. Base64-encoded string of an image file. The image file has a size limit of 10 MB. Each side of the image must be between 15 and 8,192 pixels long. Only images in JPEG, JPG, PNG, BMP, GIF, TIFF, WebP, PCX, ICO, PDF, or PSD format can be recognized. An example is /9j/4AAQSkZJRgABAg.... If the image data contains an unnecessary prefix, the error "The image format is not supported" is reported. |
url | No | String | Set either this parameter or image. Image URL. Currently, the following URLs are supported: |
detect_direction | No | Boolean | Whether to align the tilted image. The value can be true or false. An image tilted to any angle can be aligned. If this parameter is not specified, false is used by default. If the image to be recognized is tilted, you are advised to set this parameter to true. |
quick_mode | No | Boolean | Whether to enable the quick mode. The value can be true or false. For a single-line text image (the image contains only one line of text and the text area occupies more than 50% of the image), the recognition results are returned more quickly when the quick mode is enabled. If this parameter is not specified, false is used by default and the quick mode is disabled. |
character_mode | No | Boolean | Whether to enable the single-character mode. The value can be true or false. If this parameter is not specified, false is used by default, and information about the individual characters in a text line is not returned. |
language | No | String | Language. If this parameter is not specified, Chinese and English will be used by default. The options are: |
single_orientation_mode | No | Boolean | Whether to enable the single direction mode. The value can be true or false. If this parameter is not specified, false is used by default, and the text fields in the image are treated as having multiple directions. |
pdf_page_number | No | Integer | Page of the PDF file to recognize. If this parameter is not specified, the first page is recognized. |
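The body parameters above can be assembled as follows. This is a minimal sketch rather than SDK code: the function name build_body is hypothetical, the 10 MB limit is assumed to apply to the raw image file rather than to its Base64 text, and the "unnecessary prefix" mentioned for image is assumed to mean something like a data URI header.

```python
import base64
import json

# 10 MB limit from the image parameter description (assumed to apply to the raw file).
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def build_body(image_bytes=None, url=None, **options):
    """Build the JSON request body; exactly one of image_bytes or url must be given."""
    if (image_bytes is None) == (url is None):
        raise ValueError("set either image or url, not both and not neither")
    # Optional parameters such as detect_direction or quick_mode pass through unchanged.
    body = dict(options)
    if image_bytes is not None:
        if len(image_bytes) > MAX_IMAGE_BYTES:
            raise ValueError("image exceeds the 10 MB limit")
        # Send plain Base64 text only, with no prefix in front of the encoded data.
        body["image"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        body["url"] = url
    return json.dumps(body)
```

Calling build_body(image_bytes=data, detect_direction=True) returns a JSON string ready to be sent as the request body. Rejecting the "both" and "neither" cases locally avoids a round trip that would end in a 400 response.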
Response Parameters
The status code may vary depending on the recognition results. For example, 200 indicates that the API is successfully called, and 400 indicates that the API fails to be called. The following describes the status codes and corresponding response parameters.
Status code: 200
Parameter | Type | Description |
---|---|---|
result | GeneralTextResult object | Recognition result. This parameter is not returned when the API fails to be called. |
Parameter | Type | Description |
---|---|---|
direction | Float | Image direction |
words_block_count | Integer | Number of detected text blocks |
words_block_list | Array of GeneralTextWordsBlockList objects | List of recognized text blocks. The output sequence is from left to right and from top to bottom. |
Parameter | Type | Description |
---|---|---|
words | String | Recognition result of a text block |
location | Array<Array<Integer>> | List of location information about a text block, including the 2D coordinates (x, y) of four vertexes in the text area, where the coordinate origin is the upper-left corner of the image, the X axis is horizontal, and the Y axis is vertical. |
confidence | Float | Confidence of a recognized text block |
char_list | Array of GeneralTextCharList objects | Single-character recognition list corresponding to a text block. The output sequence is from left to right and from top to bottom. |
Parameter | Type | Description |
---|---|---|
char | String | Recognition result of a single character |
char_location | Array<Array<Integer>> | List of location information about a single character, including the 2D coordinates (x, y) of four vertexes in the character area, where the coordinate origin is the upper-left corner of the image, the X axis is horizontal, and the Y axis is vertical. |
char_confidence | Float | Confidence of a recognized character |
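Because an image can be tilted, the four vertexes in location or char_location do not necessarily form an upright rectangle. When an axis-aligned box is needed, for example to crop the region, it can be derived from the vertexes. The helper below is a minimal sketch with a hypothetical name, bounding_box. It relies only on the coordinate convention described above (origin at the upper-left corner, x horizontal, y vertical).

```python
def bounding_box(location):
    """Return (left, top, right, bottom) enclosing the four (x, y) vertexes."""
    xs = [x for x, _ in location]
    ys = [y for _, y in location]
    return min(xs), min(ys), max(xs), max(ys)


# Vertexes of a tilted text block, as they appear in a location field.
box = bounding_box([[517, 447], [540, 504], [505, 518], [482, 461]])
print(box)  # (482, 447, 540, 518)
```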
Status code: 400
Parameter | Type | Description |
---|---|---|
error_code | String | Error code of a failed API call. For details, see Error Codes. This parameter is not returned when the API is successfully called. |
error_msg | String | Error message of a failed API call. This parameter is not returned when the API is successfully called. |
Example Request
- endpoint is the request URL for calling an API. Endpoints vary depending on services and regions. For details, see Endpoints.
For example, General Text OCR is deployed in the CN-Hong Kong region. The endpoint is ocr.ap-southeast-1.myhuaweicloud.com or ocr.ap-southeast-1.myhuaweicloud.cn. The request URL is https://ocr.ap-southeast-1.myhuaweicloud.com/v2/{project_id}/ocr/general-text. project_id is the project ID. For how to obtain the project ID, see Obtaining a Project ID.
- For details about how to obtain a token, see Authentication.
- Transfer the Base64 encoded string of the image for recognition. During the recognition, the tilt angle of the image is not verified, and the quick mode is disabled.
POST https://{endpoint}/v2/{project_id}/ocr/general-text

Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...

Request Body:
{
    "image":"/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA...",
    "detect_direction":false,
    "quick_mode":false
}
- Transfer the URL of the image for recognition. During the recognition, the tilt angle of the image is not verified, and the quick mode is disabled.
POST https://{endpoint}/v2/{project_id}/ocr/general-text

Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...

Request Body:
{
    "url":"https://BucketName.obs.xxxx.com/ObjectName",
    "detect_direction":false,
    "quick_mode":false
}
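For callers who use token authentication without the SDK, the raw requests above could be assembled with the Python standard library as sketched below. The function name make_request is hypothetical, and the token and URL values are placeholders supplied by the caller. The sketch only builds the request object and does not send it.

```python
import json
import urllib.request


def make_request(url, token, body):
    """Build a POST request carrying the two mandatory headers and a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. A 400 response is raised by urlopen as urllib.error.HTTPError, whose body carries error_code and error_msg.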
Example Response
Status code: 200
Example response for a successful request
{
  "result" : {
    "direction" : 67.6506,
    "words_block_count" : 1,
    "words_block_list" : [ {
      "words" : "Word",
      "confidence" : 0.9999,
      "location" : [ [ 517, 447 ], [ 540, 504 ], [ 505, 518 ], [ 482, 461 ] ],
      "char_list" : [ {
        "char" : "Character",
        "char_location" : [ [ 517, 447 ], [ 530, 479 ], [ 495, 493 ], [ 482, 461 ] ],
        "char_confidence" : 0.9999
      }, {
        "char" : "Character",
        "char_location" : [ [ 530, 479 ], [ 540, 504 ], [ 505, 518 ], [ 495, 493 ] ],
        "char_confidence" : 0.9999
      } ]
    } ]
  }
}
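A successful response like the one above can be reduced to plain text by walking words_block_list. The following is a minimal sketch: the function name extract_text and the min_confidence threshold are hypothetical additions for illustration and are not part of the API.

```python
import json


def extract_text(response_text, min_confidence=0.0):
    """Return the recognized text blocks whose confidence meets the threshold."""
    result = json.loads(response_text)["result"]
    return [
        block["words"]
        for block in result["words_block_list"]
        if block["confidence"] >= min_confidence
    ]
```

Because the blocks are returned from left to right and from top to bottom, joining the list with newlines gives the text in reading order.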
Status code: 400
{ "error_code": "AIS.0103", "error_msg": "The image size does not meet the requirements." }
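A failed call can be turned into an exception that keeps both error fields. This is a minimal sketch: OcrError and raise_for_error are hypothetical names, and treating every status other than 200 as a failure is an assumption, since only 200 and 400 are described here.

```python
import json


class OcrError(Exception):
    """Carries the error_code and error_msg fields of a failed call."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(status_code, response_text):
    """Do nothing on status 200; otherwise raise OcrError built from the response body."""
    if status_code == 200:
        return
    payload = json.loads(response_text)
    raise OcrError(payload.get("error_code", "unknown"), payload.get("error_msg", ""))
```

Callers can then branch on the code attribute, for example to prompt for a subscription when the code is ModelArts.4204.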
Example SDK Code
The example SDK code is as follows:
Update the SDKs to the latest version before use so that an outdated local SDK does not prevent you from using the latest OCR functions.
- Java: Transfer the Base64-encoded string of the image for recognition. Image tilt detection and the quick mode are both disabled.
package com.huaweicloud.sdk.test;

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.ocr.v1.region.OcrRegion;
import com.huaweicloud.sdk.ocr.v1.*;
import com.huaweicloud.sdk.ocr.v1.model.*;

public class RecognizeGeneralTextSolution {

    public static void main(String[] args) {
        // Hard-coding the AK and SK or storing them in plaintext poses great security risks.
        // Store them in ciphertext in configuration files or environment variables and decrypt them during use.
        // This example reads the AK and SK from the environment variables CLOUD_SDK_AK and CLOUD_SDK_SK.
        // Set both variables in the local environment before running this example.
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");

        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);

        OcrClient client = OcrClient.newBuilder()
                .withCredential(auth)
                .withRegion(OcrRegion.valueOf("<YOUR REGION>"))
                .build();

        // Recognize a Base64-encoded image with quick mode and tilt detection disabled.
        RecognizeGeneralTextRequest request = new RecognizeGeneralTextRequest();
        GeneralTextRequestBody body = new GeneralTextRequestBody();
        body.withQuickMode(false);
        body.withDetectDirection(false);
        body.withImage("/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA...");
        request.withBody(body);

        try {
            RecognizeGeneralTextResponse response = client.recognizeGeneralText(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}
- Java: Transfer the URL of the image for recognition. Image tilt detection and the quick mode are both disabled.
package com.huaweicloud.sdk.test;

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.ocr.v1.region.OcrRegion;
import com.huaweicloud.sdk.ocr.v1.*;
import com.huaweicloud.sdk.ocr.v1.model.*;

public class RecognizeGeneralTextSolution {

    public static void main(String[] args) {
        // Hard-coding the AK and SK or storing them in plaintext poses great security risks.
        // Store them in ciphertext in configuration files or environment variables and decrypt them during use.
        // This example reads the AK and SK from the environment variables CLOUD_SDK_AK and CLOUD_SDK_SK.
        // Set both variables in the local environment before running this example.
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");

        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);

        OcrClient client = OcrClient.newBuilder()
                .withCredential(auth)
                .withRegion(OcrRegion.valueOf("<YOUR REGION>"))
                .build();

        // Recognize an image referenced by URL with quick mode and tilt detection disabled.
        RecognizeGeneralTextRequest request = new RecognizeGeneralTextRequest();
        GeneralTextRequestBody body = new GeneralTextRequestBody();
        body.withQuickMode(false);
        body.withDetectDirection(false);
        body.withUrl("https://BucketName.obs.myhuaweicloud.com/ObjectName");
        request.withBody(body);

        try {
            RecognizeGeneralTextResponse response = client.recognizeGeneralText(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}
- Python: Transfer the Base64-encoded string of the image for recognition. Image tilt detection and the quick mode are both disabled.
# coding: utf-8

import os

from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdkocr.v1.region.ocr_region import OcrRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkocr.v1 import *

if __name__ == "__main__":
    # Hard-coding the AK and SK or storing them in plaintext poses great security risks.
    # Store them in ciphertext in configuration files or environment variables and decrypt them during use.
    # This example reads the AK and SK from the environment variables CLOUD_SDK_AK and CLOUD_SDK_SK.
    # Set both variables in the local environment before running this example.
    ak = os.getenv("CLOUD_SDK_AK")
    sk = os.getenv("CLOUD_SDK_SK")

    credentials = BasicCredentials(ak, sk)

    client = OcrClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(OcrRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        # Recognize a Base64-encoded image with quick mode and tilt detection disabled.
        request = RecognizeGeneralTextRequest()
        request.body = GeneralTextRequestBody(
            quick_mode=False,
            detect_direction=False,
            image="/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA..."
        )
        response = client.recognize_general_text(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)
- Python: Transfer the URL of the image for recognition. Image tilt detection and the quick mode are both disabled.
# coding: utf-8

import os

from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdkocr.v1.region.ocr_region import OcrRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkocr.v1 import *

if __name__ == "__main__":
    # Hard-coding the AK and SK or storing them in plaintext poses great security risks.
    # Store them in ciphertext in configuration files or environment variables and decrypt them during use.
    # This example reads the AK and SK from the environment variables CLOUD_SDK_AK and CLOUD_SDK_SK.
    # Set both variables in the local environment before running this example.
    ak = os.getenv("CLOUD_SDK_AK")
    sk = os.getenv("CLOUD_SDK_SK")

    credentials = BasicCredentials(ak, sk)

    client = OcrClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(OcrRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        # Recognize an image referenced by URL with quick mode and tilt detection disabled.
        request = RecognizeGeneralTextRequest()
        request.body = GeneralTextRequestBody(
            quick_mode=False,
            detect_direction=False,
            url="https://BucketName.obs.myhuaweicloud.com/ObjectName"
        )
        response = client.recognize_general_text(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)
- Go: Transfer the Base64-encoded string of the image for recognition. Image tilt detection and the quick mode are both disabled.
package main

import (
    "fmt"
    "os"

    "github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
    ocr "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1"
    "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/model"
    region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/region"
)

func main() {
    // Hard-coding the AK and SK or storing them in plaintext poses great security risks.
    // Store them in ciphertext in configuration files or environment variables and decrypt them during use.
    // This example reads the AK and SK from the environment variables CLOUD_SDK_AK and CLOUD_SDK_SK.
    // Set both variables in the local environment before running this example.
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        Build()

    client := ocr.NewOcrClient(
        ocr.OcrClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    // Recognize a Base64-encoded image with quick mode and tilt detection disabled.
    request := &model.RecognizeGeneralTextRequest{}
    quickModeGeneralTextRequestBody := false
    detectDirectionGeneralTextRequestBody := false
    imageGeneralTextRequestBody := "/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA..."
    request.Body = &model.GeneralTextRequestBody{
        QuickMode:       &quickModeGeneralTextRequestBody,
        DetectDirection: &detectDirectionGeneralTextRequestBody,
        Image:           &imageGeneralTextRequestBody,
    }
    response, err := client.RecognizeGeneralText(request)
    if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
}
- Go: Transfer the URL of the image for recognition. Image tilt detection and the quick mode are both disabled.
package main

import (
    "fmt"
    "os"

    "github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
    ocr "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1"
    "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/model"
    region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/region"
)

func main() {
    // Hard-coding the AK and SK or storing them in plaintext poses great security risks.
    // Store them in ciphertext in configuration files or environment variables and decrypt them during use.
    // This example reads the AK and SK from the environment variables CLOUD_SDK_AK and CLOUD_SDK_SK.
    // Set both variables in the local environment before running this example.
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        Build()

    client := ocr.NewOcrClient(
        ocr.OcrClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    // Recognize an image referenced by URL with quick mode and tilt detection disabled.
    request := &model.RecognizeGeneralTextRequest{}
    quickModeGeneralTextRequestBody := false
    detectDirectionGeneralTextRequestBody := false
    urlGeneralTextRequestBody := "https://BucketName.obs.myhuaweicloud.com/ObjectName"
    request.Body = &model.GeneralTextRequestBody{
        QuickMode:       &quickModeGeneralTextRequestBody,
        DetectDirection: &detectDirectionGeneralTextRequestBody,
        Url:             &urlGeneralTextRequestBody,
    }
    response, err := client.RecognizeGeneralText(request)
    if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
}
For more SDK code examples in various programming languages, see the Sample Code tab on the right of the API Explorer page, which can automatically generate corresponding SDK code examples.
Status Codes
Status Code | Description |
---|---|
200 | Example response for a successful request |
400 | Example response for a failed request |
See Status Codes.
Error Codes
See Error Codes.