Updated: 2025-12-03 GMT+08:00

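General Text Recognition - RecognizeGeneralText

Function Overview

Recognizes the text in an image and returns the recognized text and its coordinates in JSON format. It supports text recognition in many scenarios, including scanned files, electronic documents, books, receipts, and forms.

Chinese, English, and some traditional Chinese characters are supported. For the usage limits of this API, see Constraints and Limitations. For detailed usage guidance, see the OCR Service Usage Overview section.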

Figure 1 Example image for general text recognition

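Constraints and Limitations

Only images in PNG, JPG, JPEG, BMP, GIF, TIFF, WEBP, PCX, ICO, PSD, or PDF format can be recognized.
Each side of the image must be between 15 px and 30000 px, and the total pixel count (height × width) must not exceed 160 million. The Base64 encoding of a single image or PDF file must not exceed 10 MB.
The effective recognition area must occupy more than 80% of the image, and all text and its edges must lie within the image.
Images rotated horizontally by any angle are supported.
Light-colored text watermarks are filtered out automatically.
Text recognition is currently not supported for images with complex backgrounds (such as outdoor natural scenes) or for images with distorted text.
The following languages can be recognized: Chinese and English with some traditional Chinese characters, Malay, Ukrainian, Hindi, Russian, Vietnamese, Indonesian, Thai, Arabic, German, Latin, French, Italian, Spanish, Portuguese, Romanian, Polish, Amharic, Japanese, Korean, Turkish, Norwegian, Danish, Swedish, Khmer, and Hebrew.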
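The size limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service or its SDK: the function names are hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption, since the limit's exact unit is not spelled out.

```python
import base64

# Limits taken from the constraints above. Treating "10 MB" as
# 10 * 1024 * 1024 bytes is an assumption.
MAX_BASE64_BYTES = 10 * 1024 * 1024
MIN_SIDE_PX = 15
MAX_SIDE_PX = 30000
MAX_TOTAL_PIXELS = 160_000_000


def check_dimensions(width: int, height: int) -> None:
    """Raise ValueError if the pixel dimensions violate the documented limits."""
    for side in (width, height):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            raise ValueError(f"side of {side}px is outside {MIN_SIDE_PX}-{MAX_SIDE_PX}px")
    if width * height > MAX_TOTAL_PIXELS:
        raise ValueError("total pixel count exceeds 160 million")


def encode_for_ocr(raw: bytes) -> str:
    """Base64-encode file bytes and enforce the documented size ceiling."""
    encoded = base64.b64encode(raw).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("Base64 payload exceeds 10 MB")
    return encoded
```

Checking the encoded length rather than the file size matters because Base64 output is roughly one third larger than its input.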

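Calling Method

See How to Call an API.

Prerequisites

Before using this API, you need to apply for the service and complete authentication. For the detailed procedure, see the Subscribing to the Service and Authentication sections.

First-time users must apply to enable the service. The service needs to be enabled only once; no further application is needed afterwards. If the service has not been enabled, calls return the error ModelArts.4204. Enable the service on the console before calling it, and make sure the region where the service is enabled matches the region where it is called.

Authorization Information

An account has permission to call all APIs. If an IAM user under the account calls this API, that IAM user must have the permissions required to call it. For the specific permission requirements, see Permissions and Authorization Items.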

URI

POST /v2/{project_id}/ocr/general-text

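Table 1 Path parameters
Parameter	Mandatory	Description
endpoint	Yes	Endpoint, that is, the request address for calling the API. The endpoint varies by service and by region; you can obtain it from Endpoints. For example, the endpoint of the OCR service in the "CN North-Beijing4" region is "ocr.cn-north-4.myhuaweicloud.com".
project_id	Yes	Project ID. You can obtain it as described in Obtaining a Project ID.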
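As a small illustration of how the two path parameters combine with the URI, the sketch below composes the full request URL. The helper name and the project ID value are hypothetical; the endpoint is the Beijing4 example from the table.

```python
def build_request_url(endpoint: str, project_id: str) -> str:
    """Compose the full request URL from the endpoint and the project ID."""
    if not endpoint or not project_id:
        raise ValueError("endpoint and project_id are both required")
    return f"https://{endpoint}/v2/{project_id}/ocr/general-text"


# "my-project-id" is a placeholder, not a real project ID.
print(build_request_url("ocr.cn-north-4.myhuaweicloud.com", "my-project-id"))
```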

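Request Parameters

Table 2 Request header parameters
Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	User token, used to obtain permission to call the API. The token is the value of X-Subject-Token in the response header of the token-obtaining API.
Content-Type	Yes	String	MIME type of the request body. The value is "application/json".
Enterprise-Project-Id	No	String	Enterprise project ID. OCR supports cost allocation of resource usage across user groups and users through Enterprise Project Management (EPS). To obtain the ID, go to the Enterprise Project Management page, click the enterprise project name, and read the Enterprise-Project-Id (enterprise project ID) on the enterprise project details page. For the steps to create an enterprise project, see the User Guide. Note: after an enterprise project is created, there are three scenarios when passing this parameter. (1) A correct ID is carried: OCR works normally, and the bill is categorized under the enterprise project that corresponds to the ID. (2) A well-formed but nonexistent ID is carried: OCR works normally, and the bill shows the nonexistent enterprise project ID. (3) No ID or a malformed ID (for example, one containing special characters) is carried: OCR works normally, and the bill is categorized under "default".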

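Table 3 Request body parameters
Parameter	Mandatory	Type	Description
image	No	String	Specify either this parameter or url. Base64 encoding of the image or PDF file; the Base64 encoding of a single image or PDF file must not exceed 10 MB. A file grows when it is Base64-encoded, so check the boundary carefully; a file size of no more than 7 MB is recommended. The shortest side of the image must be at least 15 px and the longest side at most 30000 px. Supported formats: JPEG, JPG, PNG, BMP, GIF, TIFF, WEBP, PCX, ICO, PSD, and PDF. An example of image Base64 encoding is /9j/4AAQSkZJRgABAg...; an extra prefix causes the error "The image format is not supported".
url	No	String	Specify either this parameter or image. The Base64 encoding of a single image or PDF file must not exceed 10 MB. A file grows when it is Base64-encoded, so check the boundary carefully; a file size of no more than 7 MB is recommended. The following URL types are currently supported: (1) public HTTP/HTTPS URLs, for example https://support.huaweicloud.com/api-ocr/zh-cn_image_0288038182.png; (2) URLs provided by OBS. Using OBS data requires authorization, which can be service authorization, temporary authorization, or anonymous public authorization; for details, see Configuring OBS Access Permissions. Note: the API response time depends on the image download time; if the download takes too long, the API call fails. Make sure the storage service that hosts the image is stable and reliable; storing image data in OBS is recommended. The URL must not contain Chinese characters; any Chinese characters must be UTF-8 encoded.
detect_direction	No	Boolean	Whether to correct the skew angle of the image. Values: true (correct the skew angle) or false (do not correct it). Correction at any angle is supported. The default is false when the parameter is omitted. If the image to be recognized is skewed, setting this parameter to true is recommended.
quick_mode	No	Boolean	Quick mode switch. For single-line text images (the image must contain only one line of text, and the text area must occupy more than 50% of the image), enabling quick mode returns the result faster. Values: true (enable quick mode) or false (disable quick mode). The default is false when the parameter is omitted.
character_mode	No	Boolean	Single-character mode switch. Values: true (enable single-character mode) or false (disable it). The default is false when the parameter is omitted, meaning that single-character information for each text line is not returned.
language	No	String	Language selection. The default is Chinese-English recognition when the parameter is omitted. auto: automatic language classification; ms: Malay; uk: Ukrainian; hi: Hindi; ru: Russian; vi: Vietnamese; id: Indonesian; th: Thai; zh: Chinese and English; ar: Arabic; de: German; la: Latin; fr: French; it: Italian; es: Spanish; pt: Portuguese; ro: Romanian; pl: Polish; am: Amharic; ja: Japanese; ko: Korean; tr: Turkish; no: Norwegian; da: Danish; sv: Swedish; km: Khmer; he: Hebrew.
single_orientation_mode	No	Boolean	Single-orientation mode switch. Values: true (enable single-orientation mode) or false (disable it). The default is false when the parameter is omitted, meaning that the text in the image is assumed to have multiple orientations.
pdf_page_number	No	Integer	Page of the PDF to recognize. When this parameter is passed, the content of the specified page is recognized. When it is omitted, page 1 is recognized.
return_markdown_result	No	Boolean	Switch for returning the concatenated text-block result. Values: true (return the concatenated result) or false (do not return it). The default is false when the parameter is omitted.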
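The rules in the request body table (exactly one of image or url, a fixed set of language codes, no raw Chinese characters in the URL) can be enforced before sending. The sketch below is one possible client-side helper, not SDK code: the name build_body is hypothetical, and reading "UTF-8 encoded" as UTF-8 percent-encoding is an assumption.

```python
from typing import Optional
from urllib.parse import quote

# Language codes listed in the request body table.
LANGUAGES = {
    "auto", "ms", "uk", "hi", "ru", "vi", "id", "th", "zh", "ar", "de", "la",
    "fr", "it", "es", "pt", "ro", "pl", "am", "ja", "ko", "tr", "no", "da",
    "sv", "km", "he",
}


def build_body(image: Optional[str] = None, url: Optional[str] = None, **options) -> dict:
    """Build the JSON body, enforcing that exactly one of image/url is given."""
    if (image is None) == (url is None):
        raise ValueError("pass exactly one of image or url")
    language = options.get("language")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")
    body = dict(options)
    if image is not None:
        body["image"] = image
    else:
        # Percent-encode non-ASCII characters as UTF-8 (assumed meaning of
        # "UTF-8 encoded"); URL punctuation and existing escapes are kept.
        body["url"] = quote(url, safe=":/?&=%#@+,;")
    return body
```

Optional switches such as detect_direction or quick_mode are passed through as keyword arguments, so omitted parameters keep their server-side defaults.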

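Response Parameters

Different HTTP response status codes may be returned depending on the recognition result. For example, 200 indicates that the API call succeeded and 400 indicates that it failed. The status codes and response parameters are described below.

Status code: 200

Table 4 Response body parameters
Parameter	Type	Description
result	GeneralTextResult object	Recognition result. This field is not returned when the call fails.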

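Table 5 GeneralTextResult
Parameter	Type	Description
direction	Float	Image orientation. This field is valid when "detect_direction" is "true" and gives the counterclockwise rotation angle of the image, in the range 0 to 359. When "detect_direction" is "false", the value is -1.
words_block_count	Integer	Number of text blocks detected.
words_block_list	Array of GeneralTextWordsBlockList objects	List of recognized text blocks, ordered from left to right and then from top to bottom.
markdown_result	String	Recognition result with all text blocks concatenated. Text blocks on the same line are joined with "\t", and text blocks on different lines are joined with "\n". This field is returned only when return_markdown_result is true.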

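Table 6 GeneralTextWordsBlockList
Parameter	Type	Description
words	String	Recognition result of the text block.
location	Array<Array<Integer>>	Position of the text block, as a list containing the 2D coordinates (x, y) of the four vertices of the text region. The origin is the upper-left corner of the image; the x-axis runs horizontally and the y-axis runs vertically. Note: when the input is a PDF, the returned coordinates are for reference only and indicate the relative positions of the fields.
confidence	Float	Confidence of the text block recognition result.
char_list	Array of GeneralTextCharList objects	List of single-character recognition results for the text block, ordered from left to right and then from top to bottom.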

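Table 7 GeneralTextCharList
Parameter	Type	Description
char	String	Recognition result of the single character.
char_location	Array<Array<Integer>>	Position of the single character, as a list containing the 2D coordinates (x, y) of the four vertices of the character region. The origin is the upper-left corner of the image; the x-axis runs horizontally and the y-axis runs vertically.
char_confidence	Float	Confidence of the single-character recognition result.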
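The nested structure of Tables 4 to 7 can be mirrored with plain dataclasses when working with the raw JSON instead of the SDK models. This is an illustrative sketch: the class names and the parse_result helper are hypothetical and are not the SDK's own types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Point = List[int]  # one [x, y] vertex


@dataclass
class CharItem:
    char: str
    char_location: List[Point]
    char_confidence: float


@dataclass
class WordsBlock:
    words: str
    location: List[Point]
    confidence: float
    char_list: List[CharItem] = field(default_factory=list)


@dataclass
class GeneralTextResult:
    direction: float
    words_block_count: int
    words_block_list: List[WordsBlock]
    markdown_result: Optional[str] = None


def parse_result(payload: Dict[str, Any]) -> GeneralTextResult:
    """Convert a decoded 200 response body into typed objects."""
    result = payload["result"]
    blocks = []
    for b in result.get("words_block_list", []):
        # char_list is treated as optional here (an assumption), so missing
        # per-character data simply yields an empty list.
        chars = [
            CharItem(c["char"], c["char_location"], c["char_confidence"])
            for c in b.get("char_list", [])
        ]
        blocks.append(WordsBlock(b["words"], b["location"], b["confidence"], chars))
    return GeneralTextResult(
        direction=result.get("direction", -1),
        words_block_count=result.get("words_block_count", len(blocks)),
        words_block_list=blocks,
        markdown_result=result.get("markdown_result"),
    )
```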

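Status code: 400

Table 8 Response body parameters
Parameter	Type	Description
error_code	String	Error code returned when the call fails. For details, see Error Codes. This field is not returned when the call succeeds.
error_msg	String	Error message returned when the call fails. This field is not returned when the call succeeds.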

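Request Examples

"endpoint" is the request address for calling the API. It varies by service and by region; for details, see Endpoints.
For example, the endpoint of general text recognition deployed in the "CN North-Beijing4" region is "ocr.cn-north-4.myhuaweicloud.com" or "ocr.cn-north-4.myhuaweicloud.cn", and the request URL is "https://ocr.cn-north-4.myhuaweicloud.com/v2/{project_id}/ocr/general-text", where "project_id" is the project ID. To obtain it, see Obtaining a Project ID.
For how to obtain a token, see Authentication.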

Recognize text from the Base64 encoding of an image, without correcting the image skew angle and with quick mode disabled

POST https://{endpoint}/v2/{project_id}/ocr/general-text
Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...
Request Body:
{
    "image":"/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA...",
    "detect_direction":false,
    "quick_mode":false
}

Recognize text from the URL of an image, without correcting the image skew angle and with quick mode disabled

POST https://{endpoint}/v2/{project_id}/ocr/general-text
Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...
Request Body:
{
    "url":"https://BucketName.obs.xxxx.com/ObjectName",
    "detect_direction":false,
    "quick_mode":false
}
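The raw requests above can also be issued without the SDK. The sketch below uses only the Python standard library to assemble the same POST request; the helper name make_request is hypothetical, and the token and project ID are values you supply. Sending is left commented out because it needs a valid token and network access.

```python
import json
import urllib.request


def make_request(endpoint: str, project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Assemble the POST request with the two mandatory headers."""
    url = f"https://{endpoint}/v2/{project_id}/ocr/general-text"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


# To send it (requires a real token, project ID, and reachable image URL):
# req = make_request(endpoint, project_id, token,
#                    {"url": "...", "detect_direction": False, "quick_mode": False})
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read()))
```

Note that urllib raises urllib.error.HTTPError for a 400 response, so the error body has to be read from the exception rather than from the normal return value.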

Response Examples

Status code: 200

Example of a successful response

{
  "result" : {
    "direction" : 67.6506,
    "words_block_count" : 1,
    "words_block_list" : [ {
      "words" : "文字",
      "confidence" : 0.9999,
      "location" : [ [ 517, 447 ], [ 540, 504 ], [ 505, 518 ], [ 482, 461 ] ],
      "char_list" : [ {
        "char" : "文",
        "char_location" : [ [ 517, 447 ], [ 530, 479 ], [ 495, 493 ], [ 482, 461 ] ],
        "char_confidence" : 0.9999
      }, {
        "char" : "字",
        "char_location" : [ [ 530, 479 ], [ 540, 504 ], [ 505, 518 ], [ 495, 493 ] ],
        "char_confidence" : 0.9999
      } ]
    } ]
  }
}
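A common next step is to pull the text and an axis-aligned bounding box out of each block. The sketch below works on a trimmed copy of the sample response; the helper name extract_blocks and the min_confidence threshold are hypothetical choices, not part of the API.

```python
from typing import List, Tuple

# Trimmed copy of the sample response (char_list omitted for brevity).
sample = {
    "result": {
        "direction": 67.6506,
        "words_block_count": 1,
        "words_block_list": [
            {
                "words": "文字",
                "confidence": 0.9999,
                "location": [[517, 447], [540, 504], [505, 518], [482, 461]],
            }
        ],
    }
}


def extract_blocks(payload: dict, min_confidence: float = 0.0) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Return (text, (x_min, y_min, x_max, y_max)) for each confident block."""
    out = []
    for block in payload.get("result", {}).get("words_block_list", []):
        if block["confidence"] < min_confidence:
            continue
        xs = [x for x, _ in block["location"]]
        ys = [y for _, y in block["location"]]
        out.append((block["words"], (min(xs), min(ys), max(xs), max(ys))))
    return out


print(extract_blocks(sample, min_confidence=0.5))
```

Because the four vertices can describe a rotated quadrilateral, the axis-aligned box is only an enclosing rectangle and can be larger than the text region itself.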

Status code: 400

Example of a failed response

{
    "error_code": "AIS.0103",
    "error_msg": "The image size does not meet the requirements."
}
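When calling the API over raw HTTP, the two response shapes can be folded into one code path by raising on failure. This is a minimal sketch under the assumption that a failed call carries error_code and error_msg as shown above; OcrError and unwrap are hypothetical names.

```python
import json


class OcrError(Exception):
    """Carries the status code and the error fields of a failed call."""

    def __init__(self, status: int, error_code: str, error_msg: str):
        super().__init__(f"{status} {error_code}: {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def unwrap(status: int, raw: str) -> dict:
    """Return the result object on success; raise OcrError otherwise."""
    payload = json.loads(raw)
    if status != 200 or "error_code" in payload:
        raise OcrError(status, payload.get("error_code", ""), payload.get("error_msg", ""))
    return payload["result"]
```

Callers can then branch on error_code (for example AIS.0103) without inspecting the response body themselves.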

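SDK Code Examples

The SDK code examples are as follows.

Before using the SDK, it is recommended that you update it to the latest version, so that an outdated local SDK does not prevent you from using the latest OCR features.

Java: recognize text from the Base64 encoding of an image, without correcting the image skew angle and with quick mode disabled

package com.huaweicloud.sdk.test;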

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.ocr.v1.region.OcrRegion;
import com.huaweicloud.sdk.ocr.v1.*;
import com.huaweicloud.sdk.ocr.v1.model.*;


public class RecognizeGeneralTextSolution {

    public static void main(String[] args) {
        // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
        // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");

        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);

        OcrClient client = OcrClient.newBuilder()
                .withCredential(auth)
                .withRegion(OcrRegion.valueOf("<YOUR REGION>"))
                .build();
        RecognizeGeneralTextRequest request = new RecognizeGeneralTextRequest();
        GeneralTextRequestBody body = new GeneralTextRequestBody();
        body.withQuickMode(false);
        body.withDetectDirection(false);
        body.withImage("/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA...");
        request.withBody(body);
        try {
            RecognizeGeneralTextResponse response = client.recognizeGeneralText(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}

Java: recognize text from the URL of an image, without correcting the image skew angle and with quick mode disabled

package com.huaweicloud.sdk.test;

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.ocr.v1.region.OcrRegion;
import com.huaweicloud.sdk.ocr.v1.*;
import com.huaweicloud.sdk.ocr.v1.model.*;


public class RecognizeGeneralTextSolution {

    public static void main(String[] args) {
        // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
        // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");

        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);

        OcrClient client = OcrClient.newBuilder()
                .withCredential(auth)
                .withRegion(OcrRegion.valueOf("<YOUR REGION>"))
                .build();
        RecognizeGeneralTextRequest request = new RecognizeGeneralTextRequest();
        GeneralTextRequestBody body = new GeneralTextRequestBody();
        body.withQuickMode(false);
        body.withDetectDirection(false);
        body.withUrl("https://BucketName.obs.myhuaweicloud.com/ObjectName");
        request.withBody(body);
        try {
            RecognizeGeneralTextResponse response = client.recognizeGeneralText(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
}

Python: recognize text from the Base64 encoding of an image, without correcting the image skew angle and with quick mode disabled

# coding: utf-8

import os  # needed for os.getenv below

from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdkocr.v1.region.ocr_region import OcrRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkocr.v1 import *

if __name__ == "__main__":
    # The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    # In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak = os.getenv("CLOUD_SDK_AK")
    sk = os.getenv("CLOUD_SDK_SK")

    credentials = BasicCredentials(ak, sk)

    client = OcrClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(OcrRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        request = RecognizeGeneralTextRequest()
        request.body = GeneralTextRequestBody(
            quick_mode=False,
            detect_direction=False,
            image="/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA..."
        )
        response = client.recognize_general_text(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)

Python: recognize text from the URL of an image, without correcting the image skew angle and with quick mode disabled

# coding: utf-8

import os  # needed for os.getenv below

from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdkocr.v1.region.ocr_region import OcrRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkocr.v1 import *

if __name__ == "__main__":
    # The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    # In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak = os.getenv("CLOUD_SDK_AK")
    sk = os.getenv("CLOUD_SDK_SK")

    credentials = BasicCredentials(ak, sk)

    client = OcrClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(OcrRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        request = RecognizeGeneralTextRequest()
        request.body = GeneralTextRequestBody(
            quick_mode=False,
            detect_direction=False,
            url="https://BucketName.obs.myhuaweicloud.com/ObjectName"
        )
        response = client.recognize_general_text(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)

Go: recognize text from the Base64 encoding of an image, without correcting the image skew angle and with quick mode disabled

package main

import (
    "fmt"
    "os" // needed for os.Getenv below

    "github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
    ocr "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1"
    "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/model"
    region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/region"
)

func main() {
    // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        Build()

    client := ocr.NewOcrClient(
        ocr.OcrClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    request := &model.RecognizeGeneralTextRequest{}
    quickModeGeneralTextRequestBody := false
    detectDirectionGeneralTextRequestBody := false
    imageGeneralTextRequestBody := "/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA..."
    request.Body = &model.GeneralTextRequestBody{
        QuickMode:       &quickModeGeneralTextRequestBody,
        DetectDirection: &detectDirectionGeneralTextRequestBody,
        Image:           &imageGeneralTextRequestBody,
    }
    response, err := client.RecognizeGeneralText(request)
    if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
}

Go: recognize text from the URL of an image, without correcting the image skew angle and with quick mode disabled

package main

import (
    "fmt"
    "os" // needed for os.Getenv below

    "github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
    ocr "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1"
    "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/model"
    region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/region"
)

func main() {
    // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        Build()

    client := ocr.NewOcrClient(
        ocr.OcrClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    request := &model.RecognizeGeneralTextRequest{}
    quickModeGeneralTextRequestBody := false
    detectDirectionGeneralTextRequestBody := false
    urlGeneralTextRequestBody := "https://BucketName.obs.myhuaweicloud.com/ObjectName"
    request.Body = &model.GeneralTextRequestBody{
        QuickMode:       &quickModeGeneralTextRequestBody,
        DetectDirection: &detectDirectionGeneralTextRequestBody,
        Url:             &urlGeneralTextRequestBody,
    }
    response, err := client.RecognizeGeneralText(request)
    if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
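}

For SDK code examples in more programming languages, see the Code Example tab in API Explorer, which can automatically generate the corresponding SDK code examples.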

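Status Codes

Status Code	Description
200	Successful response
400	Failed response

For details about status codes, see Status Codes.

Error Codes

For details about error codes, see Error Codes.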
