Last updated: 2024-06-19 GMT+08:00

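Web Image OCR

Function

This API recognizes the text in a web image and returns the structured result in JSON format.

For the restrictions on this API, see Constraints and Limitations. For detailed usage guidance, see the Introduction to OCR section.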

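Constraints and Limitations

Chinese, English, and some traditional Chinese characters are supported.
Only images in JPG, JPEG, PNG, BMP, TIFF, TGA, WEBP, ICO, PCX, or GIF format can be recognized.
Common web images are supported: phone screenshots, computer screenshots, e-commerce product images, advertising designs, and similar Internet images.
Each side of the image must be between 15 px and 8192 px.
The effective text area should occupy more than 60% of the image. Avoid images in which the text area is too small.
The effective text area may be rotated in the image plane by any angle (direction detection must be enabled).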
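
The format and size limits above can be checked locally before a request is sent. The following is a minimal sketch, not part of the service or its SDK: the helper names are hypothetical, the format check looks only at the file extension (the service inspects the actual image data), and the width and height are assumed to be already known to the caller.

```python
# Hypothetical pre-flight checks based on the constraints listed above.
SUPPORTED_EXTENSIONS = {
    "jpg", "jpeg", "png", "bmp", "tiff", "tga", "webp", "ico", "pcx", "gif",
}
MIN_SIDE_PX = 15
MAX_SIDE_PX = 8192


def is_supported_format(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in SUPPORTED_EXTENSIONS


def is_supported_size(width: int, height: int) -> bool:
    """Return True if both sides are within 15 px to 8192 px."""
    return all(MIN_SIDE_PX <= side <= MAX_SIDE_PX for side in (width, height))
```

A caller could run both checks and skip the API call when either returns False, which avoids a round trip that would likely fail.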

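Calling Method

For details, see How to Call APIs.

Prerequisites

Before using this API, you need to apply for the service and complete authentication. For the procedure, see the Subscribing to the Service and Authentication sections.

First-time users must apply to enable the service. The service only needs to be enabled once; no further application is required afterwards. If the service has not been enabled, calls return the ModelArts.4204 error. Enable the service on the console before calling it, and make sure the region where the service is enabled is the same as the region you call.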

URI

POST /v2/{project_id}/ocr/web-image

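Table 1 Path parameters
Parameter	Mandatory	Description
endpoint	Yes	Endpoint, that is, the request address for calling the API. The endpoint differs by service and region. You can obtain it from Endpoints.
project_id	Yes	Project ID. You can obtain it from Obtaining a Project ID.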

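Request Parameters

Table 2 Request header parameters
Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	User token, used to obtain permission to call the API. The token is the value of X-Subject-Token in the response header of the token-obtaining API.
Content-Type	Yes	String	MIME type of the request body. The value is "application/json".

Table 3 Request body parameters
Parameter	Mandatory	Type	Description
image	No	String	Either this parameter or url must be set. Base64 encoding of the image. The Base64-encoded data must not exceed 10 MB. The shortest side of the image must be at least 15 px and the longest side at most 8192 px. Supported formats: JPG, JPEG, PNG, BMP, TIFF, TGA, WEBP, ICO, PCX, and GIF. An example of a Base64-encoded image is /9j/4AAQSkZJRgABAg.... An extra prefix causes the error "The image format is not supported".
url	No	String	Either this parameter or image must be set. URL of the image. Currently supported: public HTTP/HTTPS URLs, and URLs provided by OBS. Using OBS data requires authorization, which can be service authorization, temporary authorization, or anonymous public authorization. For details, see Configuring OBS Access Permissions. Note: The API response time depends on the image download time. If the download takes too long, the API call fails. Make sure the storage service that hosts the image is stable and reliable. OBS is recommended for storing image data. The URL must not contain Chinese characters. Any Chinese characters must be UTF-8 encoded.
detect_direction	No	Boolean	Whether to correct the tilt angle of the image. Values: true (correct the tilt angle) or false (do not correct the tilt angle). Correction is supported for any angle. The default is "false" when this parameter is not passed. If the image to be recognized is tilted, set this parameter to "true".
extract_type	No	Array of strings	List of structured data extraction parameters. Currently only the image width and height are supported, with the value "image_size". If this parameter is omitted, the value is not extracted.
detect_font	No	Boolean	If this field is not passed, the font of each text block is not detected. If set to true, the font type of each text block is detected and the names of the 5 most similar fonts are returned.
detect_text_direction	No	Boolean	If this field is not passed, the default is true, meaning the text direction of each field is detected. If set to false, the text direction is not detected. If all text in the image is horizontal and upright, set this value to false.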
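
The rules in Table 3 (exactly one of image or url, no prefix on the Base64 string, a 10 MB cap on the encoded data) can be sketched as a small request-body builder. This is an illustrative sketch only: `build_body` is a hypothetical helper, and treating 10 MB as 10 × 1024 × 1024 characters is an assumption, since the table does not define the unit precisely.

```python
import base64

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 Base64 characters.
MAX_BASE64_LENGTH = 10 * 1024 * 1024


def build_body(image_bytes=None, url=None, detect_direction=False, extract_type=None):
    """Build the JSON body for the web-image API from raw bytes or a URL."""
    # image and url are mutually exclusive: exactly one must be given.
    if (image_bytes is None) == (url is None):
        raise ValueError("Provide exactly one of image_bytes or url")

    body = {}
    if image_bytes is not None:
        # Plain Base64 only, with no "data:image/...;base64," prefix.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        if len(encoded) > MAX_BASE64_LENGTH:
            raise ValueError("Base64-encoded image exceeds the 10 MB limit")
        body["image"] = encoded
    else:
        body["url"] = url

    if detect_direction:
        body["detect_direction"] = True
    if extract_type:
        body["extract_type"] = list(extract_type)
    return body
```

For example, `build_body(url="https://BucketName.obs.xxxx.com/ObjectName", extract_type=["image_size"])` yields a body with only the url and extract_type keys. Optional flags are left out when unset so that the service defaults apply.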

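Response Parameters

Depending on the recognition result, different HTTP status codes may be returned. For example, 200 indicates that the API call succeeded and 400 indicates that it failed. The status codes and response parameters are described below.

Status code: 200

Table 4 Response body parameters
Parameter	Type	Description
result	WebImageResult object	Result of the call when it succeeds. This field is absent when the call fails.

Table 5 WebImageResult
Parameter	Type	Description
words_block_count	Integer	Number of text blocks detected and recognized.
words_block_list	Array of WebImageWordsBlockList objects	List of recognized text blocks, output from left to right and top to bottom.
extracted_data	WebImageExtractedData object	Extracted structured JSON result. The keys in this dictionary match the values of the extract_type input list. Currently only contact information and image height/width extraction are supported, that is, the "contact_info" and "image_size" keys. If extract_type is an empty list or is missing, no extraction is performed and this field is empty.

Table 6 WebImageWordsBlockList
Parameter	Type	Description
words	String	Recognition result of the text block.
confidence	Float	Confidence of the field. A higher confidence means the recognized field is more reliable. Statistically, higher confidence corresponds to higher accuracy. The confidence is produced by the algorithm and is not directly equivalent to the accuracy of the field.
location	Array<Array<Integer>>	Location of the text block, as a list of the 2D coordinates (x, y) of the four vertices of the text region. The origin is the top-left corner of the image, the x-axis is horizontal, and the y-axis is vertical.
font_list	Array of strings	Font types of the text block, as a list of the font types closest to the text in the block.
font_scores	Array of numbers	Probabilities of the font types of the text block, as a list that corresponds one-to-one with font_list. Each value is the probability that the text in the block belongs to that font type.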
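
Because font_list and font_scores correspond one-to-one, the most likely font of a block can be found by pairing the two lists. The sketch below assumes a text block already parsed into a Python dict. `best_font` is a hypothetical helper and the font names in the usage note are made up for illustration.

```python
def best_font(block):
    """Return the (font_name, probability) pair with the highest probability.

    Returns None when font detection was not requested or the lists do not
    line up.
    """
    fonts = block.get("font_list") or []
    scores = block.get("font_scores") or []
    if not fonts or len(fonts) != len(scores):
        return None
    # zip pairs each font name with the score at the same index.
    return max(zip(fonts, scores), key=lambda pair: pair[1])
```

For a block such as `{"font_list": ["FontA", "FontB"], "font_scores": [0.2, 0.7]}` (hypothetical values), the helper returns `("FontB", 0.7)`.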

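Table 7 WebImageExtractedData
Parameter	Type	Description
contact_info	WebImageContactInfo object	Extracted contact information, including name, phone number, province, city, district, and detailed address. This field is absent if the extract_type input list does not contain it.
image_size	WebImageImageSize object	Width and height of the image. This field is absent if the extract_type input list does not contain it.

Table 8 WebImageContactInfo
Parameter	Type	Description
name	String	Returned when contact_info is passed. Name.
phone	String	Returned when contact_info is passed. Phone number.
province	String	Returned when contact_info is passed. Province.
city	String	Returned when contact_info is passed. City.
district	String	Returned when contact_info is passed. County or district.
detail_address	String	Returned when contact_info is passed. Detailed address (excluding province, city, and district).

Table 9 WebImageImageSize
Parameter	Type	Description
height	Integer	Returned when image_size is passed. Image height.
width	Integer	Returned when image_size is passed. Image width.

Status code: 400

Table 10 Response body parameters
Parameter	Type	Description
error_code	String	Error code when the call fails. For details, see Error Codes. This field is not returned when the call succeeds.
error_msg	String	Error message when the call fails. This field is absent when the call succeeds.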

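Example Requests

"endpoint" is the request address for calling the API. The "endpoint" differs by service and region. For details, see Endpoints.
For example, the "endpoint" of the Web Image OCR service deployed in the "AP-Bangkok" region is "ocr.ap-southeast-2.myhuaweicloud.com" or "ocr.ap-southeast-2.myhuaweicloud.cn", and the request URL is "https://ocr.ap-southeast-2.myhuaweicloud.com/v2/{project_id}/ocr/web-image". "project_id" is the project ID. To obtain it, see Obtaining a Project ID.
For how to obtain a token, see Authentication.

Recognize text by passing the Base64 encoding of a web image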

POST https://{endpoint}/v2/{project_id}/ocr/web-image
Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...

Request Body:  
{  
    "image":"/9j/4AAQSkZJRgABAgEASABIAAD/..."
}
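
The raw HTTP request above can be assembled with the Python standard library alone. The following sketch only builds the request object; `build_request` is a hypothetical helper, and the endpoint, project ID, and token values are placeholders that the caller supplies.

```python
import json
import urllib.request


def build_request(endpoint, project_id, token, body):
    """Build a POST request for /v2/{project_id}/ocr/web-image."""
    url = f"https://{endpoint}/v2/{project_id}/ocr/web-image"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. That step is left out here because it needs a valid token and network access.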

Recognize text by passing the URL of a web image

POST https://{endpoint}/v2/{project_id}/ocr/web-image
Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...

Request Body:
{
    "url":"https://BucketName.obs.xxxx.com/ObjectName"
}
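
The url parameter must not contain raw Chinese characters. One common way to satisfy this, assumed here rather than confirmed by the service documentation, is to percent-encode the UTF-8 bytes of the URL path. `encode_url_path` is a hypothetical helper. It leaves the query string untouched so that signed URLs are not altered.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url):
    """Percent-encode non-ASCII characters in the path of a URL as UTF-8."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so existing escapes survive.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

A URL that is already plain ASCII passes through unchanged, so the helper can be applied to every URL before it is placed in the request body.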

Example Responses

Status code: 200

Example of a successful response

{ 
  "result": { 
      "words_block_count": 3, 
      "words_block_list": [ 
          { 
              "words": "文字块1", 
              "confidence": 0.9950,
              "location": [ 
                  [13, 476], 
                  [91, 332], 
                  [125, 351], 
                  [48, 494] 
              ] 
          }, 
          { 
              "words": "文字块2", 
              "confidence": 0.9910,
              "location": [ 
                  [13, 476], 
                  [91, 332], 
                  [125, 351], 
                  [48, 494] 
              ] 
          }, 
          { 
              "words": "文字块3", 
              "confidence": 0.9910,
              "location": [ 
                  [13, 476], 
                  [91, 332], 
                  [125, 351], 
                  [48, 494] 
              ] 
          } 
      ],
      "extracted_data": {}
  } 
}
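
A successful response like the one above can be reduced to the recognized text plus an axis-aligned bounding box per block. The sketch below assumes the response has already been parsed into a Python dict. `summarize_blocks` is a hypothetical helper, not part of the SDK.

```python
def summarize_blocks(response):
    """Return text, confidence, and bounding box for every text block."""
    summary = []
    for block in response.get("result", {}).get("words_block_list", []):
        # location holds four (x, y) vertices; origin is the top-left corner.
        xs = [point[0] for point in block["location"]]
        ys = [point[1] for point in block["location"]]
        summary.append({
            "words": block["words"],
            "confidence": block.get("confidence"),
            # (left, top, right, bottom) of the enclosing rectangle
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
        })
    return summary
```

The four vertices can describe a rotated quadrilateral, so the bounding box is only an approximation of the text region when the text is tilted.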

Status code: 400

Example of a failed response

{
    "error_code": "AIS.0103", 
    "error_msg": "The image size does not meet the requirements." 
}
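
Client code usually needs to tell the two response shapes apart: a result object on success, or error_code and error_msg on failure. The sketch below shows one possible way to do this. `OcrError` and `unwrap_result` are hypothetical names, and the hint for ModelArts.4204 simply restates the prerequisite described earlier on this page.

```python
class OcrError(Exception):
    """Raised when the service returns an error body."""

    def __init__(self, error_code, error_msg):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg


def unwrap_result(status_code, payload):
    """Return payload["result"] on success, otherwise raise OcrError."""
    if status_code == 200 and "result" in payload:
        return payload["result"]
    code = payload.get("error_code", "unknown")
    msg = payload.get("error_msg", "")
    if code == "ModelArts.4204":
        # Per the prerequisites: the service is not enabled in this region.
        msg += " (enable the service on the console in the region being called)"
    raise OcrError(code, msg)
```

Raising an exception keeps the success path free of error checks, and callers can still read `error_code` from the exception to decide whether a retry makes sense.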

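SDK Code Examples

The SDK code examples are as follows.

Before using the SDK, update it to the latest version so that an outdated local SDK does not prevent you from using the latest OCR features.

Java: recognize text by passing the Base64 encoding of a web image. The request checks the image tilt angle, identifies the font type, and checks whether the image contains contact information.

package com.huaweicloud.sdk.test;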

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.ocr.v1.region.OcrRegion;
import com.huaweicloud.sdk.ocr.v1.*;
import com.huaweicloud.sdk.ocr.v1.model.*;

import java.util.List;
import java.util.ArrayList;

public class RecognizeWebImageSolution {

    public static void main(String[] args) {
        // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
        // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");

        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);

        OcrClient client = OcrClient.newBuilder()
                .withCredential(auth)
                .withRegion(OcrRegion.valueOf("<YOUR REGION>"))
                .build();
        RecognizeWebImageRequest request = new RecognizeWebImageRequest();
        WebImageRequestBody body = new WebImageRequestBody();
        List<String> listbodyExtractType = new ArrayList<>();
        listbodyExtractType.add("contact_info");
        listbodyExtractType.add("image_size");
        body.withDetectFont(true);
        body.withExtractType(listbodyExtractType);
        body.withDetectDirection(true);
        body.withImage("/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA...");
        request.withBody(body);
        try {
            RecognizeWebImageResponse response = client.recognizeWebImage(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
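}

Java: recognize text by passing the URL of a web image. The request checks the image tilt angle, identifies the font type, and checks whether the image contains contact information.

package com.huaweicloud.sdk.test;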

import com.huaweicloud.sdk.core.auth.ICredential;
import com.huaweicloud.sdk.core.auth.BasicCredentials;
import com.huaweicloud.sdk.core.exception.ConnectionException;
import com.huaweicloud.sdk.core.exception.RequestTimeoutException;
import com.huaweicloud.sdk.core.exception.ServiceResponseException;
import com.huaweicloud.sdk.ocr.v1.region.OcrRegion;
import com.huaweicloud.sdk.ocr.v1.*;
import com.huaweicloud.sdk.ocr.v1.model.*;

import java.util.List;
import java.util.ArrayList;

public class RecognizeWebImageSolution {

    public static void main(String[] args) {
        // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
        // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
        String ak = System.getenv("CLOUD_SDK_AK");
        String sk = System.getenv("CLOUD_SDK_SK");

        ICredential auth = new BasicCredentials()
                .withAk(ak)
                .withSk(sk);

        OcrClient client = OcrClient.newBuilder()
                .withCredential(auth)
                .withRegion(OcrRegion.valueOf("<YOUR REGION>"))
                .build();
        RecognizeWebImageRequest request = new RecognizeWebImageRequest();
        WebImageRequestBody body = new WebImageRequestBody();
        List<String> listbodyExtractType = new ArrayList<>();
        listbodyExtractType.add("contact_info");
        listbodyExtractType.add("image_size");
        body.withDetectFont(true);
        body.withExtractType(listbodyExtractType);
        body.withDetectDirection(true);
        body.withUrl("https://BucketName.obs.myhuaweicloud.com/ObjectName");
        request.withBody(body);
        try {
            RecognizeWebImageResponse response = client.recognizeWebImage(request);
            System.out.println(response.toString());
        } catch (ConnectionException e) {
            e.printStackTrace();
        } catch (RequestTimeoutException e) {
            e.printStackTrace();
        } catch (ServiceResponseException e) {
            e.printStackTrace();
            System.out.println(e.getHttpStatusCode());
            System.out.println(e.getRequestId());
            System.out.println(e.getErrorCode());
            System.out.println(e.getErrorMsg());
        }
    }
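}

Python: recognize text by passing the Base64 encoding of a web image. The request checks the image tilt angle, identifies the font type, and checks whether the image contains contact information.

# coding: utf-8

import os

from huaweicloudsdkcore.auth.credentials import BasicCredentials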
from huaweicloudsdkocr.v1.region.ocr_region import OcrRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkocr.v1 import *

if __name__ == "__main__":
    # The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    # In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak = os.getenv("CLOUD_SDK_AK")
    sk = os.getenv("CLOUD_SDK_SK")

    credentials = BasicCredentials(ak, sk)

    client = OcrClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(OcrRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        request = RecognizeWebImageRequest()
        listExtractTypebody = [
            "contact_info",
            "image_size"
        ]
        request.body = WebImageRequestBody(
            detect_font=True,
            extract_type=listExtractTypebody,
            detect_direction=True,
            image="/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA..."
        )
        response = client.recognize_web_image(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
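        print(e.error_msg)

Python: recognize text by passing the URL of a web image. The request checks the image tilt angle, identifies the font type, and checks whether the image contains contact information.

# coding: utf-8

import os

from huaweicloudsdkcore.auth.credentials import BasicCredentials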
from huaweicloudsdkocr.v1.region.ocr_region import OcrRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdkocr.v1 import *

if __name__ == "__main__":
    # The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    # In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak = os.getenv("CLOUD_SDK_AK")
    sk = os.getenv("CLOUD_SDK_SK")

    credentials = BasicCredentials(ak, sk)

    client = OcrClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(OcrRegion.value_of("<YOUR REGION>")) \
        .build()

    try:
        request = RecognizeWebImageRequest()
        listExtractTypebody = [
            "contact_info",
            "image_size"
        ]
        request.body = WebImageRequestBody(
            detect_font=True,
            extract_type=listExtractTypebody,
            detect_direction=True,
            url="https://BucketName.obs.myhuaweicloud.com/ObjectName"
        )
        response = client.recognize_web_image(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)

Go: recognize text by passing the Base64 encoding of a web image. The request checks the image tilt angle, identifies the font type, and checks whether the image contains contact information.

package main

import (
	"fmt"
	"os"

	"github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
	ocr "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1"
	"github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/model"
	region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/region"
)

func main() {
    // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        Build()

    client := ocr.NewOcrClient(
        ocr.OcrClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    request := &model.RecognizeWebImageRequest{}
	var listExtractTypebody = []string{
        "contact_info",
	    "image_size",
    }
	detectFontWebImageRequestBody:= true
	detectDirectionWebImageRequestBody:= true
	imageWebImageRequestBody:= "/9j/4AAQSkZJRgABAgEASABIAAD/4RFZRXhpZgAATU0AKgAAAA..."
	request.Body = &model.WebImageRequestBody{
		DetectFont: &detectFontWebImageRequestBody,
		ExtractType: &listExtractTypebody,
		DetectDirection: &detectDirectionWebImageRequestBody,
		Image: &imageWebImageRequestBody,
	}
	response, err := client.RecognizeWebImage(request)
	if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
}

Go: recognize text by passing the URL of a web image. The request checks the image tilt angle, identifies the font type, and checks whether the image contains contact information.

package main

import (
	"fmt"
	"os"

	"github.com/huaweicloud/huaweicloud-sdk-go-v3/core/auth/basic"
	ocr "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1"
	"github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/model"
	region "github.com/huaweicloud/huaweicloud-sdk-go-v3/services/ocr/v1/region"
)

func main() {
    // The AK and SK used for authentication are hard-coded or stored in plaintext, which has great security risks. It is recommended that the AK and SK be stored in ciphertext in configuration files or environment variables and decrypted during use to ensure security.
    // In this example, AK and SK are stored in environment variables for authentication. Before running this example, set environment variables CLOUD_SDK_AK and CLOUD_SDK_SK in the local environment
    ak := os.Getenv("CLOUD_SDK_AK")
    sk := os.Getenv("CLOUD_SDK_SK")

    auth := basic.NewCredentialsBuilder().
        WithAk(ak).
        WithSk(sk).
        Build()

    client := ocr.NewOcrClient(
        ocr.OcrClientBuilder().
            WithRegion(region.ValueOf("<YOUR REGION>")).
            WithCredential(auth).
            Build())

    request := &model.RecognizeWebImageRequest{}
	var listExtractTypebody = []string{
        "contact_info",
	    "image_size",
    }
	detectFontWebImageRequestBody:= true
	detectDirectionWebImageRequestBody:= true
	urlWebImageRequestBody:= "https://BucketName.obs.myhuaweicloud.com/ObjectName"
	request.Body = &model.WebImageRequestBody{
		DetectFont: &detectFontWebImageRequestBody,
		ExtractType: &listExtractTypebody,
		DetectDirection: &detectDirectionWebImageRequestBody,
		Url: &urlWebImageRequestBody,
	}
	response, err := client.RecognizeWebImage(request)
	if err == nil {
        fmt.Printf("%+v\n", response)
    } else {
        fmt.Println(err)
    }
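}

For SDK code examples in more programming languages, see the Code Example tab in API Explorer, which can automatically generate the corresponding SDK code.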

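Status Codes

Status code	Description
200	Successful response
400	Failed response

For details about status codes, see Status Codes.

Error Codes

For details about error codes, see Error Codes.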
