Creating a Video OCR Job

Function

Creates a video OCR job that analyzes the text in a video to recognize and extract key details or to detect illegal content.

  • Supported video formats: AVI, WMV, MPG, MPEG, MP4, MOV, M4V, MKV
  • Videos cannot be stored in OBS buckets encrypted by KMS.
  • The size of a single video cannot exceed 4 GB.
  • When data is read from a specified URL, the video size cannot exceed 1 GB.
  • Numbers, English subtitles, and simplified and traditional Chinese characters can all be identified.
  • Horizontal and vertical text can be recognized, as can many unclear or artistic fonts. Text arranged in a circle or viewed from a severe angle is typically not handled well.
  • The video resolution must be at least 300 x 300 pixels.
  • The video frame rate must be greater than 1 fps.
  • Supported regions: CN North-Beijing1 and CN North-Beijing4.
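The constraints above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the service API: the function name `check_video` and its arguments are hypothetical, the caller is assumed to already know the video's size, resolution, and frame rate, and 1 GB is assumed to mean 1024^3 bytes.

```python
from pathlib import PurePosixPath

# Limits copied from the constraint list; the byte interpretation of "GB" is an assumption.
SUPPORTED_EXTENSIONS = {".avi", ".wmv", ".mpg", ".mpeg", ".mp4", ".mov", ".m4v", ".mkv"}
MAX_BYTES_OBS = 4 * 1024**3  # 4 GB for videos read from OBS
MAX_BYTES_URL = 1 * 1024**3  # 1 GB for videos read from a URL
MIN_SIDE_PX = 300            # minimum width and height in pixels
MIN_FPS_EXCLUSIVE = 1.0      # frame rate must be greater than this value


def check_video(path, size_bytes, width, height, fps, input_type="obs"):
    """Return a list of constraint violations; an empty list means none were found."""
    problems = []
    ext = PurePosixPath(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    limit = MAX_BYTES_URL if input_type == "url" else MAX_BYTES_OBS
    if size_bytes > limit:
        problems.append(f"video is {size_bytes} bytes; limit is {limit} bytes")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(f"resolution {width} x {height} is below 300 x 300")
    if fps <= MIN_FPS_EXCLUSIVE:
        problems.append(f"frame rate {fps} fps is not greater than 1 fps")
    return problems
```

Calling `check_video("input/demo.mp4", 10_000_000, 1280, 720, 25.0)` returns an empty list. The KMS and region constraints are not covered because they cannot be verified from the file alone.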

URI

  • URI format:
    POST /v2/{project_id}/services/video-ocr/tasks
  • Parameters

    Parameter | Mandatory | Type | Description
    project_id | Yes | String | Project ID corresponding to the region where the service is located. For details about how to obtain the project ID, see Obtaining the Project ID.
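As an illustration of the URI format, the sketch below builds the POST request without sending it. The host in `ENDPOINT` is a placeholder and the `X-Auth-Token` header is an assumed authentication scheme; neither is stated in this section, so replace both with the values documented for your account and region.

```python
import json
import urllib.request

# Placeholder host: this section does not state the service endpoint.
ENDPOINT = "https://video-ocr.example.com"


def build_create_request(project_id, body, token):
    """Build (but do not send) the request that creates a video OCR job."""
    url = f"{ENDPOINT}/v2/{project_id}/services/video-ocr/tasks"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed auth header, not confirmed by this section
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the endpoint and credentials are real.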

Request

  • Example request
    POST /v2/6204a5bd270343b5885144cf9c8c158d/services/video-ocr/tasks
    {
        "name": "task-est",
        "description": "description",
        "input": {
            "type": "obs",
            "data": [
                {
                    "bucket": "obs-iva",
                    "path": "input/demo.mp4"
                }
            ]
        },
        "output": {
            "obs": {
                "bucket": "obs-iva",
                "path": "output/"
            }
        },
        "service_config": {
            "common": {
                "area": "0,0,0.5,0.5;",
                "threshold": 0.5
            }
        },
        "service_version": "1.0"
    }
  • Parameters

    Parameter | Mandatory | Type | Description
    name | Yes | String | Job name, which consists of 1 to 100 characters. Only letters (A to Z and a to z), digits (0 to 9), hyphens (-), and underscores (_) are allowed.
    description | No | String | Job description, which consists of a maximum of 500 characters.
    input | Yes | Object | List of input video data. For details, see task.input. Currently, only the following input types are supported:
        • obs: Read video data from OBS on HUAWEI CLOUD. The video size cannot exceed 4 GB. Videos cannot be stored in OBS buckets encrypted by KMS.
        • url: Read video data from a specified URL. The video size cannot exceed 1 GB. Currently, only OBS URLs are supported. You need to grant anonymous users the permission to read the URL. For details, see Bucket ACL Overview.
    output | Yes | Object | List of output result data. For details, see task.output. Currently, only the following output types are supported:
        • obs: Export the results to the OBS bucket you specified.
        • hosting: Host the results on OBS at the service side. The OBS path is specified by the service. You can obtain the path by using the API that queries a single job. For details, see Querying a Single Job.
    service_config | No | Object | Service algorithm configuration. The field structure is related to the service. For details about the parameter definition, see service_config field structure description.
    service_version | Yes | String | Version, which is set to 1.0.

  • service_config field structure description

    Parameter | Mandatory | Type | Description
    area | No | String | Selected area for text recognition. Use semicolons (;) to separate areas. In each area, the first two values are the proportional (x, y) coordinates of the upper left corner, and the last two values are the proportional width and height of the area. Each value ranges from 0 to 1. There is no default value.
    threshold | No | Float | Confidence threshold for the text that is output. A higher threshold means more accurate output but a lower extraction rate. The value ranges from 0 to 1.00, with a default value of 0.50.
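The request parameters and the `area` string format can be assembled programmatically. The sketch below is one possible helper, with hypothetical function names (`format_area`, `build_body`). It assumes OBS input and OBS output, and it keeps the trailing semicolon after each area because the example request does so.

```python
import re

# Job name rule from the parameter table: 1 to 100 letters, digits, hyphens, underscores.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")


def format_area(regions):
    """Encode (x, y, width, height) proportions as 'x,y,w,h;x,y,w,h;'."""
    parts = []
    for region in regions:
        if len(region) != 4 or not all(0 <= v <= 1 for v in region):
            raise ValueError(f"each area needs four values in [0, 1]: {region!r}")
        parts.append(",".join(f"{v:g}" for v in region) + ";")
    return "".join(parts)


def build_body(name, input_bucket, input_path, output_bucket, output_path,
               regions=None, threshold=None, description=None):
    """Build the JSON body for a job that reads from and writes to OBS."""
    if not NAME_RE.fullmatch(name):
        raise ValueError("name must be 1 to 100 letters, digits, hyphens, or underscores")
    if description is not None and len(description) > 500:
        raise ValueError("description cannot exceed 500 characters")
    body = {
        "name": name,
        "input": {"type": "obs", "data": [{"bucket": input_bucket, "path": input_path}]},
        "output": {"obs": {"bucket": output_bucket, "path": output_path}},
        "service_version": "1.0",
    }
    if description is not None:
        body["description"] = description
    common = {}
    if regions:
        common["area"] = format_area(regions)
    if threshold is not None:
        if not 0 <= threshold <= 1:
            raise ValueError("threshold must be between 0 and 1")
        common["threshold"] = threshold
    if common:  # service_config is optional, so omit it when nothing is set
        body["service_config"] = {"common": common}
    return body
```

For instance, `format_area([(0, 0, 0.5, 0.5)])` yields the string used in the example request, which selects the upper left quarter of the frame.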

Response Parameters

  • Example response
    [
        {
            "id": "f18320e61e4c4dc685aa2dfc22a28dc5"
        }
    ]
  • Job ID description

    Parameter | Type | Description
    id | String | Job ID
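Because the example response is a JSON array of objects, the job ID has to be read from the array element rather than from the top level. A minimal sketch, with the hypothetical helper name `extract_job_ids`:

```python
import json


def extract_job_ids(response_text):
    """Return the job IDs found in the response body of the create-job call."""
    payload = json.loads(response_text)
    if not isinstance(payload, list):
        raise ValueError("expected a JSON array of job objects")
    return [item["id"] for item in payload]
```

The returned ID is what the job query API mentioned earlier (Querying a Single Job) would take as input.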

Recognition Results

The video analysis results are saved in JSON format in the output path you specified.

  • Example result file
    {
        "name": "obs-wxh/demo.mp4",
        "fps": 15,
        "contents": [
            {
                "time_start": "00:00:00",
                "time_end": "00:00:01",
                "content": [
                    "Cloud Boosting Business Innovation",
                    "Digital Painting Future",
                    "2018 World Artificial Intelligence Conference"
                ]
            },
            {
                "time_start": "00:00:01",
                "time_end": "00:00:02",
                "content": [
                    "Cloud Boosting Business Innovation",
                    "Digital Painting Future",
                    "Changes Brought by AI Have Just Begun"
                ]
            },
            {
                "time_start": "00:00:03",
                "time_end": "00:00:04",
                "content": [
                    "Phase 1",
                    "Technical Productivity for General Purpose",
                    "Application Development Curve"
                ]
            }
        ]
    }
  • Fields in the result file

    Field | Description
    name | Video name
    fps | Frame rate
    time_start | Content start time
    time_end | Content end time
    content | Text identified
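Once the result file has been downloaded and parsed, the fields above can be searched for specific text. The sketch below assumes the file has already been loaded into a dictionary with `json.load`; the function name `find_text` is hypothetical.

```python
def find_text(result, keyword):
    """Return (time_start, time_end, line) for each recognized line containing keyword."""
    needle = keyword.casefold()  # case-insensitive match
    hits = []
    for segment in result.get("contents", []):
        for line in segment.get("content", []):
            if needle in line.casefold():
                hits.append((segment["time_start"], segment["time_end"], line))
    return hits
```

Applied to the example result file, searching for "phase" returns one hit in the segment that starts at 00:00:03.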

Status Codes

  • Normal

    201

  • Abnormal

    Status Code | Description
    400 Bad Request | Request error. For details about the returned error code, see Error Codes.
    401 Unauthorized | Authentication failed.
    403 Forbidden | No operation permission.
    404 Not Found | The requested resource was not found.
    500 Internal Server Error | Internal service error.
    503 Service Unavailable | Service unavailable.
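A client can map these status codes to messages when reporting the outcome of a create-job call. This is only an illustrative sketch: the function name `describe_status` is hypothetical, and the wording for codes not listed above is an assumption.

```python
# Descriptions taken from the status code table; 201 is the normal response.
STATUS_MESSAGES = {
    201: "Job created.",
    400: "Request error. See Error Codes for the returned error code.",
    401: "Authentication failed.",
    403: "No operation permission.",
    404: "The requested resource was not found.",
    500: "Internal service error.",
    503: "Service unavailable.",
}


def describe_status(code):
    """Return a short description for an HTTP status code from this API."""
    return STATUS_MESSAGES.get(code, f"Unexpected status code {code}.")


def is_success(code):
    """Only 201 is documented as the normal status for job creation."""
    return code == 201
```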