Updated on 2025-05-13 GMT+08:00

What Is OCR?

Optical Character Recognition (OCR) detects and extracts text from images and returns the recognition results as editable, structured JSON.

OCR provides open APIs that you can call from programming languages such as Python and Java to extract text from images. This lets you automate the collection of key data and build intelligent service systems that improve efficiency. For details about the APIs, see Optical Character Recognition API Reference.
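
As a rough illustration of what calling an OCR API from Python might involve, the sketch below builds (but does not send) an HTTP POST request that carries a Base64-encoded image in a JSON body. The function name build_ocr_request, the "image" body field, the "X-Auth-Token" header, and the example URL are assumptions made for illustration; take the actual endpoint, authentication method, and request fields from the Optical Character Recognition API Reference.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes: bytes, url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Base64-encoded image.

    Assumptions for illustration only: the JSON field name "image" and the
    "X-Auth-Token" header. Check the API reference for the real names.
    """
    body = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and token; replace with values from your own account.
    request = build_ocr_request(
        b"raw image bytes go here",
        "https://ocr.example.com/v2/your-project-id/ocr/general-text",
        "your-token",
    )
    print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose so that the sketch runs without network access or credentials.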

OCR also provides software development kits (SDKs) for multiple programming languages. For details about how to use SDKs, see the Optical Character Recognition SDK Reference.

Before You Start

You need basic programming skills. Familiarity with Java, Python, or Node.js, or with iOS or Android development, is recommended.

To use OCR, you call its APIs. The results are returned in JSON format, so you also need to pass them to your service system or convert them from JSON to TXT or Excel format yourself.
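
The conversion step mentioned above could look roughly like the following Python sketch, which turns a JSON result into plain text (for a TXT file) and into CSV (which Excel can open). The response layout used here, a "result" object holding a "words_block_list" array whose items have a "words" field, is an assumption for illustration; check the actual response structure in the API reference for the API you call.

```python
import csv
import io
import json


def _blocks(result_json: str) -> list:
    # Assumed layout: {"result": {"words_block_list": [{"words": "..."}, ...]}}
    return json.loads(result_json)["result"]["words_block_list"]


def result_to_text(result_json: str) -> str:
    """Join the recognized text blocks into plain text, one block per line."""
    return "\n".join(block["words"] for block in _blocks(result_json))


def result_to_csv(result_json: str) -> str:
    """Render the recognized text blocks as CSV, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "words"])
    for index, block in enumerate(_blocks(result_json), start=1):
        writer.writerow([index, block["words"]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not real service output.
    sample = json.dumps(
        {"result": {"words_block_list": [{"words": "Hello"}, {"words": "World, again"}]}}
    )
    print(result_to_text(sample))
    print(result_to_csv(sample))
```

CSV is used here because it needs only the standard library. Writing a native .xlsx file would require a third-party package.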

OCR Capabilities

  • General OCR

    General OCR automatically recognizes text in images of any format, including web images, so you can quickly digitize different types of documents.

  • Card OCR

    Card OCR enables automatic recognition and structured extraction of key information from passports, ID cards, driver's licenses, and other official credentials.

Using OCR for the First Time

If you are a first-time user, the following sections are a good place to start: