Updated on 2025-10-23 GMT+08:00

Can OCR Recognize Text-based Document Formats?

The core function of Huawei Cloud OCR is to detect and extract text from images. It works on static image data and cannot directly read or recognize text in document formats such as Word, PDF, or Excel files. These formats store their content as structured document data, which is fundamentally different from the image input that the OCR service is built to handle.

To extract text from a Word, PDF, or Excel file, first convert each page of the document into a clear static image. Once the pages are in a supported image format, upload them through the API or SDK and follow the standard OCR workflow to complete text recognition.
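
The preprocessing step for a PDF might look roughly like the following Python sketch. It is illustrative only and rests on several assumptions not stated on this page: the function names `pdf_pages_to_png` and `build_ocr_request_body` are hypothetical, page rendering uses the third-party PyMuPDF package (any tool that renders pages to images would do), and the request body is assumed to carry the image as a Base64 string in an `image` field. Check the OCR API reference for the exact request format, endpoint, and supported image formats before relying on it.

```python
import base64
import json


def pdf_pages_to_png(pdf_path, dpi=200):
    """Render every page of a PDF to PNG bytes.

    Assumption: the third-party PyMuPDF package is installed
    (pip install pymupdf). It is imported lazily so that the rest
    of this sketch runs without it.
    """
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        # A higher dpi gives a clearer image, at the cost of a larger file.
        return [page.get_pixmap(dpi=dpi).tobytes("png") for page in doc]


def build_ocr_request_body(image_bytes):
    """Build a JSON request body holding one Base64-encoded image.

    Assumption: the OCR API accepts the image in an "image" field;
    confirm the field name in the API reference.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})
```

Each element returned by `pdf_pages_to_png` is one page image, so a multi-page document results in one OCR request per page. The string returned by `build_ocr_request_body` would then be sent to the OCR service through the API or SDK in the usual way.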