Can OCR Recognize Text-based Document Formats?
The core function of Huawei Cloud OCR is to detect and extract text from images. It is designed to process image data, so it cannot directly read or recognize text in document files such as Word, PDF, or Excel files. These files store their content in a fundamentally different way from the image input that the OCR service is built to handle.
To extract text from a Word, PDF, or Excel file, you must first preprocess it by converting each page of the document into a clear static image. Once each page has been converted into a supported image format, upload the images through the API or SDK using the standard OCR workflow to complete text recognition.
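The sketch below shows the decision and packaging steps of this workflow under stated assumptions; it is not Huawei Cloud's SDK. The helper names `needs_conversion` and `build_ocr_payload` are hypothetical, the extension lists are illustrative only (check the API reference for the exact formats each OCR API accepts), and the JSON field name `"image"` for the Base64-encoded picture is an assumption to confirm against the API reference. The page-to-image rendering itself would be done with a separate conversion tool and is not shown.

```python
import base64
import json
from pathlib import Path

# Illustrative (assumed) list of image formats that can be sent to OCR directly.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
# Document formats that must first be rendered page by page into images.
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pdf", ".xls", ".xlsx"}


def needs_conversion(filename: str) -> bool:
    """Hypothetical helper: return True if the file must be converted to images first."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return False
    if ext in DOCUMENT_EXTENSIONS:
        return True
    raise ValueError(f"Unsupported file type: {ext!r}")


def build_ocr_payload(image_bytes: bytes) -> str:
    """Hypothetical helper: wrap one page image as a JSON request body.

    The "image" field name is an assumption; confirm it in the API reference
    for the specific OCR API you call.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


if __name__ == "__main__":
    for name in ["contract.pdf", "invoice.jpg"]:
        action = "convert each page to an image first" if needs_conversion(name) else "upload directly"
        print(name, "->", action)
```

In practice, each page image produced by the conversion step would be passed to `build_ocr_payload` (or to the equivalent SDK call) and submitted as a separate recognition request.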