
Text

Improving OCR Performance with Background Image Elimination



One critical step in OCR is detecting text characters in a document image. However, some documents contain embedded background images that often mislead character-detection algorithms. For example, small dots or sharp edges in the background image are often enclosed in bounding boxes as if they were characters and passed to the next stage of the OCR pipeline, which causes a chain of errors. Motivated by this observation, we present a novel, cost-effective image preprocessing method for removing such background images. We first enhance the document images before OCR by using brightness and chromaticity as contrast parameters. We then convert the color images to grayscale and threshold them. In this way, background images can be removed effectively without degrading the quality of the text characters. The method was tested using Tesseract (an open-source OCR engine) and compared with two commercial OCR packages, ABBYY FineReader and HANWANG (OCR software for Chinese characters). The experimental results show that recognition accuracy improves significantly after background images are removed.
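The abstract names three stages (enhance using brightness and chromaticity, convert to gray, threshold) but does not give the exact enhancement formula. The following is a minimal, hypothetical sketch of such a pipeline, assuming that text is dark and nearly achromatic (for example, black ink) while background images are brighter or more colorful. The brightness measure (channel mean), the chromaticity measure (max minus min channel), the cutoffs `max_chroma` and `max_brightness`, and the use of Otsu's method for the threshold are all illustrative choices, not the authors' method. It uses plain Python lists of RGB tuples so that it runs without an image library.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]
Gray = List[List[int]]


def brightness(p: RGB) -> float:
    # Mean of the three channels, in [0, 255].
    return sum(p) / 3.0


def chromaticity(p: RGB) -> int:
    # Spread between strongest and weakest channel: 0 for pure grays,
    # large for saturated colors. A simple stand-in for a chroma measure.
    return max(p) - min(p)


def enhance(img: Image, max_chroma: int = 60, max_brightness: float = 200.0) -> Image:
    # Hypothetical enhancement step: assume text is dark and nearly gray,
    # so any pixel that is too colorful or too bright is pushed to white.
    white = (255, 255, 255)
    return [
        [white if chromaticity(p) > max_chroma or brightness(p) > max_brightness else p
         for p in row]
        for row in img
    ]


def to_gray(img: Image) -> Gray:
    # ITU-R BT.601 luma weights, rounded and clamped to valid 8-bit levels.
    return [
        [min(255, max(0, round(0.299 * r + 0.587 * g + 0.114 * b))) for (r, g, b) in row]
        for row in img
    ]


def otsu_threshold(gray: Gray) -> int:
    # Otsu's method: choose t maximizing between-class variance of
    # the classes {v <= t} and {v > t}.
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def remove_background(img: Image, max_chroma: int = 60, max_brightness: float = 200.0) -> Gray:
    # Enhance, convert to gray, then binarize: 0 = text (dark), 255 = background.
    gray = to_gray(enhance(img, max_chroma, max_brightness))
    t = otsu_threshold(gray)
    return [[0 if v <= t else 255 for v in row] for row in gray]


if __name__ == "__main__":
    # Toy 3x3 page: black text pixels mixed with a red dot, a light-gray
    # patch and a saturated blue dot standing in for background imagery.
    page = [
        [(10, 10, 10), (220, 40, 40), (255, 255, 255)],
        [(10, 10, 10), (230, 230, 230), (20, 20, 160)],
        [(255, 255, 255), (10, 10, 10), (255, 255, 255)],
    ]
    for row in remove_background(page):
        print(row)
```

In a real pipeline the binarized image would then be passed to an OCR engine such as Tesseract. Fixed cutoffs like these would likely need tuning per document collection, and a document whose text is itself colored would break the "text is achromatic" assumption.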


Availability

PO000008ID · perpusonline.id · Available

Detail Information

Series Title
12th International Conference on Fuzzy Systems and Knowledge Discovery
Call Number
-
Publisher
-
Collation
-
Language
English
ISBN/ISSN
-
Classification
NONE
Content Type
-
Media Type
-
Carrier Type
-
Edition
-
Subject(s)
Specific Detail Info
-
Statement of Responsibility

Other version/related

No other version available

