Publication:
OCR error correction using correction patterns and self-organizing migrating algorithm
Date
2020
Authors
Quoc-Dung Nguyen
Duc-Anh Le
Nguyet-Minh Phan
Ivan Zelinka
Abstract
Optical character recognition (OCR) systems help to digitize paper-based historical archives. However, the poor quality of
scanned documents and the limitations of text recognition techniques result in various kinds of errors in OCR output. Post-processing
is an essential step in improving the output quality of OCR systems by detecting and correcting these errors. In this
paper, we present an automatic model for OCR post-processing that consists of both an error detection phase and an error correction phase.
We propose a novel approach to OCR post-processing error correction that uses correction pattern edits and an evolutionary
algorithm, a class of methods that has mainly been used for solving optimization problems. Our model adopts a variant of the self-organizing
migrating algorithm along with a fitness function based on modifications of important linguistic features. We illustrate
how to construct the table of correction pattern edits, which involves all types of edit operations and is learned directly from
the training dataset. With efficient settings of the algorithm parameters, our model achieves high-quality
candidate generation and error correction. The experimental results show that our proposed approach outperforms various
baseline approaches as evaluated on the benchmark dataset of the ICDAR 2017 Post-OCR Text Correction competition.
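
To make the idea of a correction pattern table more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the training data is available as pairs of an OCR token and its ground-truth token, and it uses Python's standard difflib to extract character-level edits. The function names learn_correction_patterns and generate_candidates, the sample pairs, and the choice to skip pure insertions during candidate generation are all illustrative assumptions. The self-organizing migrating algorithm and the fitness function described in the abstract are not reproduced here.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_correction_patterns(pairs):
    """Count (ocr_fragment, truth_fragment) edits over (ocr, truth) token pairs.

    Hypothetical helper: the paper's own table construction may differ.
    """
    patterns = Counter()
    for ocr, truth in pairs:
        matcher = SequenceMatcher(None, ocr, truth, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # 'replace', 'delete' and 'insert' opcodes each describe one edit.
            if tag != "equal":
                patterns[(ocr[i1:i2], truth[j1:j2])] += 1
    return patterns


def generate_candidates(token, patterns):
    """Apply each learned pattern once at every position where it matches.

    Assumption for brevity: pure insertions (empty OCR side) are skipped,
    since they would apply at every position of the token.
    """
    candidates = Counter()
    for (wrong, right), count in patterns.items():
        if not wrong:
            continue
        start = token.find(wrong)
        while start != -1:
            candidate = token[:start] + right + token[start + len(wrong):]
            candidates[candidate] += count  # weight by pattern frequency
            start = token.find(wrong, start + 1)
    return candidates


if __name__ == "__main__":
    # Made-up sample pairs, used only to exercise the sketch.
    training_pairs = [("tbe", "the"), ("rnodern", "modern"), ("cbair", "chair")]
    table = learn_correction_patterns(training_pairs)
    print(table.most_common())
    print(generate_candidates("rnan", table))
```

In a fuller system, the candidates produced this way would then be ranked by a scoring function; the abstract indicates that the authors use a fitness function over linguistic features for that step.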
Keywords
OCR,
N-grams,
Similarity,
Context,
Correction pattern,
Evolutionary algorithm