Conference Proceedings - 2021

Permanent URI for this collection

http://repository.vlu.edu.vn:443/handle/123456789/1422

Candidate word generation for OCR errors using optimization algorithm

(2021) D. T. Pham; D. Q. Nguyen; A. D. Le; M. N. Phan; P. Kromer

OCR post-processing is an important step in improving the accuracy of OCR text. It comprises two main tasks: error detection and error correction. The hill climbing algorithm is a heuristic search method for solving optimization problems. In this paper, we present a novel OCR error correction approach that uses an adapted version of the hill climbing algorithm. Correction candidates for OCR errors are explored through random character edits and evolved with hill climbing. The character edit patterns are obtained from the training data. The proposed model is evaluated on the benchmark dataset of the OCR post-correction competition at the International Conference on Document Analysis and Recognition 2017. The results show that our model outperforms various baseline approaches in the competition. In addition, the randomness of the proposed algorithm is analyzed to verify its stability under different parameter configurations.
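
The candidate search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the edit patterns, the toy lexicon, and the similarity-based fitness function below are all assumptions made for illustration, since the abstract states only that edit patterns come from training data and that candidates are evolved by hill climbing.

```python
import random
from difflib import SequenceMatcher

# Hypothetical edit patterns (OCR output -> intended text). In the paper these
# are obtained from training data; here they are hard-coded for illustration.
EDIT_PATTERNS = [("rn", "m"), ("1", "l"), ("0", "o"), ("c", "e"), ("vv", "w")]

# Hypothetical toy lexicon standing in for a real scoring resource.
LEXICON = {"modern", "model", "hello", "world"}


def score(candidate: str) -> float:
    """Assumed fitness: best similarity ratio to any lexicon word (1.0 = exact match)."""
    return max(SequenceMatcher(None, candidate, word).ratio() for word in LEXICON)


def random_edit(token: str, rng: random.Random) -> str:
    """Apply one randomly chosen applicable edit pattern at a random position."""
    applicable = [(src, dst) for src, dst in EDIT_PATTERNS if src in token]
    if not applicable:
        return token  # no pattern matches, so the token is its own neighbour
    src, dst = rng.choice(applicable)
    positions = [i for i in range(len(token)) if token.startswith(src, i)]
    i = rng.choice(positions)
    return token[:i] + dst + token[i + len(src):]


def hill_climb(token: str, steps: int = 200, seed: int = 0) -> str:
    """Keep a randomly edited neighbour only if it strictly improves the score."""
    rng = random.Random(seed)
    best, best_score = token, score(token)
    for _ in range(steps):
        neighbour = random_edit(best, rng)
        neighbour_score = score(neighbour)
        if neighbour_score > best_score:
            best, best_score = neighbour, neighbour_score
    return best


print(hill_climb("he11o"))
```

The strict-improvement rule is what makes this hill climbing rather than random search: an edit that lowers the fitness is discarded, so the candidate can only move toward a local optimum of the assumed fitness function.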

OCR error correction for Vietnamese handwritten text using neural machine translation

(2021) D. Q. Nguyen; A. D. Le; M. N. Phan; P. Kromer; I. Zelinka

OCR post-processing is an important step for improving the quality of OCR output texts. Long short-term memory (LSTM) is a deep learning model with wide-ranging applications in domains such as time series prediction, natural language processing, and speech recognition. In this paper, we propose an OCR error correction model using neural machine translation with bidirectional LSTM networks at the syllable level. The Vietnamese OCR text dataset used for model evaluation is produced by an OCR engine based on the attention-based encoder-decoder (AED) model, which takes as input handwritten text from the benchmark database of the ICFHR 2018 Vietnamese online handwritten text recognition competition. The experimental results show that the proposed model decreases the word error rate in the OCR output texts of the above AED model by about 2%. The model's performance is also discussed and compared with the other baseline methods in the competition.
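
The word error rate mentioned in this abstract is conventionally computed as the token-level edit distance between a reference and a hypothesis, divided by the reference length. The sketch below assumes that standard definition and whitespace tokenization (which yields syllables for Vietnamese text); the abstract does not specify the exact evaluation procedure, and the function name and sample strings are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level Levenshtein distance divided by the number of reference tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference tokens processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_token == hyp_token else 1)
            deletion = prev[j] + 1       # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy strings, not data from the paper: one substitution plus one deletion
# against a four-token reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Comparing this value on the raw OCR output and on the corrected output against the same reference gives the kind of before-and-after difference the abstract reports.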

