Publication:
OCR Error Correction for Unconstrained Vietnamese Handwritten Text

datacite.subject.fos oecd::Natural sciences::Computer and information sciences
dc.contributor.author Quoc-Dung Nguyen
dc.contributor.author Duc-Anh Le
dc.contributor.author Ivan Zelinka
dc.date.accessioned 2022-11-03T08:56:54Z
dc.date.available 2022-11-03T08:56:54Z
dc.date.issued 2019
dc.description.abstract Post-processing is an essential step in detecting and correcting errors in OCR-generated texts. In this paper, we present an automatic OCR post-processing model that comprises both error detection and error correction phases for OCR output texts of unconstrained Vietnamese handwriting. We propose a hybrid approach to generating and scoring correction candidates for both non-syllable and real-syllable errors, based on linguistic features as well as the error characteristics of OCR outputs. We evaluate the proposed model on a Vietnamese benchmark database at the line level. The experimental results show that our model achieves a character error rate (CER) of 4.17% and a word error rate (WER) of 9.82%, improving the CER and WER of an attention-based encoder-decoder approach by 0.5% and 3.5%, respectively, on the VNOnDB-Line dataset of the Vietnamese online handwritten text recognition competition (VOHTR2018). These results outperform those obtained by the various recognition systems in the VOHTR2018 competition.
dc.identifier.doi 10.1145/3368926.3369686
dc.identifier.uri http://repository.vlu.edu.vn:443/handle/123456789/841
dc.language.iso en_US
dc.relation.ispartof Proceedings of the Tenth International Symposium on Information and Communication Technology - SoICT 2019
dc.subject Unconstrained Vietnamese handwriting
dc.subject OCR
dc.subject Post-processing
dc.subject Error detection
dc.subject Error correction
dc.title OCR Error Correction for Unconstrained Vietnamese Handwritten Text
dc.type proceedings-article
dspace.entity.type Publication
Files
Original bundle
Name: AS383.pdf
Size: 800.8 KB
Format: Adobe Portable Document Format