Publication:
OCR Error Correction for Unconstrained Vietnamese Handwritten Text
datacite.subject.fos | oecd::Natural sciences::Computer and information sciences | |
dc.contributor.author | Quoc-Dung Nguyen | |
dc.contributor.author | Duc-Anh Le | |
dc.contributor.author | Ivan Zelinka | |
dc.date.accessioned | 2022-11-03T08:56:54Z | |
dc.date.available | 2022-11-03T08:56:54Z | |
dc.date.issued | 2019 | |
dc.description.abstract | Post-processing is an essential step in detecting and correcting errors in OCR-generated texts. In this paper, we present an automatic OCR post-processing model that comprises both error detection and error correction phases for OCR output texts of unconstrained Vietnamese handwriting. We propose a hybrid approach to generating and scoring correction candidates for both non-syllable and real-syllable errors, based on linguistic features as well as the error characteristics of OCR outputs. We evaluate our proposed model on a Vietnamese benchmark database at the line level. The experimental results show that our model achieves a character error rate (CER) of 4.17% and a word error rate (WER) of 9.82%, improving the CER and WER of an attention-based encoder-decoder approach by 0.5% and 3.5%, respectively, on the VNOnDB-Line dataset of the Vietnamese online handwritten text recognition competition (VOHTR2018). These results outperform those obtained by various recognition systems in the VOHTR2018 competition. | |
dc.identifier.doi | 10.1145/3368926.3369686 | |
dc.identifier.uri | http://repository.vlu.edu.vn:443/handle/123456789/841 | |
dc.language.iso | en_US | |
dc.relation.ispartof | Proceedings of the Tenth International Symposium on Information and Communication Technology - SoICT 2019 | |
dc.subject | Unconstrained Vietnamese handwriting | |
dc.subject | OCR | |
dc.subject | Post-processing | |
dc.subject | Error detection | |
dc.subject | Error correction | |
dc.title | OCR Error Correction for Unconstrained Vietnamese Handwritten Text | |
dc.type | proceedings-article | |
dspace.entity.type | Publication |
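
The abstract reports results as character error rate (CER) and word error rate (WER). The record does not include the authors' evaluation code, so the following is only a minimal sketch of the conventional definitions: edit distance between reference and recognized text, divided by the reference length. The function names and the NFC normalization step are assumptions made for illustration, not details taken from the paper.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    # Assumption: compose diacritics so each accented Vietnamese letter
    # counts as a single character.
    return unicodedata.normalize("NFC", text)


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref, hyp = _nfc(reference).split(), _nfc(hypothesis).split()
    if not ref:
        raise ValueError("reference must not contain zero words")
    return edit_distance(ref, hyp) / len(ref)
```

Under these definitions, a single wrong character in a four-character reference gives a CER of 0.25, and a single wrong word in a four-word reference gives a WER of 0.25. How the competition aggregates scores across lines is not stated in this record.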
Files
Original bundle
- Name: AS383.pdf
- Size: 800.8 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission