Publication:
OCR error correction for Vietnamese handwritten text using neural machine translation

datacite.subject.fos oecd::Engineering and technology
dc.contributor.author D. Q. Nguyen
dc.contributor.author A. D. Le
dc.contributor.author M. N. Phan
dc.contributor.author P. Kromer
dc.contributor.author I. Zelinka
dc.date.accessioned 2022-11-19T06:58:02Z
dc.date.available 2022-11-19T06:58:02Z
dc.date.issued 2021
dc.description.abstract OCR post-processing is an important step for improving the quality of OCR output text. Long short-term memory (LSTM) is a deep learning model with wide-ranging applications in domains such as time series prediction, natural language processing, and speech recognition. In this paper, we propose an OCR error correction model that uses neural machine translation with bidirectional LSTM networks at the syllable level. The Vietnamese OCR text dataset used to evaluate the model is the output of an OCR engine based on the attention-based encoder-decoder (AED) model, which takes as input handwritten text from the benchmark database of the ICFHR 2018 Vietnamese online handwritten text recognition competition. The experimental results show that the proposed model reduces the word error rate of the AED model's OCR output by about 2%. The model's performance is also discussed and compared with the baseline methods in the competition.
dc.identifier.doi 10.1063/5.0066679
dc.identifier.uri http://repository.vlu.edu.vn:443/handle/123456789/1487
dc.language.iso en_US
dc.relation.ispartof 1ST VAN LANG INTERNATIONAL CONFERENCE ON HERITAGE AND TECHNOLOGY CONFERENCE PROCEEDING, 2021: VanLang-HeriTech, 2021
dc.relation.ispartof AIP Conference Proceedings
dc.relation.issn 0094-243X
dc.subject Artificial neural networks
dc.subject Natural language processing
dc.subject Learning models
dc.subject Speech recognition
dc.title OCR error correction for Vietnamese handwritten text using neural machine translation
dc.type proceedings-article
dspace.entity.type Publication
Files
Original bundle
Name: notepad.txt
Size: 0 B
Format: Plain Text