Publication:
An In-Depth Analysis of OCR Errors for Unconstrained Vietnamese Handwriting

datacite.subject.fos oecd::Engineering and technology
dc.contributor.author Nguyễn Quốc Dũng
dc.date.accessioned 2022-11-09T09:49:05Z
dc.date.available 2022-11-09T09:49:05Z
dc.date.issued 2020
dc.description.abstract OCR post-processing is an essential step to improve the accuracy of OCR-generated texts by detecting and correcting OCR errors. In this paper, the OCR texts are produced by an OCR engine based on the attention-based encoder-decoder model for unconstrained Vietnamese handwriting. We identify various kinds of Vietnamese OCR errors and their possible causes. Detailed statistics of Vietnamese OCR errors are provided and analyzed at both the character level and the syllable level, using typical OCR error characteristics such as error rate, error mapping/edit, frequency, and error length. Furthermore, the statistical analyses are performed on the training and test sets of a benchmark database to infer whether the test set is an appropriate representative of the training set with regard to OCR error characteristics. We also discuss whether OCR post-processing approaches should be designed at the character level or at the syllable level, based on the statistics provided for the studied datasets.
dc.identifier.doi 10.1007/978-3-030-63924-2_26
dc.identifier.isbn 9783030639235
dc.identifier.isbn 9783030639242
dc.identifier.uri http://repository.vlu.edu.vn:443/handle/123456789/1112
dc.language.iso en_US
dc.relation.ispartof Future Data and Security Engineering
dc.relation.ispartof Lecture Notes in Computer Science
dc.relation.issn 0302-9743
dc.relation.issn 1611-3349
dc.subject OCR errors
dc.subject OCR post-processing
dc.subject Vietnamese handwriting
dc.subject Encoder
dc.subject Decoder
dc.subject Attention model
dc.title An In-Depth Analysis of OCR Errors for Unconstrained Vietnamese Handwriting
dc.type Resource Types::text::journal::journal article
dspace.entity.type Publication
Files
Original bundle
Name: notepad.txt
Size: 0 B
Format: Plain Text
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission