Evaluation of an Optical Character Recognition Model For Yoruba Text

Date

2019-01

Publisher

Tibiscus University

Abstract

Optical character recognition (OCR) systems have been developed for many languages and used in diverse applications over the years. OCR enables the digitization of paper documents that might otherwise be neglected over time, and the digitized copies also serve as a backup of those documents. The proposed system targets isolated characters of the Yoruba language. Yoruba is a tonal language in which the vowels carry accent (tone) marks. The processing pipeline involves grayscale conversion, binarization, de-skewing, and segmentation, which together allow the OCR system to read the images and convert them to text data. The proposed model was evaluated using the information retrieval metrics precision and recall. The results showed good performance, with a recall of 100% on the sample document used and precision values of 76%, 97%, and 100% on the sample document.
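The abstract reports precision and recall but does not state how recognized characters were matched against the reference text. The sketch below is one plausible way to compute these metrics for isolated Yoruba characters, under stated assumptions: it uses a bag-of-characters comparison (order is ignored), and it treats each base letter together with its combining marks (tone accents, dot below) as a single unit, so a dropped tone mark counts as a recognition error. The function names `graphemes` and `precision_recall` are hypothetical and not taken from the paper.

```python
import unicodedata
from collections import Counter


def graphemes(text: str) -> list[str]:
    """Split text into units of one base character plus its combining marks.

    Simplified splitter (an assumption, not full Unicode grapheme clustering):
    NFD decomposes precomposed letters such as U+1ECD and puts combining marks
    in canonical order, so different spellings of the same letter match.
    """
    units: list[str] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # attach tone mark / dot below to its base letter
        else:
            units.append(ch)
    # Recompose each unit for readability and drop whitespace units.
    return [unicodedata.normalize("NFC", u) for u in units if not u.isspace()]


def precision_recall(recognized: str, reference: str) -> tuple[float, float]:
    """Bag-of-characters precision and recall (assumed matching scheme)."""
    rec = Counter(graphemes(recognized))
    ref = Counter(graphemes(reference))
    # True positives: characters present in both, counted with multiplicity.
    tp = sum((rec & ref).values())
    precision = tp / sum(rec.values()) if rec else 0.0
    recall = tp / sum(ref.values()) if ref else 0.0
    return precision, recall
```

For example, if the reference is "bàbá" and the system outputs "bàbáa", every reference character is found (recall 1.0) but one of the five output characters is spurious (precision 0.8). Output "baba" for the same reference gives 0.5 for both, because the unaccented "a" does not match "à" or "á".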

Keywords

Recognition, Binarization, Accuracy, Image digitization