Dhivehi OCR: Character Recognition of Thaana Script using Machine-Generated Text and Tesseract OCR Engine

Ahmed Ibrahim


This paper presents the technical aspects and context of recognising Dhivehi characters using the Tesseract OCR Engine, a freely available OCR engine with remarkable accuracy and support for multiple languages. The experiments conducted showed promising results, with 69.46% accuracy, and, more importantly, highlighted limitations that are unique to Dhivehi. These issues are discussed in detail, and possible directions for future research are presented.

