A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research

Saeed Mozaffari, Karim Faez, Farhad Faradji, Majid Ziaratban, S. Mohamad Golzan

Tenth International Workshop on Frontiers in Handwriting Recognition (2006)

Abstract: This paper presents a new comprehensive database of isolated offline handwritten Farsi/Arabic numerals and characters for use in optical character recognition research. The database is freely available for academic use. To date, no such freely available database has existed for the Farsi language. It includes grayscale images of 52,380 characters and 17,740 numerals. Each image was scanned at 300 dpi from Iranian school entrance exam forms from the years 2004-2006. The only restriction imposed on the writers was to write each character within a rectangular box. The number of samples per class is non-uniform, reflecting the real-life distribution of the classes. For comparison purposes, each dataset has been divided into training and test sets.

