A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research

Abstract : This paper presents a new comprehensive database of isolated offline handwritten Farsi/Arabic numerals and characters for use in optical character recognition research. The database is freely available for academic use. To date, no such freely available database exists for the Farsi language. Grayscale images of 52,380 characters and 17,740 numerals are included. Each image was scanned at 300 dpi from Iranian school entrance exam forms during the years 2004-2006. The only restriction imposed on the writers was to write each character within a rectangular box. The number of samples in each class of the database is non-uniform, reflecting the real-life distribution of the classes. For comparison purposes, each dataset has also been divided into training and test sets.
Document type :
Conference papers
Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), Suvisoft, 2006


https://hal.inria.fr/inria-00112676
Contributor : Anne Jaigu
Submitted on : Thursday, November 9, 2006 - 3:13:20 PM
Last modification on : Thursday, November 9, 2006 - 4:50:07 PM

Identifiers

  • HAL Id : inria-00112676, version 1

Citation

Saeed Mozaffari, Karim Faez, Farhad Faradji, Majid Ziaratban, S. Mohamad Golzan. A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research. Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), Suvisoft, 2006. <inria-00112676>
