
inria-00104358, version 1

Word-wise Hand-written Script Separation for Indian Postal automation

K. Roy 1, U. Pal 2

Tenth International Workshop on Frontiers in Handwriting Recognition (2006)

Abstract: In a multi-lingual, multi-script country like India, a postal document may contain words in two or more scripts. To recognize such a document, the different scripts must first be separated. This paper proposes an automatic scheme for word-wise identification of hand-written Roman and Oriya scripts for Indian postal automation. In the proposed scheme, document skew is corrected first. Next, a piecewise projection method segments the document into lines and then the lines into words. Finally, a Neural Network (NN) classifier performs word-wise script identification using features such as water-reservoir-concept-based features, fractal-dimension-based features, topological features, and script-characteristic-based features. In the experiments, 2500 words were considered, and the proposed identification scheme achieved an overall accuracy of 97.69%.
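
The segmentation step in the abstract can be illustrated with a minimal sketch. It uses a plain global projection profile, not the piecewise variant the paper names (whose details are not given in this record), and it assumes a skew-corrected binary image stored as a list of rows with 1 for ink and 0 for background. The function names and the `min_gap` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: rows of 0 (background) / 1 (ink)


def segment_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split the image into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in image]
    return segment_runs(row_profile)


def segment_words(image: Image, line: Tuple[int, int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one line into words using the vertical projection profile.

    Gaps narrower than min_gap columns are treated as gaps between
    characters of the same word, so the neighbouring runs are merged.
    """
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    words: List[Tuple[int, int]] = []
    for start, end in segment_runs(col_profile):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)   # merge with the previous run
        else:
            words.append((start, end))
    return words


# Tiny synthetic page: two text lines separated by one blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
words_per_line = [segment_words(page, line) for line in lines]
```

A single global profile like this breaks down when handwritten lines overlap or touch, which is presumably why the paper relies on a piecewise method; the sketch only shows the underlying idea of cutting at empty rows and columns.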

  • 1:  West Bengal University of Technology (WBUTECH)
  • West Bengal University
  • 2:  Computer Vision and Pattern Recognition Unit (CVPR)
  • Indian Statistical Institute
  • Domain : Computer Science/Document and Text Processing
    Computer Science/Computer Vision and Pattern Recognition
  • Keywords : Script separation – Indian script – Multilingual OCR – Handwritten recognition
  • Comment : http://www.suvisoft.com
 
  • oai:hal.inria.fr:inria-00104358
  • Submitted on: Friday, 6 October 2006 13:07:21
  • Updated on: Friday, 6 October 2006 13:23:40