
inria-00112708, version 1

Segmentation-Based And Segmentation-Free Methods for Spotting Handwritten Arabic Words

Gregory R. Ball (1), Sargur N. Srihari (1), Harish Srinivasan (1)

Tenth International Workshop on Frontiers in Handwriting Recognition (2006)

Abstract: Given a set of handwritten documents, a common goal is to search for a relevant subset. Finding a query word or image in such a set of documents is called word spotting. Spotting handwritten words in documents written in the Latin alphabet, and more recently in Arabic, has received considerable attention. One issue is generating candidate word regions on a page. Attempting to definitively segment the document into such regions (automatic segmentation) can meet with some success, but the performance of the segmentation algorithm is often a limiting factor in spotting performance. Another approach is to scan the page image directly, without attempting to produce such a definite segmentation. A new word spotting algorithm is presented, together with a comparison of recent algorithms that operate on previously unsegmented handwritten Arabic text. The algorithms considered are a previously presented automatic word segmentation method and a “segmentation-free” algorithm that performs spotting directly on lines of unsegmented text. The segmentation-free approach performs spotting and segmentation concurrently using a sliding window. The spotting method used to judge the performance of the algorithms is character-based, but the results are independent of the actual spotting method used. On average, the segmentation-free method performs 5-10% better than the automatic segmentation method and has a lower per-query cost on unprocessed images. However, it has a higher per-query cost on preprocessed documents.
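The abstract describes the segmentation-free approach only at a high level: a window slides along a line of unsegmented text, so candidate word regions and their match scores are produced in the same pass. The sketch below illustrates that general sliding-window idea under stated assumptions. It assumes each text line and the query word have already been reduced to a 1-D column profile (for example, ink pixels per column). The function names (`resample`, `window_score`, `spot_in_line`), the window scales, and the mean-absolute-difference matcher are all hypothetical stand-ins, not the authors' character-based spotting method.

```python
from typing import List, Sequence, Tuple


def resample(seq: Sequence[float], n: int) -> List[float]:
    """Linearly resample a 1-D profile to exactly n points."""
    if n == 1:
        return [float(seq[0])]
    m = len(seq)
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def window_score(window: Sequence[float], query: Sequence[float]) -> float:
    """Hypothetical matcher: mean absolute difference after resampling the
    window to the query length (lower is better)."""
    w = resample(window, len(query))
    return sum(abs(a - b) for a, b in zip(w, query)) / len(query)


def spot_in_line(
    line_profile: Sequence[float],
    query_profile: Sequence[float],
    scales: Tuple[float, ...] = (0.8, 1.0, 1.2),  # assumed window widths relative to the query
    step: int = 1,
    top_k: int = 3,
) -> List[Tuple[float, int, int]]:
    """Slide windows of several widths across an unsegmented line.

    Every window position is both a candidate word region and a scored match,
    so segmentation and spotting happen together. Returns the top_k
    (score, start, end) candidates, best first.
    """
    candidates = []
    for s in scales:
        width = max(2, round(len(query_profile) * s))
        for start in range(0, len(line_profile) - width + 1, step):
            window = line_profile[start:start + width]
            candidates.append((window_score(window, query_profile), start, start + width))
    candidates.sort()
    return candidates[:top_k]


if __name__ == "__main__":
    line = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]  # toy column profile of one text line
    query = [1, 3, 1]                       # toy column profile of the query word
    print(spot_in_line(line, query)[0])
```

Because `window_score` is a separate function, any other word matcher could be substituted without changing the sliding-window loop, which fits the abstract's remark that the comparison does not depend on the particular spotting method. The toy call above should rank the window spanning columns 3 to 6 first with a score of 0.0.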

  • 1: Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo
  • Domain: Computer Science/Document and Text Processing
    Computer Science/Computer Vision and Pattern Recognition
  • Keywords: Segmentation – word spotting – Arabic – scanned document search – word recognition
  • Comment: http://www.suvisoft.com
  • oai:hal.inria.fr:inria-00112708
  • Submitted on: Thursday, 9 November 2006 15:40:53
  • Updated on: Thursday, 9 November 2006 16:51:52