Mapping Transcripts to Handwritten Text

Abstract : In handwriting analysis and recognition, a useful first task is to assign ground truth to the words in the writing. Such an assignment supports subsequent machine learning tasks such as automatic recognition and writer verification. Since automatic word segmentation and recognition can be error-prone, an intermediate approach is to use a text file containing a transcription of the handwriting image to perform the ground-truth assignment. This paper describes an algorithm for finding the best word-level alignment between the transcript and the handwriting image. The algorithm is useful in tasks such as: (i) extracting words and characters as characteristic elements for writer verification and identification; (ii) creating a large ground-truthed dataset for handwritten document analysis (at the word and even the character level); (iii) indexing a collection of handwritten materials, such as historical manuscripts, for document retrieval. The algorithm achieves 84.7% accuracy in aligning words on whole images when evaluated on 20 pages from a handwriting database created for forensic document examination studies.
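To make the idea of a word-level alignment concrete, the following is a minimal sketch and not the algorithm of the paper, whose details the abstract does not give. It assumes that the page image has already been over-segmented into candidate word regions of known pixel width, and it uses a generic dynamic-programming sequence alignment whose only cue is how well a transcript word's character count predicts a region's width. The function name `align_words` and the parameters `char_width` and `gap_cost` are hypothetical and chosen purely for illustration.

```python
from typing import List, Optional, Tuple


def align_words(
    transcript: List[str],
    segment_widths: List[float],
    char_width: float = 10.0,  # assumed average pixel width of one character
    gap_cost: float = 30.0,    # assumed penalty for leaving a word or segment unmatched
) -> List[Tuple[Optional[int], Optional[int]]]:
    """Align transcript words to segmented image regions by dynamic programming.

    Returns (word_index, segment_index) pairs in reading order; None marks
    a transcript word or an image segment that was left unmatched.
    """
    n, m = len(transcript), len(segment_widths)
    # cost[i][j]: cheapest alignment of the first i words with the first j segments
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i * gap_cost, "skip_word"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j * gap_cost, "skip_segment"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Mismatch between the width predicted from the word length
            # and the width actually observed in the image.
            expected = len(transcript[i - 1]) * char_width
            mismatch = abs(expected - segment_widths[j - 1])
            cost[i][j], back[i][j] = min(
                (cost[i - 1][j - 1] + mismatch, "match"),
                (cost[i - 1][j] + gap_cost, "skip_word"),
                (cost[i][j - 1] + gap_cost, "skip_segment"),
            )

    # Trace back from the full problem to recover the chosen pairs.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_word":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs
```

For example, `align_words(["the", "quick", "fox"], [31.0, 48.0, 12.0, 29.0])` leaves the narrow third region unmatched, treating it as a spurious segment such as a stray stroke. A practical system would presumably replace the width-only cost with recognition scores or richer image features.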
Document type : Conference paper
Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), Suvisoft, 2006


https://hal.inria.fr/inria-00112763
Contributor : Anne Jaigu
Submitted on : Thursday, November 9, 2006 - 4:15:37 PM
Last modification on : Thursday, November 9, 2006 - 4:52:39 PM

Identifiers

  • HAL Id : inria-00112763, version 1

Citation

Chen Huang, Sargur N. Srihari. Mapping Transcripts to Handwritten Text. Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), Suvisoft, 2006. <inria-00112763>
