Sentence Boundary Detection for Handwritten Text Recognition

Abstract: In the larger context of handwritten text recognition systems, many natural language processing techniques can potentially be applied to the output of such systems. However, these techniques often assume that the input is segmented into meaningful units, such as sentences. This paper investigates the use of hidden-event language models and a maximum-entropy-based method for sentence boundary detection. While hidden-event language models are simple to train, the maximum entropy framework allows for easy integration of various knowledge sources. The segmentation performance of these two approaches is evaluated on the IAM Database of handwritten English text, and results are provided for both true words and recognized words. Finally, a combination of the two techniques is shown to outperform both individual methods.
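A hidden-event language model treats a sentence boundary as an unobserved token that may occur between any two words: an n-gram model is trained on text in which boundaries appear as explicit tokens, and segmentation searches for the placement of boundary tokens that makes the word sequence most probable. The abstract does not state the model order, smoothing, or decoder used in the paper, so the following is only a minimal sketch under assumptions of our own: a trigram model with add-one smoothing and an exact Viterbi search over boundary placements. The token `<s>`, the class `HiddenEventLM`, the function `segment`, and the toy corpus are hypothetical, and the maximum entropy model and the combination of the two are not shown.

```python
import math
from collections import Counter

# Hypothetical token standing for the hidden sentence-boundary event.
BOUNDARY = "<s>"


class HiddenEventLM:
    """Add-one smoothed n-gram model over words plus the boundary token."""

    def __init__(self, sentences, order=3):
        if order < 2:
            raise ValueError("order must be at least 2")
        self.order = order
        self.ngrams = Counter()   # counts of (history, token)
        self.history = Counter()  # counts of each history
        # Training text as one stream, as the decoder sees it:
        # <s> w w w <s> w w <s> ...
        tokens = [BOUNDARY]
        for sentence in sentences:
            tokens += list(sentence) + [BOUNDARY]
        for i in range(1, len(tokens)):
            hist = tuple(tokens[max(0, i - order + 1):i])
            self.ngrams[(hist, tokens[i])] += 1
            self.history[hist] += 1
        # +1 reserves probability mass for words never seen in training.
        self.vocab_size = len(set(tokens)) + 1

    def logprob(self, history, token):
        """log P(token | last order-1 tokens of history), add-one smoothed."""
        hist = tuple(history[-(self.order - 1):])
        count = self.ngrams[(hist, token)] + 1
        return math.log(count / (self.history[hist] + self.vocab_size))


def segment(words, lm):
    """Return indices i such that a boundary is hypothesized after words[i].

    Exact Viterbi search: the text is assumed to start and end with a
    boundary, and a boundary event may be inserted in every gap between words.
    """
    keep = lm.order - 1
    # Paths sharing their last `keep` tokens score all future tokens
    # identically, so only the best path per context needs to survive.
    paths = {(BOUNDARY,): (0.0, [])}
    for i, word in enumerate(words):
        extended = {}
        for hist, (score, cuts) in paths.items():
            base = score + lm.logprob(hist, word)
            ctx = (hist + (word,))[-keep:]
            options = [(ctx, base, cuts)]
            if i < len(words) - 1:  # the final boundary is added below
                with_cut = (ctx + (BOUNDARY,))[-keep:]
                options.append((with_cut, base + lm.logprob(ctx, BOUNDARY), cuts + [i]))
            for key, new_score, new_cuts in options:
                if key not in extended or new_score > extended[key][0]:
                    extended[key] = (new_score, new_cuts)
        paths = extended
    # Close the text with a boundary event and keep the best-scoring path.
    _ctx, (_score, best_cuts) = max(
        paths.items(), key=lambda item: item[1][0] + lm.logprob(item[0], BOUNDARY)
    )
    return best_cuts


# Toy usage on a hypothetical corpus (not data from the paper).
corpus = [["hello", "world"], ["good", "night"]] * 10
lm = HiddenEventLM(corpus)
print(segment(["hello", "world", "good", "night"], lm))  # [1]
```

Keying the search on the last `order - 1` tokens keeps it exact while bounding the number of live paths per word; with a bigram model (`order=2`) every gap could be decided independently, which is why a higher order is used here to make the search non-trivial.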
Document type:
Conference paper
Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), Suvisoft, 2006

https://hal.inria.fr/inria-00103835
Contributor: Anne Jaigu
Submitted on: Thursday, 5 October 2006, 12:43:48
Last modified on: Thursday, 5 October 2006, 14:23:14
Document(s) archived on: Tuesday, 6 April 2010, 18:25:17

Identifiers

  • HAL Id: inria-00103835, version 1

Citation

Matthias Zimmermann. Sentence Boundary Detection for Handwritten Text Recognition. Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), Suvisoft, 2006. 〈inria-00103835〉
