Sentence Boundary Detection for Handwritten Text Recognition
Résumé
In the larger context of handwritten text recognition systems many natural language processing techniques can potentially be applied to the output of such systems. However, these techniques often assume that the input is segmented into meaningful units, such as sentences. This paper investigates the use of hidden-event language models and a maximum entropy based method for sentence boundary detection. While hidden-event language models are simple to train, the maximum entropy framework allows for an easy integration of various knowledge sources. The segmentation performance of these two approaches are evaluated on the IAM Database for handwritten English text and results on true words as well as recognized words are provided. Finally, a combination of the two techniques is shown to achieve superior performance over both individual methods.