A generic method for structure recognition of handwritten mail documents

Aurélie Lemaitre 1, Jean Camillerapp 1, Bertrand Coüasnon 1
1 IMADOC - Interprétation et Reconnaissance d’Images et de Documents
UR1 - Université de Rennes 1, INSA Rennes - Institut National des Sciences Appliquées - Rennes, CNRS - Centre National de la Recherche Scientifique : UMR6074
Abstract: This paper presents a system that extracts the logical structure of handwritten mail documents. The problem consists of two joint tasks: segmenting documents into blocks and labeling those blocks. The main label classes considered are addressee details, sender details, date, subject, text body, and signature. The work must cope with the difficulties of unconstrained handwritten documents, namely variable structure and variable writing. We propose a method based on a geometric analysis of the arrangement of elements in the document. We describe the document using a two-dimensional grammatical formalism, which makes it easy to introduce knowledge about mail into a generic parser. Our grammatical parser is LL(k), which means that several combinations are tried before the correct one is selected. The main benefit of this approach is that it can handle weakly structured documents. Moreover, because the segmentation into blocks often depends on the associated classes, our method can retry a different segmentation until labeling succeeds. We validated this method in the context of the French national project RIMES, which organized a contest on a large document base. We obtain a recognition rate of 91.7% on 1150 images.
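The retry behaviour mentioned in the abstract (trying another segmentation until labeling succeeds) can be pictured with a minimal sketch. This is not the authors' grammatical parser: the bounding-box format, the toy positional rules, and the names `label_blocks` and `recognize` are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical block format: (x, y, width, height), page-relative in [0, 1],
# with the origin at the top-left corner of the page.
Block = Tuple[float, float, float, float]


def label_blocks(blocks: List[Block]) -> Optional[Dict[str, Block]]:
    """Label blocks using toy positional rules (illustrative assumptions).

    Returns a label -> block mapping, or None when the rules cannot be met.
    """
    ordered = sorted(blocks, key=lambda b: b[1])  # top to bottom
    if len(ordered) < 3:
        return None
    top, *middle, bottom = ordered
    # Toy rule: sender details start in the left half of the page.
    if top[0] >= 0.5:
        return None
    # Toy rule: the signature starts in the lower half of the page.
    if bottom[1] < 0.5:
        return None
    # Toy rule: the text body is the tallest of the remaining blocks.
    body = max(middle, key=lambda b: b[3])
    return {"sender details": top, "text body": body, "signature": bottom}


def recognize(candidates: Iterable[List[Block]]) -> Optional[Dict[str, Block]]:
    """Try candidate segmentations in order; keep the first that can be labeled."""
    for blocks in candidates:
        labels = label_blocks(blocks)
        if labels is not None:
            return labels
    return None


# A coarse segmentation (two blocks) fails the rules; a finer one succeeds.
coarse = [(0.1, 0.05, 0.8, 0.3), (0.1, 0.4, 0.8, 0.5)]
fine = [(0.1, 0.05, 0.3, 0.1), (0.1, 0.3, 0.8, 0.4), (0.6, 0.8, 0.3, 0.1)]
result = recognize([coarse, fine])
```

In this sketch the labeling step rejects the coarse segmentation, so `recognize` falls back to the finer one. The abstract describes a comparable interplay between segmentation and labeling, but realized inside a two-dimensional grammar rather than as an explicit loop.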
Document type:
Conference paper
Document Recognition and Retrieval DRR XV, Jan 2008, San Jose, United States. 2008

Cited literature: 10 references

https://hal.inria.fr/inria-00308565
Contributor: Aurélie Lemaitre
Submitted on: Thursday, 2 December 2010 - 17:38:58
Last modified on: Thursday, 11 January 2018 - 06:20:08
Document(s) archived on: Thursday, 3 March 2011 - 02:25:21

File

DRR_Lemaitre_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00308565, version 1

Citation

Aurélie Lemaitre, Jean Camillerapp, Bertrand Coüasnon. A generic method for structure recognition of handwritten mail documents. Document Recognition and Retrieval DRR XV, Jan 2008, San Jose, United States. 2008. 〈inria-00308565〉
