Contribution to the Automatic Recognition of Business Documents

Djamel Gaceb (1), Frank Lebourgeois (1), Véronique Eglin (1), Hubert Emptoz (1)
(1) imagine - Extraction de Caractéristiques et Identification
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: The automatic processing of paper documents and mail is a major challenge for all companies. Current recognition systems use modular architectures in which each stage of the process is independent. To improve performance, cooperation between the modules must be reintroduced, for example by coupling segmentation with recognition, or the location of zones of interest with segmentation. In this context, we propose a mixed approach to text localization and image segmentation that respects real-time constraints. In the first part, we present the state of the art in text location and thresholding for images of postal addresses. In the second part, we describe our method, which simultaneously localizes and segments text zones. Text blocks are located by a multiresolution approach applied to cumulated gradients computed directly from grey-level images. Coupling the two processes (text zone location and thresholding) reduces computing time, because only the necessary parts of the image are processed, and yields a better character segmentation for OCR (Optical Character Recognition). We present the results obtained by implementing our approach on an industrial line that processes several tons of documents from large companies every day.
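The coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tile size, the energy cutoff and the mid-range threshold are all assumptions chosen for illustration, and a single coarse grid stands in for the multiresolution analysis. The sketch accumulates gradient magnitudes per tile of a grey-level image, keeps the tiles with high cumulated gradient as candidate text zones, and binarizes only those tiles.

```python
def cumulated_gradients(image, block):
    """Sum absolute horizontal grey-level differences inside each block x block tile."""
    rows, cols = len(image), len(image[0])
    grid_h = (rows + block - 1) // block
    grid_w = (cols + block - 1) // block
    grid = [[0] * grid_w for _ in range(grid_h)]
    for y in range(rows):
        for x in range(cols - 1):
            # The difference is credited to the tile that contains pixel (y, x).
            grid[y // block][x // block] += abs(image[y][x + 1] - image[y][x])
    return grid


def locate_text_tiles(grid, min_energy):
    """Return (tile_row, tile_col) for tiles whose cumulated gradient reaches min_energy."""
    return [(ty, tx)
            for ty, row in enumerate(grid)
            for tx, energy in enumerate(row)
            if energy >= min_energy]


def binarize_tiles(image, tiles, block):
    """Threshold only the selected tiles; every other pixel stays 0 (background)."""
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    for ty, tx in tiles:
        ys = range(ty * block, min((ty + 1) * block, rows))
        xs = range(tx * block, min((tx + 1) * block, cols))
        values = [image[y][x] for y in ys for x in xs]
        # Assumption: a mid-range local threshold stands in for the paper's method.
        threshold = (min(values) + max(values)) / 2
        for y in ys:
            for x in xs:
                mask[y][x] = 1 if image[y][x] < threshold else 0
    return mask


# Hypothetical 4 x 8 grey-level image: dark strokes on the left, blank paper on the right.
sample = [
    [200, 200, 200, 200, 200, 200, 200, 200],
    [200,  30, 200,  30, 200, 200, 200, 200],
    [200,  30, 200,  30, 200, 200, 200, 200],
    [200, 200, 200, 200, 200, 200, 200, 200],
]
grid = cumulated_gradients(sample, block=4)
tiles = locate_text_tiles(grid, min_energy=500)
mask = binarize_tiles(sample, tiles, block=4)
```

In this toy case only the left tile is selected, so the right half of the image is never thresholded. That is the source of the computing-time saving the abstract attributes to coupling location with thresholding.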
Document type:
Conference paper
Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), France. Suvisoft, 2006

Cited literature: 28 references

https://hal.inria.fr/inria-00104169
Contributor: Anne Jaigu
Submitted on: Friday, October 6, 2006 - 08:47:39
Last modified on: Thursday, November 1, 2018 - 01:19:54
Document(s) archived on: Tuesday, April 6, 2010 - 18:37:28

Identifiers

  • HAL Id: inria-00104169, version 1

Citation

Djamel Gaceb, Frank Lebourgeois, Véronique Eglin, Hubert Emptoz. Contribution to the Automatic Recognition of Business Documents. Guy Lorette. Tenth International Workshop on Frontiers in Handwriting Recognition, Oct 2006, La Baule (France), France. Suvisoft, 2006. 〈inria-00104169〉
