Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm

Thomas Meilender 1 Abdel Belaïd 2
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 READ - READ
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This paper describes a method for segmenting a continuous document flow. A document flow is a list of successive scanned pages, put in a production chain, representing several documents without any explicit separation mark between them. To separate the documents for recognition, it is necessary to analyze the content of successive pages and to identify the boundary pages of each document. The method proposed here is similar to the variable horizon models (VHM) or multi-grams used in speech recognition. It consists in maximizing the likelihood of the flow given the Markov models of all its constituent elements. Because computing this likelihood over the whole flow is NP-complete, the solution is to evaluate it within windows containing a reduced number of observations. The first results, obtained on homogeneous flows of invoices, reach more than 75% precision and 90% recall.
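
The idea of choosing document boundaries by maximizing the likelihood of the flow can be illustrated with a small sketch. This is not the authors' modified Backward-Forward algorithm: it is a generic multigram-style dynamic program, and the page classes (H, B, T), the probabilities in START and TRANS, and the names segment_flow, document_log_likelihood and max_len are all hypothetical, chosen only for illustration. The bound max_len loosely stands in for the reduced horizon mentioned in the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy Markov model over page classes (not taken from the paper):
# H = header page, B = body page, T = page carrying the invoice total.
START = {"H": 0.8, "B": 0.1, "T": 0.1}
TRANS = {
    "H": {"H": 0.05, "B": 0.55, "T": 0.30, "END": 0.10},
    "B": {"H": 0.05, "B": 0.45, "T": 0.40, "END": 0.10},
    "T": {"H": 0.05, "B": 0.05, "T": 0.10, "END": 0.80},
}


def document_log_likelihood(pages: Sequence[str]) -> float:
    """Log-probability that `pages` forms exactly one complete document."""
    logp = math.log(START[pages[0]])
    for prev, cur in zip(pages, pages[1:]):
        logp += math.log(TRANS[prev][cur])
    return logp + math.log(TRANS[pages[-1]]["END"])


def segment_flow(
    pages: Sequence[str],
    score: Callable[[Sequence[str]], float],
    max_len: int,
) -> List[Tuple[int, int]]:
    """Split `pages` into half-open (start, end) ranges of at most `max_len`
    pages each, maximizing the sum of per-document log-likelihoods."""
    n = len(pages)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for pages[:i]
    back = [0] * (n + 1)          # back[i]: start of the last document ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(pages[start:end])
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Follow the back-pointers from the end of the flow to recover boundaries.
    segments = []
    end = n
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    segments.reverse()
    return segments


flow = list("HBTHTHBBT")  # nine pages with no explicit separators
segments = segment_flow(flow, document_log_likelihood, max_len=4)
print(segments)  # [(0, 3), (3, 5), (5, 9)]
```

With this toy model, the flow is cut after each T page that is followed by an H page, because ending a document on T and starting a new one on H is far more probable than a T to H transition inside a single document.
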
Document type:
Conference paper
SPIE - Electronic Imaging, 2009, Los Angeles, United States. 2009

https://hal.inria.fr/inria-00347217
Contributor: Abdel Belaid
Submitted on: Monday, December 15, 2008 - 10:52:30
Last modified on: Thursday, January 11, 2018 - 06:19:59
Document(s) archived on: Tuesday, June 8, 2010 - 17:10:18

File

meilender-spie.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00347217, version 1

Citation

Thomas Meilender, Abdel Belaïd. Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm. SPIE - Electronic Imaging, 2009, Los Angeles, United States. 2009. 〈inria-00347217〉
