Conference papers

Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm

Thomas Meilender 1 Abdel Belaïd 2
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 READ - READ
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper describes a method for segmenting a continuous document flow. A document flow is a list of successive scanned pages, fed into a production chain, that represents several documents with no explicit separation marks between them. To separate the documents for recognition, the content of successive pages must be analyzed and the boundary pages of each document identified. The proposed method is similar to the variable horizon models (VHM), or multi-grams, used in speech recognition. It consists in maximizing the likelihood of the flow given the Markov models of all its constituent elements. Because computing this likelihood over the whole flow is NP-complete, the flow is instead studied in windows containing a reduced number of observations. First results on homogeneous flows of invoices reach more than 75% precision and 90% recall.
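The idea of choosing document boundaries by maximizing a likelihood over a bounded horizon can be illustrated with a minimal Python sketch. This is not the authors' modified Backward-Forward algorithm: it is a plain dynamic program over candidate cut points, and everything in it is an assumption made for illustration, including the page labels (`H` for a header page, `B` for a body page, `T` for a totals page), the toy probabilities in `START`, `TRANS` and `END`, the function names, and the `max_len` bound standing in for the reduced observation window.

```python
import math

# Hypothetical toy Markov model of a single document over page labels.
# All values are illustrative assumptions, not taken from the paper.
START = {"H": 0.9, "B": 0.05, "T": 0.05}   # probability a document starts on this page type
END = {"H": 0.05, "B": 0.05, "T": 0.9}     # probability a document ends on this page type
TRANS = {                                  # page-to-page transitions inside one document
    "H": {"H": 0.1, "B": 0.6, "T": 0.3},
    "B": {"H": 0.1, "B": 0.5, "T": 0.4},
    "T": {"H": 0.1, "B": 0.1, "T": 0.8},
}


def doc_logprob(pages):
    """Log-score of `pages` forming exactly one document under the toy model."""
    score = math.log(START[pages[0]]) + math.log(END[pages[-1]])
    for prev, cur in zip(pages, pages[1:]):
        score += math.log(TRANS[prev][cur])
    return score


def segment_flow(pages, score_fn, max_len):
    """Split `pages` into consecutive documents with maximal total log-score.

    best[i] is the best score for the first i pages. Each candidate document
    spans at most `max_len` pages, which bounds the search horizon.
    """
    n = len(pages)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + score_fn(pages[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover (start, end) spans, end exclusive.
    spans = []
    end = n
    while end > 0:
        spans.append((back[end], end))
        end = back[end]
    spans.reverse()
    return spans


flow = list("HBTHTHBBT")
print(segment_flow(flow, doc_logprob, max_len=4))
```

On this toy flow the sketch returns `[(0, 3), (3, 5), (5, 9)]`, that is, three documents `HBT`, `HT` and `HBBT`. Bounding the segment length keeps the cost at roughly the number of pages times `max_len` score evaluations, which is one simple way to restrict the search to a local window.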
Document type :
Conference papers

https://hal.inria.fr/inria-00347217
Contributor : Abdel Belaid
Submitted on : Monday, December 15, 2008 - 10:52:30 AM
Last modification on : Friday, February 26, 2021 - 3:28:07 PM
Long-term archiving on : Tuesday, June 8, 2010 - 5:10:18 PM

File

meilender-spie.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00347217, version 1

Citation

Thomas Meilender, Abdel Belaïd. Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm. SPIE - Electronic Imaging, 2009, Los Angeles, United States. ⟨inria-00347217⟩

Metrics

Record views

104

Files downloads

239