Conference papers

Multipage Administrative Document Stream Segmentation

Abstract : We propose a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either continuity or rupture. A rupture is synonymous with a clear break and thus probably marks the boundary of a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e. a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.
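
The segmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the probability thresholds, the `rupture_prob` callback (standing in for the incremental classifier), and the rule that a segment touching an uncertain boundary is labelled a fragment are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Labels taken from the abstract; "uncertain" is the case that triggers over-segmentation.
CONTINUITY, RUPTURE, UNCERTAIN = "continuity", "rupture", "uncertain"


def classify_relationship(p_rupture: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a rupture probability to a label. The thresholds are hypothetical."""
    if p_rupture >= high:
        return RUPTURE
    if p_rupture <= low:
        return CONTINUITY
    return UNCERTAIN


def segment_stream(
    pages: Sequence[str],
    rupture_prob: Callable[[str, str], float],
) -> List[Tuple[str, List[str]]]:
    """Split a page stream into ("document" | "fragment", pages) segments.

    Assumption: a segment is a "document" only if both of its boundaries are
    clear ruptures (or the stream start/end); otherwise it is a "fragment".
    """
    if not pages:
        return []
    segments: List[Tuple[str, List[str]]] = []
    current = [pages[0]]
    is_fragment = False  # True once an uncertain boundary touches the current segment
    for prev, nxt in zip(pages, pages[1:]):
        label = classify_relationship(rupture_prob(prev, nxt))
        if label == CONTINUITY:
            current.append(nxt)
            continue
        # Rupture or uncertain: cut here (an uncertain cut is an over-segmentation).
        if label == UNCERTAIN:
            is_fragment = True
        segments.append(("fragment" if is_fragment else "document", current))
        current = [nxt]
        is_fragment = label == UNCERTAIN
    segments.append(("fragment" if is_fragment else "document", current))
    return segments


# Toy stream with made-up rupture probabilities between consecutive pages.
toy_probs = {("a1", "a2"): 0.1, ("a2", "b1"): 0.9, ("b1", "b2"): 0.5, ("b2", "c1"): 0.95}
toy_segments = segment_stream(["a1", "a2", "b1", "b2", "c1"], lambda p, q: toy_probs[(p, q)])
```

In this toy run, the pages `b1` and `b2` are separated by an uncertain boundary, so each becomes a fragment that a downstream verification step would have to confirm or merge; the sketch does not model that verification or the correction feedback.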
Document type :
Conference papers

https://hal.inria.fr/hal-01254785
Contributor : Abdel Belaid
Submitted on : Tuesday, January 12, 2016 - 4:38:56 PM
Last modification on : Friday, January 15, 2021 - 5:42:02 PM

Citation

Hani Daher, Mohamed-Rafik Bouguelia, Abdel Belaïd, Vincent Poulain d'Andecy. Multipage Administrative Document Stream Segmentation. ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp. 966-971. ⟨10.1109/ICPR.2014.176⟩. ⟨hal-01254785⟩
