Conference papers

Multipage Administrative Document Stream Segmentation

Abstract : We propose in this paper a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either continuity or rupture. A rupture is synonymous with a clear break, and thus probably with a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the resulting segment is treated as a fragment, i.e., a portion of a document. Both fragments and documents are sent to the verification module, which, in addition to the incremental classifier, includes a correction module. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module in order to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.
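The continuity/rupture/uncertain decision and the correction feedback loop can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each pair of consecutive pages is already described by a numeric feature vector. It substitutes a plain online logistic regression (`OnlineLogistic`, a hypothetical name) for the paper's incremental classifier. It uses two hypothetical confidence thresholds, `low` and `high`, to separate rupture, uncertain, and continuity. The rule that any segment touching an uncertain boundary counts as a fragment is an interpretation of the abstract. The context-based search of the correction module is not modelled: `apply_corrections` simply assumes corrected boundary labels are available and feeds them back to the classifier.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class OnlineLogistic:
    """Hypothetical incremental classifier estimating P(continuity | pair features).

    A stand-in for the paper's incremental classifier: it is updated one
    labelled example at a time with a single SGD step on the log-loss.
    """
    n_features: int
    lr: float = 0.5
    w: List[float] = field(default_factory=list)
    b: float = 0.0

    def __post_init__(self) -> None:
        if not self.w:
            self.w = [0.0] * self.n_features

    def prob_continuity(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, x: Sequence[float], is_continuity: bool) -> None:
        # Gradient step on the log-loss for one example, so the model adapts along the stream.
        err = (1.0 if is_continuity else 0.0) - self.prob_continuity(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err


@dataclass
class Segment:
    pages: List[int]  # page indices, in stream order
    kind: str         # "document" or "fragment"


def segment_stream(pair_features: Sequence[Sequence[float]],
                   prob_continuity: Callable[[Sequence[float]], float],
                   low: float = 0.3, high: float = 0.7) -> List[Segment]:
    """Split a stream of len(pair_features) + 1 pages (at least one page).

    pair_features[i] describes the relationship between page i and page i + 1.
    p >= high -> continuity (no cut); p <= low -> rupture (clear cut);
    otherwise the boundary is uncertain: cut anyway (over-segmentation) and
    mark both neighbouring segments as fragments.
    """
    n_pages = len(pair_features) + 1
    segments: List[Segment] = []
    start, uncertain_left = 0, False
    for i, x in enumerate(pair_features):
        p = prob_continuity(x)
        if p >= high:
            continue  # continuity: page i + 1 stays in the current segment
        uncertain = p > low  # low < p < high
        kind = "fragment" if (uncertain or uncertain_left) else "document"
        segments.append(Segment(list(range(start, i + 1)), kind))
        start, uncertain_left = i + 1, uncertain
    last_kind = "fragment" if uncertain_left else "document"
    segments.append(Segment(list(range(start, n_pages)), last_kind))
    return segments


def apply_corrections(clf: OnlineLogistic,
                      pair_features: Sequence[Sequence[float]],
                      corrected_labels: Sequence[bool]) -> None:
    """Feed corrected boundary labels (True = continuity) back to the classifier."""
    for x, is_continuity in zip(pair_features, corrected_labels):
        clf.learn(x, is_continuity)
```

In use, `segment_stream(features, clf.prob_continuity)` would segment the incoming pages, and whatever corrections the verification stage produces would be passed to `apply_corrections` so that later boundaries are classified with the updated model. Widening the gap between `low` and `high` yields more fragments (more over-segmentation) in exchange for fewer confident but wrong cuts.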
Document type :
Conference papers
Contributor : Abdel Belaid
Submitted on : Tuesday, January 12, 2016 - 4:38:56 PM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM




Hani Daher, Mohamed-Rafik Bouguelia, Abdel Belaïd, Vincent Poulain d'Andecy. Multipage Administrative Document Stream Segmentation. ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp. 966-971 ⟨10.1109/ICPR.2014.176⟩. ⟨hal-01254785⟩


