Multipage Administrative Document Stream Segmentation

Abstract: This paper proposes a framework for the segmentation and classification of document streams. The framework is composed of two modules, segmentation and verification, both of which use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either a continuity or a rupture. A rupture denotes a clear break between pages, and thus probably the boundary of a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e. a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of the approach.
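The segmentation decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical classifier that outputs, for each pair of consecutive pages, a probability of continuity, and it assumes two illustrative thresholds (`low`, `high`) to separate rupture, uncertainty, and continuity. The names `Segment` and `segment_stream` are likewise invented for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    pages: List[int]  # indices of the pages in the stream
    kind: str         # "document" or "fragment"


def segment_stream(p_continuity: Sequence[float],
                   low: float = 0.3,
                   high: float = 0.7) -> List[Segment]:
    """Split a page stream into documents and fragments.

    p_continuity[i] is the (assumed) probability that page i+1 continues
    page i. Thresholds are illustrative assumptions:
      p >= high      -> continuity: keep the pages together
      p <= low       -> rupture: confident cut
      low < p < high -> uncertain: cut anyway (over-segmentation) and
                        mark the pieces on both sides as fragments
    """
    segments: List[Segment] = []
    current = [0]
    started_uncertain = False  # did the current piece start at an uncertain cut?
    for i, p in enumerate(p_continuity):
        if p >= high:
            current.append(i + 1)
            continue
        uncertain = p > low
        kind = "fragment" if (started_uncertain or uncertain) else "document"
        segments.append(Segment(current, kind))
        current = [i + 1]
        started_uncertain = uncertain
    segments.append(Segment(current, "fragment" if started_uncertain else "document"))
    return segments


result = segment_stream([0.9, 0.1, 0.5, 0.95, 0.05])
for seg in result:
    print(seg.kind, seg.pages)
```

In this sketch, both documents and fragments would then be passed on to a verification step, which is where the abstract places class prediction and correction; that step is not modeled here.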
Document type:
Conference paper
ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp.966 - 971 〈10.1109/ICPR.2014.176〉

https://hal.inria.fr/hal-01254785
Contributor: Abdel Belaid
Submitted on: Tuesday, January 12, 2016 - 16:38:56
Last modified on: Tuesday, April 24, 2018 - 13:30:31

Citation

Hani Daher, Mohamed-Rafik Bouguelia, Abdel Belaïd, Vincent Poulain d'Andecy. Multipage Administrative Document Stream Segmentation. ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp.966 - 971 〈10.1109/ICPR.2014.176〉. 〈hal-01254785〉
