A Stream-Based Semi-Supervised Active Learning Approach for Document Classification

Mohamed-Rafik Bouguelia 1 Yolande Belaïd 1 Abdel Belaïd 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We consider an industrial context in which a stream of unlabelled documents becomes available progressively over time. Based on an adaptive incremental neural gas algorithm (AING), we propose a new stream-based semi-supervised active learning method (A2ING) for document classification. The method actively queries a human annotator for the class labels of the documents that are most informative for learning, according to an uncertainty measure. It maintains a model as a dynamically evolving graph topology of labelled document representatives, which we call neurons. Experiments on different real datasets show that the proposed method requires on average only 36.3% of the incoming documents to be labelled in order to learn a model that achieves an average gain of 2.15-3.22% in precision over traditional supervised learning with fully labelled training documents.
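The query loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' A2ING algorithm: it is a simplified nearest-prototype learner that keeps one representative ("neuron") per class, and it assumes a distance-ratio uncertainty measure, which the abstract does not specify. All names (`StreamActiveLearner`, `query_threshold`, `run_demo`, the toy labels) are hypothetical and chosen only for illustration.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamActiveLearner:
    """Toy stream-based active learner (an assumption-laden sketch, not A2ING)."""

    def __init__(self, query_threshold=0.5, step=0.1):
        self.query_threshold = query_threshold  # query the annotator at or above this
        self.step = step                        # how far a neuron moves toward a document
        self.neurons = []                       # list of [vector, label] pairs
        self.queries = 0                        # number of labels requested so far

    def uncertainty(self, x):
        """Return (uncertainty in [0, 1], predicted label)."""
        if not self.neurons:
            return 1.0, None
        nearest = min(self.neurons, key=lambda n: distance(x, n[0]))
        d1 = distance(x, nearest[0])
        rivals = [distance(x, n[0]) for n in self.neurons if n[1] != nearest[1]]
        if not rivals:
            return 1.0, nearest[1]  # only one class known: treat as uncertain
        d2 = min(rivals)
        # Assumed measure: a ratio near 1 means x is about as close to another class.
        return (d1 / d2 if d2 > 0 else 1.0), nearest[1]

    def process(self, x, oracle):
        """Handle one incoming document; ask the oracle only when uncertain."""
        u, predicted = self.uncertainty(x)
        if u >= self.query_threshold:
            label = oracle(x)       # human annotator supplies the class label
            self.queries += 1
        else:
            label = predicted       # confident: learn from the model's own label
        self._learn(x, label)
        return label

    def _learn(self, x, label):
        """Create a neuron for an unseen class, else move the nearest same-class one."""
        same = [n for n in self.neurons if n[1] == label]
        if not same:
            self.neurons.append([list(x), label])
            return
        nearest = min(same, key=lambda n: distance(x, n[0]))
        nearest[0] = [w + self.step * (xi - w) for w, xi in zip(nearest[0], x)]


def run_demo(n_docs=200, seed=0):
    """Stream two well-separated synthetic classes; return (queries, accuracy)."""
    rng = random.Random(seed)
    centers = {"invoice": (0.0, 0.0), "letter": (10.0, 10.0)}
    learner = StreamActiveLearner()
    correct = 0
    for i in range(n_docs):
        truth = "invoice" if i % 2 == 0 else "letter"
        cx, cy = centers[truth]
        x = (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
        # The oracle stands in for the human annotator.
        if learner.process(x, oracle=lambda _doc, t=truth: t) == truth:
            correct += 1
    return learner.queries, correct / n_docs
```

Calling `run_demo()` returns the number of labels requested and the fraction of documents that received the correct label. On this easy synthetic stream the learner queries only until it has seen both classes; real document streams would need richer features, several neurons per class, and a tuned threshold.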
Document type:
Conference paper
12th International Conference on Document Analysis and Recognition - ICDAR 2013, Aug 2013, Washington, United States. IEEE, pp.611-615, 2013, 〈10.1109/ICDAR.2013.126〉

Cited literature [11 references]

https://hal.inria.fr/hal-00855184
Contributor: Yolande Belaid
Submitted on: Thursday, 29 August 2013 - 09:59:22
Last modified on: Thursday, 11 January 2018 - 06:25:25
Document(s) archived on: Monday, 2 December 2013 - 08:53:41

File

ICDAR_2013_A2ING_version_edite...
Publisher files allowed on an open archive

Citation

Mohamed-Rafik Bouguelia, Yolande Belaïd, Abdel Belaïd. A Stream-Based Semi-Supervised Active Learning Approach for Document Classification. 12th International Conference on Document Analysis and Recognition - ICDAR 2013, Aug 2013, Washington, United States. IEEE, pp.611-615, 2013, 〈10.1109/ICDAR.2013.126〉. 〈hal-00855184〉

Metrics

Record views

360

File downloads

869