Conference paper, Year: 2013

A Stream-Based Semi-Supervised Active Learning Approach for Document Classification

Abstract

We consider an industrial context in which a stream of unlabelled documents becomes available progressively over time. Based on an adaptive incremental neural gas algorithm (AING), we propose a new stream-based semi-supervised active learning method (A2ING) for document classification. The method actively queries a human annotator for the class labels of the documents that are most informative for learning, according to an uncertainty measure. It maintains a model as a dynamically evolving graph topology of labelled document representatives, which we call neurons. Experiments on different real datasets show that the proposed method requires, on average, only 36.3% of the incoming documents to be labelled in order to learn a model that achieves an average gain of 2.15% to 3.22% in precision compared to traditional supervised learning with fully labelled training documents.
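The abstract does not spell out AING's update rules or the exact uncertainty measure, so the following Python sketch only illustrates the general idea of stream-based active learning with labelled prototypes. It is not the A2ING algorithm. The class name StreamActiveLearner, the parameters query_threshold and learning_rate, the distance-ratio uncertainty, and the rule "add a neuron on every query, otherwise move the winning neuron" are all assumptions made for this illustration.

```python
import math


class StreamActiveLearner:
    """Toy stream-based active learner over labelled prototypes ("neurons").

    Hypothetical simplification: not the authors' A2ING method.
    """

    def __init__(self, query_threshold=0.5, learning_rate=0.1):
        self.query_threshold = query_threshold  # assumed uncertainty cut-off
        self.learning_rate = learning_rate      # assumed prototype step size
        self.neurons = []                       # list of (vector, label) pairs
        self.queries = 0                        # number of labels requested

    def _nearest_per_class(self, x):
        """Return (distance, neuron_index, label) per class, closest first."""
        best = {}
        for index, (vector, label) in enumerate(self.neurons):
            d = math.dist(x, vector)
            if label not in best or d < best[label][0]:
                best[label] = (d, index, label)
        return sorted(best.values())

    def uncertainty(self, x):
        """Ratio of the distances to the two closest classes, in [0, 1].

        Values near 1 mean x is about equally far from two classes.
        With fewer than two known classes the model is treated as
        maximally uncertain.
        """
        ranked = self._nearest_per_class(x)
        if len(ranked) < 2:
            return 1.0
        d1, d2 = ranked[0][0], ranked[1][0]
        return d1 / d2 if d2 > 0 else 1.0

    def process(self, x, oracle):
        """Handle one incoming document vector; return (label, was_queried)."""
        ranked = self._nearest_per_class(x)
        if not ranked or self.uncertainty(x) >= self.query_threshold:
            # Uncertain: ask the annotator and store x as a new neuron.
            label = oracle(x)
            self.queries += 1
            self.neurons.append((list(x), label))
            return label, True
        # Confident: trust the predicted label and move the winning
        # neuron slightly toward x (the semi-supervised step).
        _, index, label = ranked[0]
        vector, _ = self.neurons[index]
        moved = [w + self.learning_rate * (xi - w) for w, xi in zip(vector, x)]
        self.neurons[index] = (moved, label)
        return label, False
```

A caller would feed document feature vectors one at a time to process, passing a function that stands in for the human annotator. Raising query_threshold reduces the number of label requests at the cost of relying more often on the model's own predictions.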
Main file
ICDAR_2013_A2ING_version_editeur.pdf (730.78 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-00855184, version 1 (29-08-2013)

Identifiers

HAL Id: hal-00855184
DOI: 10.1109/ICDAR.2013.126

Cite

Mohamed-Rafik Bouguelia, Yolande Belaïd, Abdel Belaïd. A Stream-Based Semi-Supervised Active Learning Approach for Document Classification. 12th International Conference on Document Analysis and Recognition - ICDAR 2013, Aug 2013, Washington, United States. pp.611-615, ⟨10.1109/ICDAR.2013.126⟩. ⟨hal-00855184⟩