Conference papers

A Stream-Based Semi-Supervised Active Learning Approach for Document Classification

Mohamed-Rafik Bouguelia 1, Yolande Belaïd 1, Abdel Belaïd 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : We consider an industrial context in which a stream of unlabelled documents becomes available progressively over time. Building on an adaptive incremental neural gas algorithm (AING), we propose a new stream-based semi-supervised active learning method (A2ING) for document classification. The method actively queries a human annotator for the class labels of the documents that are most informative for learning, according to an uncertainty measure. It maintains its model as a dynamically evolving graph topology of labelled document representatives, which we call neurons. Experiments on several real datasets show that the proposed method requires, on average, only 36.3% of the incoming documents to be labelled, and that the resulting model achieves an average gain of 2.15-3.22% in precision compared to traditional supervised learning with fully labelled training documents.
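
The query loop described in the abstract can be illustrated with a minimal sketch. This is not the A2ING algorithm: it omits the neural gas graph, edge updates, and neuron adaptation, and uses a plain nearest-representative classifier instead. The uncertainty measure (ratio of the distances to the two closest classes), the threshold value, and the names process_stream and ask_label are all assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_per_class(neurons, x):
    """Map each known label to the distance of its closest neuron to x."""
    best = {}
    for vec, label in neurons:
        d = distance(vec, x)
        if label not in best or d < best[label]:
            best[label] = d
    return best


def process_stream(stream, ask_label, threshold=0.8):
    """Classify documents one at a time, querying a label only when uncertain.

    stream    -- iterable of feature vectors (one per incoming document)
    ask_label -- callable standing in for the human annotator (hypothetical)
    threshold -- assumed uncertainty cut-off in [0, 1], not from the paper
    """
    neurons = []       # labelled representatives stored as (vector, label)
    predictions = []
    queries = 0
    for x in stream:
        ranked = sorted(nearest_per_class(neurons, x).items(),
                        key=lambda kv: kv[1])
        if len(ranked) < 2:
            # Fewer than two classes seen so far: always ask the annotator.
            uncertain = True
        else:
            d1, d2 = ranked[0][1], ranked[1][1]
            # A ratio near 1 means the two closest classes are almost
            # equally far away, so the prediction is uncertain.
            ratio = d1 / d2 if d2 > 0 else 1.0
            uncertain = ratio >= threshold
        if uncertain:
            label = ask_label(x)
            queries += 1
            neurons.append((list(x), label))  # keep as new representative
        else:
            label = ranked[0][0]              # trust the nearest class
        predictions.append(label)
    return predictions, queries, neurons
```

In this sketch only the documents that fall near a class boundary, or that arrive before two classes are known, cost an annotation; all others are labelled by the model itself. That is the mechanism by which a stream-based active learner can reduce the labelling effort.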

Cited literature : 11 references
Contributor : Yolande Belaid
Submitted on : Thursday, August 29, 2013 - 9:59:22 AM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on : Monday, December 2, 2013 - 8:53:41 AM


Publisher files allowed on an open archive




Mohamed-Rafik Bouguelia, Yolande Belaïd, Abdel Belaïd. A Stream-Based Semi-Supervised Active Learning Approach for Document Classification. 12th International Conference on Document Analysis and Recognition - ICDAR 2013, Aug 2013, Washington, United States. pp.611-615, ⟨10.1109/ICDAR.2013.126⟩. ⟨hal-00855184⟩


