Conference papers

Classification active de flux de documents avec identification des nouvelles classes (Active classification of document streams with identification of novel classes)

Mohamed-Rafik Bouguelia 1 Yolande Belaïd 1 Abdel Belaïd 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this paper, we propose a stream-based, semi-supervised active learning method for document classification. The method queries an operator for the class labels of only those documents that are informative according to an uncertainty measure. It maintains a dynamically evolving graph topology of labelled document representatives, which constitutes the covered feature space. It also automatically detects the emergence of novel classes in the stream: an incoming document is identified as a member of a novel class or of an existing class depending on whether it lies outside or inside the area covered by the known classes. Experiments on several real datasets show that the proposed method needs only a small fraction of the incoming documents to be labelled in order to learn a model whose accuracy is equal to or better than that of the usual supervised methods trained on fully labelled documents.
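The processing loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It keeps a flat list of labelled representatives instead of a graph topology. It also substitutes a simple distance ratio for the uncertainty measure, which the abstract does not specify. The class name `StreamActiveLearner`, the parameters `radius` and `margin`, and the `oracle` callback (standing in for the operator) are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


class StreamActiveLearner:
    """Toy stream-based active learner with a distance-based novelty check.

    Assumptions (not taken from the paper): `radius` defines the area
    covered by each representative, and `margin` is the distance-ratio
    threshold above which a document counts as uncertain.
    """

    def __init__(self, radius: float, margin: float) -> None:
        self.radius = radius
        self.margin = margin
        self.reps: List[Tuple[Vector, str]] = []  # labelled representatives
        self.queries = 0  # number of labels requested from the operator

    def _nearest(self, x: Vector, exclude: Optional[str] = None):
        """Return (distance, label) of the closest representative, or None."""
        best = None
        for vec, label in self.reps:
            if label == exclude:
                continue
            d = math.dist(x, vec)
            if best is None or d < best[0]:
                best = (d, label)
        return best

    def _ask(self, x: Vector, oracle: Callable[[Vector], str]) -> str:
        """Query the operator and store x as a new labelled representative."""
        label = oracle(x)
        self.queries += 1
        self.reps.append((x, label))
        return label

    def process(self, x: Vector, oracle: Callable[[Vector], str]) -> str:
        first = self._nearest(x)
        # Outside the covered area: candidate member of a novel class.
        if first is None or first[0] > self.radius:
            return self._ask(x, oracle)
        # Inside the covered area: query only if two classes are about
        # equally close (ratio test written without a division).
        second = self._nearest(x, exclude=first[1])
        if second is not None and first[0] >= self.margin * second[0]:
            return self._ask(x, oracle)
        return first[1]  # confident prediction, no label requested


# Usage on a tiny synthetic stream with two well-separated classes.
learner = StreamActiveLearner(radius=1.0, margin=0.8)
stream = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9), (0.1, 0.3)]
predictions = [
    learner.process(x, oracle=lambda v: "A" if v[0] < 5 else "B")
    for x in stream
]
```

In this toy run only the first document of each class falls outside the covered area, so the operator is asked for two labels out of five documents. The remaining three are classified from the nearest representative.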

https://hal.inria.fr/hal-00980698
Contributor : Yolande Belaid
Submitted on : Friday, April 18, 2014 - 3:54:20 PM
Last modification on : Friday, January 15, 2021 - 5:42:02 PM
Long-term archiving on : Monday, April 10, 2017 - 3:45:04 PM

File

CIFED_version_editeur.pdf
Explicit agreement for this submission

Identifiers

  • HAL Id : hal-00980698, version 1

Citation

Mohamed-Rafik Bouguelia, Yolande Belaïd, Abdel Belaïd. Classification active de flux de documents avec identification des nouvelles classes. CIFED - Colloque International Francophone sur l'Écrit et le Document, Mar 2014, Nancy, France. pp.75-89. ⟨hal-00980698⟩

Metrics

Record views : 912
File downloads : 364