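Classification et apprentissage actif à partir d'un flux de données évolutif en présence d'étiquetage incertain [Classification and active learning from an evolving data stream in the presence of uncertain labelling]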

Mohamed-Rafik Bouguelia 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This thesis focuses on machine learning for data classification. To reduce the labelling cost, active learning queries a human labeller for the class labels of only a few important instances. We propose a new uncertainty measure that characterizes the importance of an instance and improves the performance of active learning compared with existing uncertainty measures. This measure determines the smallest instance weight that must be associated with a new instance for the classifier to change its prediction on that instance. We then consider a setting where data arrives continuously from a stream of unbounded length. We propose an adaptive uncertainty threshold that is suited to active learning in the streaming setting and achieves a trade-off between the number of classification errors and the number of labels required. Existing stream-based active learning methods are initialized with labelled instances that cover all possible classes. However, in many applications, the evolving nature of the stream means that new classes can appear at any time. We propose an effective method for the active detection of novel classes in a multi-class data stream. This method incrementally maintains the region of the feature space covered by the known classes, and flags instances that are similar to one another and lie outside that region as belonging to novel classes. Finally, it is often difficult to obtain completely reliable labels, because the human labeller is subject to labelling errors that reduce the performance of the learned classifier. We address this problem by introducing a measure that reflects the degree of disagreement between the manually given class and the predicted class, and a new informativeness measure that expresses the need for a mislabelled instance to be relabelled by an alternative labeller.
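The adaptive-threshold idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the method developed in the thesis: the nearest-centroid classifier, the distance-ratio uncertainty score, the multiplicative threshold update, the toy two-class data, and every name used here (`NearestCentroid`, `run_stream`, `make_stream`, `step`) are assumptions chosen only to show how a label-query threshold might adapt while a stream is processed.

```python
import math
import random


class NearestCentroid:
    """Tiny incremental classifier: one running mean (centroid) per class."""

    def __init__(self):
        self.sums = {}    # class label -> per-feature sum of its instances
        self.counts = {}  # class label -> number of instances seen

    def update(self, x, y):
        """Add one labelled instance to the running mean of class y."""
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_with_uncertainty(self, x):
        """Return (predicted label, uncertainty in [0, 1])."""
        dists = sorted(
            (math.dist(x, [s / self.counts[y] for s in total]), y)
            for y, total in self.sums.items()
        )
        if len(dists) < 2:
            return dists[0][1], 1.0  # a single known class: treat as uncertain
        (d1, y1), (d2, _) = dists[0], dists[1]
        # Ratio of nearest to second-nearest distance: 1.0 means a tie.
        return y1, (d1 / d2 if d2 > 0 else 1.0)


def run_stream(stream, seed, threshold=0.5, step=0.05):
    """Process (x, y_true) pairs; y_true is only read when a query is made."""
    clf = NearestCentroid()
    for x, y in seed:
        clf.update(x, y)
    queries = 0
    for x, y_true in stream:
        y_pred, uncertainty = clf.predict_with_uncertainty(x)
        if uncertainty < threshold:
            continue  # confident enough: keep the prediction, spend no label
        queries += 1  # ask the (simulated) labeller
        clf.update(x, y_true)
        if y_pred == y_true:
            # Assumed heuristic: the query was unnecessary, so query less often.
            threshold = min(1.0, threshold * (1 + step))
        else:
            # Assumed heuristic: the query caught an error, so query more often.
            threshold = threshold * (1 - step)
    return clf, threshold, queries


def make_stream(n, rng):
    """Hypothetical toy data: two Gaussian blobs labelled 'a' and 'b'."""
    centres = {"a": (0.0, 0.0), "b": (6.0, 6.0)}
    stream = []
    for _ in range(n):
        y = rng.choice(["a", "b"])
        cx, cy = centres[y]
        stream.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), y))
    return stream


if __name__ == "__main__":
    rng = random.Random(0)
    seed = [((0.0, 0.0), "a"), ((6.0, 6.0), "b")]
    clf, threshold, queries = run_stream(make_stream(500, rng), seed)
    print(f"labels queried: {queries} of 500, final threshold: {threshold:.3f}")
```

In this sketch the threshold rises after a query that merely confirmed the prediction and falls after a query that exposed an error, which is one simple way to trade label requests against classification errors. Novel-class detection and handling of mislabelled instances are deliberately left out.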
Document type : Theses

Cited literature [123 references]

https://hal.inria.fr/tel-01262775
Contributor : Mohamed-Rafik Bouguelia
Submitted on : Friday, January 29, 2016 - 7:40:37 PM
Last modification on : Friday, January 15, 2021 - 5:42:02 PM
Long-term archiving on : Friday, November 11, 2016 - 5:37:26 PM

Identifiers

  • HAL Id : tel-01262775, version 1

Citation

Mohamed-Rafik Bouguelia. Classification et apprentissage actif à partir d'un flux de données évolutif en présence d'étiquetage incertain. Intelligence artificielle [cs.AI]. Université de Lorraine, 2015. French. ⟨tel-01262775⟩

Metrics

Record views : 491
File downloads : 4556