Fast imbalanced binary classification: a moment-based approach

Edouard Grave 1, 2, 3 Laurent El Ghaoui 1
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract: In this paper, we consider the problem of imbalanced binary classification, in which the number of negative examples is much larger than the number of positive examples. The two mainstream methods for dealing with such problems are to assign different weights to negative and positive points, or to subsample points from the negative class. We propose a different approach: we represent the negative class by the first two moments of its probability distribution (the mean and the covariance), while still modeling the positive class by individual examples. Our formulation therefore does not depend on the number of negative examples, making it suitable for highly imbalanced problems and scalable to large datasets. We demonstrate empirically, on a protein classification task and a text classification task, that our approach achieves statistical performance similar to that of the two mainstream approaches to imbalanced classification problems, while being more computationally efficient.
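To illustrate why a moment-based representation can remove the dependence on the number of negative examples, here is a minimal sketch. It assumes a squared loss on the negative class, which is chosen purely for illustration and is not necessarily the loss used in the report. Under that assumption, the average loss over all negatives equals an expression in the mean and covariance only. The function names, the synthetic data, and the values of `w` and `b` are all hypothetical.

```python
import random

def mean_and_cov(points):
    """Return the mean vector and population covariance matrix (divide by n)."""
    n, d = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
            for j in range(d)] for i in range(d)]
    return mu, cov

def negative_loss_from_points(w, b, negatives):
    """Average squared loss (w.x + b + 1)^2 over every negative example (label -1)."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) + b + 1.0) ** 2
               for x in negatives) / len(negatives)

def negative_loss_from_moments(w, b, mu, cov):
    """Same quantity from the two moments only: w' Sigma w + (w' mu + b + 1)^2."""
    d = len(w)
    quad = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
    margin = sum(wi * mi for wi, mi in zip(w, mu)) + b + 1.0
    return quad + margin ** 2

# Synthetic negative class (hypothetical data, three features).
random.seed(0)
negatives = [[random.gauss(0, 1), random.gauss(2, 0.5), random.gauss(-1, 2)]
             for _ in range(5000)]

# The moments are computed once; afterwards the loss costs O(d^2) per evaluation,
# regardless of how many negative examples there were.
mu, cov = mean_and_cov(negatives)

w, b = [0.3, -0.7, 0.1], 0.2  # arbitrary linear classifier, for illustration
direct = negative_loss_from_points(w, b, negatives)
from_moments = negative_loss_from_moments(w, b, mu, cov)
```

The two values agree up to floating-point rounding, so an optimizer working with `negative_loss_from_moments` never needs to revisit the individual negative points. The positive class would still contribute one loss term per example, as the abstract describes.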
Document type:
Preprint, Working paper
2014

https://hal.inria.fr/hal-01087452
Contributor: Edouard Grave
Submitted on: Wednesday, November 26, 2014 - 20:01:21
Last modified on: Friday, May 25, 2018 - 12:02:06
Document(s) archived on: Friday, February 27, 2015 - 11:06:26

Files

tech_report.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01087452, version 1

Citation

Edouard Grave, Laurent El Ghaoui. Fast imbalanced binary classification: a moment-based approach. 2014. 〈hal-01087452〉
