Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

Stéphan Clémençon 1 Igor Colin 1 Aurélien Bellet 2
2 MAGNET - Machine Learning in Information Networks
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: In a wide range of statistical learning problems, such as ranking, clustering or metric learning, the risk is accurately estimated by U-statistics of degree d ≥ 1, i.e. functionals of the training data with low variance that take the form of averages over d-tuples. From a computational perspective, computing such statistics is highly expensive even for a moderate sample size n, as it requires averaging O(n^d) terms. This makes learning procedures that rely on optimizing such data functionals hardly feasible in practice. The main goal of this paper is to show that, strikingly, such empirical risks can be replaced by drastically cheaper Monte-Carlo estimates based on only O(n) terms, usually referred to as incomplete U-statistics, without degrading the O(1/√n) learning rate of Empirical Risk Minimization (ERM) procedures. To this end, we establish uniform deviation results describing the error made when approximating a U-process by its incomplete version under appropriate complexity assumptions. Extensions to model selection, fast-rate situations and various sampling techniques are also considered, as well as an application to stochastic gradient descent for ERM. Finally, numerical examples provide strong empirical evidence that the approach we promote largely outperforms more naive subsampling techniques.
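To make the cost contrast in the abstract concrete, here is a minimal sketch for the degree d = 2 case. The complete U-statistic averages a symmetric kernel over all n(n-1)/2 pairs. The incomplete version averages the same kernel over B pairs drawn uniformly at random with replacement, with B chosen of order n. The kernel (absolute difference, which gives Gini's mean difference), the function names and the choice B = n are illustrative assumptions and are not taken from the paper. Sampling with replacement is only one of the sampling schemes the abstract alludes to.

```python
import itertools
import random


def complete_u_statistic(data, kernel):
    """Degree-2 complete U-statistic: average of a symmetric kernel over
    all n(n-1)/2 unordered pairs, i.e. O(n^2) kernel evaluations."""
    total = 0.0
    count = 0
    for i, j in itertools.combinations(range(len(data)), 2):
        total += kernel(data[i], data[j])
        count += 1
    return total / count


def incomplete_u_statistic(data, kernel, num_pairs, rng):
    """Incomplete U-statistic (illustrative): average of the kernel over
    num_pairs pairs drawn uniformly, with replacement, from all pairs.
    Its cost is num_pairs kernel evaluations instead of O(n^2)."""
    n = len(data)
    total = 0.0
    for _ in range(num_pairs):
        # Two distinct indices; uniform over unordered pairs for a symmetric kernel.
        i, j = rng.sample(range(n), 2)
        total += kernel(data[i], data[j])
    return total / num_pairs


def abs_diff(x, y):
    """Hypothetical example kernel: h(x, y) = |x - y| (Gini mean difference)."""
    return abs(x - y)


# Usage: compare both estimators on standard normal data, where E|X - Y| = 2/sqrt(pi).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
full = complete_u_statistic(data, abs_diff)  # 499,500 kernel evaluations
approx = incomplete_u_statistic(data, abs_diff, num_pairs=len(data), rng=rng)  # 1,000 evaluations
```

With B = n, the incomplete estimate uses about n/2 times fewer kernel evaluations than the complete one. It adds extra Monte-Carlo variance of order 1/B, which is the quantity the paper's deviation bounds control uniformly over a class of kernels.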
Document type:
Journal article
Journal of Machine Learning Research (JMLR), 2016, 17 (76), pp.1-36.

Cited literature: 47 references
Contributor: Aurélien Bellet
Submitted on: Monday, 6 June 2016, 23:25:37
Last modified on: Tuesday, 3 July 2018, 11:21:35


Publisher files authorized for deposit in an open archive


  • HAL Id: hal-01327662, version 1
  • arXiv: 1501.02629


Stéphan Clémençon, Igor Colin, Aurélien Bellet. Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics. Journal of Machine Learning Research (JMLR), 2016, 17 (76), pp.1-36. hal-01327662


