Conference papers

SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk

Guillaume Papa 1 Stéphan Clémençon 1 Aurélien Bellet 2
2 MAGNET - Machine Learning in Information Networks
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : In many learning problems, ranging from clustering to ranking through metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations. In this paper, we focus on how to best implement a stochastic approximation approach to solve such risk minimization problems. We argue that in the large-scale setting, gradient estimates should be obtained by sampling tuples of data points with replacement (incomplete U-statistics) instead of sampling data points without replacement (complete U-statistics based on subsamples). We develop a theoretical framework accounting for the substantial impact of this strategy on the generalization ability of the prediction model returned by the Stochastic Gradient Descent (SGD) algorithm. It reveals that the method we promote achieves a much better trade-off between statistical accuracy and computational cost. Beyond the rate bound analysis, experiments on AUC maximization and metric learning provide strong empirical evidence of the superiority of the proposed approach.
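The two sampling schemes contrasted in the abstract can be sketched on a toy pairwise statistic. This is only an illustration under stated assumptions, not the authors' implementation: the kernel `kernel(x, y) = (x - y)^2 / 2` (whose U-statistic is the sample variance), the function names, and the scalar data are all hypothetical, and the paper applies the same idea to gradient estimates inside SGD rather than to a plain average.

```python
import itertools
import random


def kernel(x, y):
    # Hypothetical pairwise kernel chosen for illustration only:
    # averaging it over all pairs gives the unbiased sample variance.
    return 0.5 * (x - y) ** 2


def complete_u_on_subsample(data, m, rng):
    """Draw m data points without replacement, then average the
    kernel over all m*(m-1)/2 pairs inside that subsample."""
    sub = rng.sample(data, m)
    pairs = list(itertools.combinations(sub, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u(data, num_pairs, rng):
    """Average the kernel over num_pairs pairs, each drawn independently
    (so with replacement across draws) from the set of all pairs of
    distinct points in the full data set."""
    n = len(data)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += kernel(data[i], data[j])
    return total / num_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    m = 10
    budget = m * (m - 1) // 2  # same number of kernel evaluations for both
    print("complete U on subsample:", complete_u_on_subsample(data, m, rng))
    print("incomplete U:", incomplete_u(data, budget, rng))
```

Both estimators evaluate the kernel the same number of times here (45 pairs). The difference is that the first only ever sees 10 distinct data points, while the second can draw its pairs from the whole data set, which is the trade-off between statistical accuracy and computational cost the abstract refers to.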


Contributor : Aurélien Bellet
Submitted on : Thursday, November 5, 2015 - 1:38:31 PM
Last modification on : Wednesday, March 23, 2022 - 3:51:21 PM
Long-term archiving on : Friday, April 28, 2017 - 5:02:13 AM


Files produced by the author(s)


  • HAL Id : hal-01214667, version 1


Guillaume Papa, Stéphan Clémençon, Aurélien Bellet. SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk. Annual Conference on Neural Information Processing Systems (NIPS), Dec 2015, Montréal, Canada. ⟨hal-01214667⟩


