F. Bach and E. Moulines, Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, Adv. NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Adv. NIPS, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

N. Bershad, Analysis of the normalized LMS algorithm with Gaussian inputs, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 793-806, 1986.
DOI : 10.1109/TASSP.1986.1164914

L. Bottou and Y. LeCun, On-line learning for very large data sets, Applied Stochastic Models in Business and Industry, vol. 21, no. 2, pp. 137-151, 2005.
DOI : 10.1002/asmb.538

O. Bousquet and L. Bottou, The tradeoffs of large scale learning, Adv. NIPS, 2008.

V. Fabian, On Asymptotic Normality in Stochastic Approximation, The Annals of Mathematical Statistics, vol. 39, no. 4, pp. 1327-1332, 1968.
DOI : 10.1214/aoms/1177698258

URL : http://projecteuclid.org/download/pdf_1/euclid.aoms/1177698258

T. Kanamori and H. Shimodaira, Active learning algorithm using the maximum weighted log-likelihood estimator, Journal of Statistical Planning and Inference, vol. 116, no. 1, pp. 149-162, 2003.
DOI : 10.1016/S0378-3758(02)00234-3

O. Macchi, Adaptive processing: The least mean squares approach with applications in transmission, 1995.

A. Nedic and D. Bertsekas, Convergence rate of incremental subgradient algorithms, Stochastic Optimization: Algorithms and Applications, pp. 263-304, 2000.

D. Needell, N. Srebro, and R. Ward, Stochastic gradient descent and the randomized Kaczmarz algorithm, 2013.
DOI : 10.1007/s10107-015-0864-7

URL : http://arxiv.org/abs/1310.5715

A. S. Nemirovski and D. B. Yudin, Problem complexity and method efficiency in optimization, 1983.

Y. Nesterov, Introductory Lectures on Convex Optimization: a Basic Course, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, SIAM Journal on Optimization, vol. 22, no. 2, pp. 341-362, 2012.
DOI : 10.1137/100802001

F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid, Towards good practice in large-scale learning for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248090

URL : https://hal.archives-ouvertes.fr/hal-00690014

B. T. Polyak and A. B. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol. 30, no. 4, pp. 838-855, 1992.
DOI : 10.1137/0330046

D. Ruppert, Efficient estimations from a slowly convergent Robbins-Monro process, Technical report, Cornell University, 1988.

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, 2013.
DOI : 10.1007/s10107-016-1030-6

URL : https://hal.archives-ouvertes.fr/hal-00860051

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol. 14, pp. 567-599, 2013.

P. Toulis, J. Rennie, and E. M. Airoldi, Statistical analysis of stochastic gradient methods for generalized linear models, Proc. ICML, 2014.

P. Zhao and T. Zhang, Stochastic optimization with importance sampling, 2014.