M. Achab, A. Guilloux, S. Gaïffas, and E. Bacry, SGD with Variance Reduction beyond Empirical Risk Minimization, 2015.

Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, 2016.

Z. Allen-Zhu, Y. Yuan, and K. Sridharan, Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters, Advances in Neural Information Processing Systems (NIPS), 2016.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization Methods for Large-Scale Machine Learning, 2016.

J. Bruna and S. Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/abs/1203.1513

A. Coates, H. Lee, and A. Y. Ng, An Analysis of Single-Layer Networks in Unsupervised Feature Learning, International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. Defazio, J. Domke, and T. S. Caetano, Finito: A faster, permutable incremental gradient method for big data problems, International Conference on Machine Learning (ICML), 2014.

J. C. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, vol.10, pp.2899-2934, 2009.

J. C. Duchi, M. I. Jordan, and M. J. Wainwright, Privacy Aware Learning, Advances in Neural Information Processing Systems (NIPS), 2012.
DOI : 10.1145/2666468

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms I: Fundamentals, Springer Science & Business Media, 1993.
DOI : 10.1007/978-3-662-02796-7

T. Hofmann, A. Lucchi, S. Lacoste-Julien, and B. McWilliams, Variance Reduced Stochastic Gradient Descent with Neighbors, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248672

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

S. Lacoste-Julien, M. Schmidt, and F. Bach, A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, arXiv:1212.2002, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768187

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, 2015.

H. Lin, J. Mairal, and Z. Harchaoui, A Universal Catalyst for First-Order Optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728

G. Loosli, S. Canu, and L. Bottou, Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng et al., Learning word vectors for sentiment analysis, The 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp.142-150, 2011.

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.
DOI : 10.1111/j.1467-9868.2010.00740.x

R. Neal and G. E. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, Learning in Graphical Models, pp.355-368, 1998.
DOI : 10.1007/978-94-011-5014-9_12

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, Introductory Lectures on Convex Optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00860051

S. Shalev-Shwartz, SDCA without Duality, Regularization, and Individual Convexity, International Conference on Machine Learning (ICML), 2016.

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, issue.Feb, pp.567-599, 2013.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

M. J. van de Vijver et al., A Gene-Expression Signature as a Predictor of Survival in Breast Cancer, New England Journal of Medicine, vol.347, issue.25, pp.1999-2009, 2002.
DOI : 10.1056/NEJMoa021967

L. van der Maaten, M. Chen, S. Tyree, and K. Q. Weinberger, Learning with marginalized corrupted features, International Conference on Machine Learning (ICML), 2013.

S. Wager, W. Fithian, S. Wang, and P. Liang, Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, vol.11, pp.2543-2596, 2010.

L. Xiao and T. Zhang, A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.
DOI : 10.1137/140961791
URL : http://arxiv.org/abs/1403.4699