M. Achab, A. Guilloux, S. Gaïffas, and E. Bacry, SGD with Variance Reduction beyond Empirical Risk Minimization, 2015.

Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, 2016.

Z. Allen-Zhu, Y. Yuan, and K. Sridharan, Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters, Advances in Neural Information Processing Systems (NIPS), 2016.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization Methods for Large-Scale Machine Learning, 2016.

J. Bruna and S. Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/abs/1203.1513

A. Coates, H. Lee, and A. Y. Ng, An Analysis of Single-Layer Networks in Unsupervised Feature Learning, International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. Defazio, J. Domke, and T. S. Caetano, Finito: A faster, permutable incremental gradient method for big data problems, International Conference on Machine Learning (ICML), 2014.

J. C. Duchi, M. I. Jordan, and M. J. Wainwright, Privacy Aware Learning, Advances in Neural Information Processing Systems (NIPS), 2012.
DOI : 10.1145/2666468

J. C. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/abs/1512.03385

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I: Fundamentals, Springer Science & Business Media, 1993.
DOI : 10.1007/978-3-662-02796-7

T. Hofmann, A. Lucchi, S. Lacoste-Julien, and B. McWilliams, Variance Reduced Stochastic Gradient Descent with Neighbors, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248672

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

S. Lacoste-Julien, M. Schmidt, and F. Bach, A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, arXiv:1212.2002, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768187

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, 2015.

H. Lin, J. Mairal, and Z. Harchaoui, A Universal Catalyst for First-Order Optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728

G. Loosli, S. Canu, and L. Bottou, Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng et al., Learning word vectors for sentiment analysis, The 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp.142-150, 2011.

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.
DOI : 10.1111/j.1467-9868.2010.00740.x

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, Introductory Lectures on Convex Optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, and C. Schmid, Transformation Pursuit for Image Classification, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.466
URL : https://hal.archives-ouvertes.fr/hal-00979464

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1-2, pp.83-112, 2017.
DOI : 10.1007/s10107-016-1030-6
URL : https://hal.archives-ouvertes.fr/hal-00860051

S. Shalev-Shwartz, SDCA without Duality, Regularization, and Individual Convexity, International Conference on Machine Learning (ICML), 2016.

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research (JMLR), vol.14, pp.567-599, 2013.

P. Y. Simard, Y. A. LeCun, J. S. Denker, and B. Victorri, Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation, Neural Networks: Tricks of the Trade, number 1524 in Lecture Notes in Computer Science, pp.239-274, 1998.
DOI : 10.1007/3-540-49430-8_13
URL : https://hal.archives-ouvertes.fr/halshs-00009505

M. J. van de Vijver et al., A Gene-Expression Signature as a Predictor of Survival in Breast Cancer, New England Journal of Medicine, vol.347, issue.25, pp.1999-2009, 2002.
DOI : 10.1056/NEJMoa021967

L. van der Maaten, M. Chen, S. Tyree, and K. Q. Weinberger, Learning with marginalized corrupted features, International Conference on Machine Learning (ICML), 2013.

S. Wager, W. Fithian, S. Wang, and P. Liang, Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010.

L. Xiao and T. Zhang, A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.
DOI : 10.1137/140961791
URL : http://arxiv.org/abs/1403.4699

S. Zheng, Y. Song, T. Leung, and I. Goodfellow, Improving the Robustness of Deep Neural Networks via Stability Training, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.485