A. Agarwal, M. J. Wainwright, P. L. Bartlett, and P. K. Ravikumar, Information-theoretic lower bounds on the oracle complexity of convex optimization, IEEE Transactions on Information Theory, vol.58, issue.5, pp.3235-3249, 2012.

Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, Proceedings of the Symposium on Theory of Computing (STOC), 2017.

Y. Arjevani and O. Shamir, Dimension-free iteration complexity of finite sum optimization problems, Advances in Neural Information Processing Systems (NIPS), 2016.

M. Baes, Estimate sequence methods: extensions and approximations. ETH technical report, 2009.

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.

A. Bietti and J. Mairal, Stochastic optimization with variance reduction for infinite datasets with finite-sum structure, Advances in Neural Information Processing Systems (NIPS), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01375816

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

D. Csiba, Z. Qu, and P. Richtárik, Stochastic dual coordinate ascent with adaptive probabilities, International Conference on Machine Learning (ICML), 2015.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. Defazio, T. Caetano, and J. Domke, Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014.

S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework, SIAM Journal on Optimization, vol.22, issue.4, pp.1469-1492, 2012.

S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization II: Shrinking procedures and optimal algorithms, SIAM Journal on Optimization, vol.23, issue.4, pp.2061-2089, 2013.

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms. II, Springer, 1996.

T. Hofmann, A. Lucchi, S. Lacoste-Julien, and B. McWilliams, Variance reduced stochastic gradient descent with neighbors, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248672

C. Hu, W. Pan, and J. T. Kwok, Accelerated gradient methods for stochastic optimization and online learning, Advances in Neural Information Processing Systems (NIPS), 2009.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, vol.133, issue.1, pp.365-397, 2012.

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, Mathematical Programming, vol.171, issue.1-2, pp.167-215, 2018.

H. Lin, J. Mairal, and Z. Harchaoui, Catalyst acceleration for first-order convex optimization: from theory to practice, Journal of Machine Learning Research (JMLR), vol.18, issue.212, pp.1-54, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01664934

J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM Journal on Optimization, vol.25, issue.2, pp.829-855, 2015.

J. Mairal, End-to-end kernel learning with supervised convolutional kernel networks, Advances in Neural Information Processing Systems (NIPS), 2016.

J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, vol.8, pp.85-283, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01081139

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.

J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, C. R. Acad. Sci. Paris Sér. A Math., vol.255, pp.2897-2899, 1962.
URL : https://hal.archives-ouvertes.fr/hal-01867195

J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, vol.93, issue.2, pp.273-299, 1965.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, issue.2, pp.372-376, 1983.

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, 2004.

Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, vol.140, issue.1, pp.125-161, 2013.

Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.108, issue.1, pp.177-205, 2006.

A. Nitanda, Stochastic proximal gradient descent with acceleration techniques, Advances in Neural Information Processing Systems (NIPS), 2014.

M. Schmidt, R. Babanezhad, M. Ahmed, A. Defazio, A. Clifton et al., Non-uniform stochastic average gradient method for training conditional random fields, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1-2, pp.83-112, 2017.
URL : https://hal.archives-ouvertes.fr/hal-00860051

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, Mathematical Programming, vol.155, issue.1, pp.105-145, 2016.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

V. Vapnik, The nature of statistical learning theory, 2000.

M. J. Wainwright, M. I. Jordan, and J. C. Duchi, Privacy aware learning, Advances in Neural Information Processing Systems (NIPS), 2012.

L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.