Z. Allen-Zhu, Natasha: faster non-convex stochastic optimization via strongly non-convex parameter, International Conference on Machine Learning (ICML), 2017.

Z. Allen-Zhu and E. Hazan, Variance reduction for faster non-convex optimization, International Conference on Machine Learning (ICML), 2016.

J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization: theory and examples, 2006.

Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford, Accelerated methods for non-convex optimization, 2016.

Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford, Lower bounds for finding stationary points I, preprint arXiv:1710.11606, 2017.

Y. Carmon, O. Hinder, J. C. Duchi, and A. Sidford, "Convex until proven guilty": dimension-free acceleration of gradient descent on non-convex functions, International Conference on Machine Learning (ICML), 2017.

C. Cartis, N. I. Gould, and P. L. Toint, On the Complexity of Steepest Descent, Newton's and Regularized Newton's Methods for Nonconvex Unconstrained Optimization Problems, SIAM Journal on Optimization, vol.20, issue.6, pp.2833-2852, 2010.

C. Cartis, N. I. Gould, and P. L. Toint, On the complexity of finding first-order critical points in constrained nonlinear optimization, Mathematical Programming, pp.93-106, 2014.

A. J. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.

D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, 2016.

J. C. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research (JMLR), vol.12, pp.2121-2159, 2011.

S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Mathematical Programming, pp.59-99, 2016.

S. Ghadimi, G. Lan, and H. Zhang, Generalized uniformly optimal methods for nonlinear programming, 2015.

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: the Lasso and generalizations, 2015.

C. Jin, P. Netrapalli, and M. I. Jordan, Accelerated gradient descent escapes saddle points faster than gradient descent, 2017.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

D. P. Kingma and J. Ba, Adam: a method for stochastic optimization, International Conference on Learning Representations (ICLR), 2015.

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, Mathematical Programming, Series A, pp.1-38, 2017.

L. Lei and M. I. Jordan, Less than a single pass: stochastically controlled stochastic gradient method, Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

L. Lei, C. Ju, J. Chen, and M. I. Jordan, Nonconvex finite-sum optimization via SCSG methods, Advances in Neural Information Processing Systems (NIPS), 2017.

H. Li and Z. Lin, Accelerated proximal gradient methods for nonconvex programming, Advances in Neural Information Processing Systems (NIPS), 2015.

H. Lin, J. Mairal, and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems (NIPS), 2015.

J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing, Foundations and Trends in Computer Graphics and Vision, pp.85-283, 2014.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, issue.2, pp.372-376, 1983.

Y. Nesterov, Introductory lectures on convex optimization: a basic course, 2004.

M. O'Neill and S. J. Wright, Behavior of accelerated gradient methods near critical points of nonconvex problems, 2017.

C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui, Catalyst acceleration for gradient-based non-convex optimization, 2017.

S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, Stochastic variance reduction for nonconvex optimization, International Conference on Machine Learning (ICML), 2016.

S. J. Reddi, S. Sra, B. Poczos, and A. J. Smola, Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, Advances in Neural Information Processing Systems (NIPS), 2016.

R. T. Rockafellar and R. J. Wets, Variational analysis, vol.317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer, 1998.

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, pp.83-112, 2017.

B. E. Woodworth and N. Srebro, Tight complexity bounds for optimizing composite objectives, Advances in Neural Information Processing Systems (NIPS), 2016.

L. Xiao and T. Zhang, A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.