A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization, 2006.

L. Bottou, Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998.

L. Bottou and O. Bousquet, The trade-offs of large scale learning, Adv. NIPS, 2008.

O. Cappé and E. Moulines, On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.3, pp.593-613, 2009.
DOI : 10.1111/j.1467-9868.2009.00698.x

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, J. Mach. Learn. Res, vol.10, pp.2899-2934, 2009.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res, vol.9, pp.1871-1874, 2008.

G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, vol.57, issue.12, pp.4686-4698, 2009.
DOI : 10.1109/TSP.2009.2026004
URL : https://hal.archives-ouvertes.fr/hal-00439453

S. Ghadimi and G. Lan, Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, SIAM Journal on Optimization, vol.23, issue.4, 2013.
DOI : 10.1137/120880811
URL : http://arxiv.org/abs/1309.5549

E. Hazan, A. Agarwal, and S. Kale, Logarithmic regret algorithms for online convex optimization, Machine Learning, vol.69, issue.2-3, pp.169-192, 2007.
DOI : 10.1007/s10994-007-5016-8

E. Hazan and S. Kale, Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization, Proc. COLT, 2011.

C. Hu, J. Kwok, and W. Pan, Accelerated gradient methods for stochastic optimization and online learning, Adv. NIPS, 2009.

R. Jenatton, G. Obozinski, and F. Bach, Structured sparse principal component analysis, Proc. AISTATS, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00414158

G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, vol.133, issue.1-2, pp.365-397, 2012.
DOI : 10.1007/s10107-010-0434-y

K. Lange, D. R. Hunter, and I. Yang, Optimization Transfer Using Surrogate Objective Functions, Journal of Computational and Graphical Statistics, vol.9, issue.1, pp.1-20, 2000.
DOI : 10.1080/10618600.2000.10474858

J. Langford, L. Li, and T. Zhang, Sparse online learning via truncated gradient, J. Mach. Learn. Res, vol.10, pp.777-801, 2009.

N. Le Roux, M. Schmidt, and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, Adv. NIPS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00674995

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res, vol.11, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, Network flow algorithms for structured sparsity, Adv. NIPS, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00512556

R. M. Neal and G. E. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in Graphical Models, 1998.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, Gradient methods for minimizing composite objective functions, CORE Discussion Paper, 2007.
DOI : 10.1007/s10107-012-0629-5

S. Shalev-Shwartz and T. Zhang, Proximal stochastic dual coordinate ascent, 1211.

S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan, Stochastic convex optimization, Proc. COLT, 2009.

S. Shalev-Shwartz and A. Tewari, Stochastic methods for ℓ1 regularized loss minimization, Proc. ICML, 2009.
DOI : 10.1145/1553374.1553493

A. W. van der Vaart, Asymptotic Statistics, 1998.

M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends® in Machine Learning, vol.1, issue.1-2, pp.1-305, 2008.
DOI : 10.1561/2200000001

S. Wright, R. Nowak, and M. Figueiredo, Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009.
DOI : 10.1109/TSP.2009.2016892

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res, vol.11, pp.2543-2596, 2010.

D. P. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, 1999.

S. P. Boyd and L. Vandenberghe, Convex Optimization, 2004.

D. L. Fisk, Quasi-martingales, T. Am. Math. Soc, vol.120, issue.3, pp.359-388, 1965.

M. Métivier, Semi-martingales, 1983.

Y. Nesterov, Introductory lectures on convex optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov and J. Vial, Confidence level solutions for stochastic programming, Automatica, vol.44, issue.6, pp.1559-1568, 2008.
DOI : 10.1016/j.automatica.2008.01.017

J. Nocedal and S. J. Wright, Numerical optimization, 2006.
DOI : 10.1007/b98874