S. Ahn, J. A. Fessler, D. Blatt, and A. O. Hero, Convergent incremental optimization transfer algorithms: Application to tomography, IEEE Trans. Med. Imaging, vol.25, issue.3, pp.283-296, 2006.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, Foundations and Trends in Machine Learning, vol.4, issue.1, pp.1-106, 2012.
DOI : 10.1561/2200000015

URL : https://hal.archives-ouvertes.fr/hal-00613125

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.3271

A. Beck and L. Tetruashvili, On the Convergence of Block Coordinate Descent Type Methods, SIAM Journal on Optimization, vol.23, issue.4, pp.2037-2060, 2013.
DOI : 10.1137/120887679

D. Blatt, A. O. Hero, and H. Gauchman, A Convergent Incremental Gradient Method with a Constant Step Size, SIAM Journal on Optimization, vol.18, issue.1, pp.29-51, 2007.
DOI : 10.1137/040615961

D. Böhning and B. G. Lindsay, Monotonicity of quadratic-approximation algorithms, Annals of the Institute of Statistical Mathematics, vol.40, issue.4, pp.641-663, 1988.
DOI : 10.1007/BF00049423

J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, 2006.

L. Bottou, Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998.

S. P. Boyd and L. Vandenberghe, Convex Optimization, 2004.

E. J. Candès, M. Wakin, and S. P. Boyd, Enhancing Sparsity by Reweighted ℓ1 Minimization, Journal of Fourier Analysis and Applications, vol.14, issue.5-6, pp.877-905, 2008.
DOI : 10.1007/s00041-008-9045-x

M. Collins, R. E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, Machine Learning, vol.48, issue.1/3, pp.253-285, 2002.
DOI : 10.1023/A:1013912006537

P. L. Combettes and J. Pesquet, Proximal Splitting Methods in Signal Processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2010.
DOI : 10.1007/978-1-4419-9569-8_10

URL : https://hal.archives-ouvertes.fr/hal-00643807

P. L. Combettes and V. R. Wajs, Signal Recovery by Proximal Forward-Backward Splitting, Multiscale Modeling & Simulation, vol.4, issue.4, pp.1168-1200, 2005.
DOI : 10.1137/050626090

URL : https://hal.archives-ouvertes.fr/hal-00017649

I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics, vol.57, issue.11, pp.1413-1457, 2004.
DOI : 10.1002/cpa.20042

A. J. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Proceedings of Advances in Neural Information Processing Systems, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. J. Defazio, T. S. Caetano, and J. Domke, Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of ICML, 2014.

S. Della Pietra, V. Della Pietra, and J. Lafferty, Duality and Auxiliary Functions for Bregman Distances, 2001.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B, vol.39, issue.1, pp.1-38, 1977.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res, vol.12, pp.2121-2159, 2011.

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, J. Mach. Learn. Res, vol.10, pp.2899-2934, 2009.

H. Erdogan and J. A. Fessler, Ordered subsets algorithms for transmission tomography, Physics in Medicine and Biology, vol.44, issue.11, pp.2835-2851, 1999.
DOI : 10.1088/0031-9155/44/11/311

URL : https://deepblue.lib.umich.edu/bitstream/2027.42/48964/2/m91111.pdf

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res, vol.9, pp.1871-1874, 2008.

M. Fashing and C. Tomasi, Mean shift is a bound optimization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.3, pp.471-474, 2005.
DOI : 10.1109/TPAMI.2005.59

G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, vol.57, issue.12, pp.4686-4698, 2009.
DOI : 10.1109/TSP.2009.2026004

URL : https://hal.archives-ouvertes.fr/hal-00439453

S. Ghadimi and G. Lan, Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework, SIAM Journal on Optimization, vol.22, issue.4, pp.1469-1492, 2012.
DOI : 10.1137/110848864

E. T. Hale, W. Yin, and Y. Zhang, Fixed-Point Continuation for ℓ1-Minimization: Methodology and Convergence, SIAM Journal on Optimization, vol.19, issue.3, pp.1107-1130, 2008.
DOI : 10.1137/070698920

E. Hazan and S. Kale, Beyond the regret minimization barrier: An optimal algorithm for stochastic strongly-convex optimization, Proceedings of COLT, 2011.

R. Horst and N. V. Thoai, DC Programming: Overview, Journal of Optimization Theory and Applications, vol.103, issue.1, pp.1-43, 1999.
DOI : 10.1023/A:1021765131316

T. Jebara and A. Choromanska, Majorization for CRFs and latent likelihoods, Proceedings of Advances in Neural Information Processing Systems, 2012.

A. Juditsky and A. Nemirovski, First order methods for nonsmooth convex large-scale optimization, Optimization for Machine Learning, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00981863

E. Khan, B. Marlin, G. Bouchard, and K. Murphy, Variational bounds for mixed-data factor analysis, Proceedings of Advances in Neural Information Processing Systems, 2010.

G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, vol.133, issue.1-2, pp.365-397, 2012.
DOI : 10.1007/s10107-010-0434-y

K. Lange, D. R. Hunter, and I. Yang, Optimization Transfer Using Surrogate Objective Functions, Journal of Computational and Graphical Statistics, vol.9, issue.1, pp.1-20, 2000.
DOI : 10.1080/10618600.2000.10474858

N. Le Roux, M. Schmidt, and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, Proceedings of Advances in Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00674995

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Proceedings of Advances in Neural Information Processing Systems, 2001.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res, vol.11, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, C. R. Acad. Sci. Paris Sér. A Math, vol.255, pp.2897-2899, 1962.

R. M. Neal and G. E. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants, Learning in Graphical Models, pp.355-368, 1998.
DOI : 10.1007/978-94-011-5014-9_12

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277

URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, Introductory Lectures on Convex Optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, vol.140, issue.1, pp.125-161, 2013.
DOI : 10.1007/s10107-012-0629-5

J. Nocedal and S. J. Wright, Numerical Optimization, 2006.
DOI : 10.1007/b98874

M. Razaviyayn, M. Hong, and Z. Luo, A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization, SIAM Journal on Optimization, vol.23, issue.2, pp.1126-1153, 2013.
DOI : 10.1137/120891009

M. Razaviyayn, M. Sanjabi, and Z. Luo, A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization, 2013.

M. Schmidt, N. L. Roux, and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, Proceedings of Advances in Neural Information Processing Systems, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00618152

M. Schmidt, N. L. Roux, and F. Bach, Minimizing Finite Sums with the Stochastic Average Gradient, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00860051

B. A. Turlach, W. N. Venables, and S. J. Wright, Simultaneous Variable Selection, Technometrics, vol.47, issue.3, pp.349-363, 2005.
DOI : 10.1198/004017005000000139

M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning, vol.1, issue.1-2, pp.1-305, 2008.
DOI : 10.1561/2200000001

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.192.2462

S. J. Wright, R. D. Nowak, and M. A. Figueiredo, Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009.
DOI : 10.1109/TSP.2009.2016892

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.9334

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res, vol.11, pp.2543-2596, 2010.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.
DOI : 10.1198/016214502753479356

L. W. Zhong and J. T. Kwok, Fast stochastic alternating direction method of multipliers, Proceedings of ICML, 2014.