A. E. Hoerl and R. W. Kennard, Ridge regression: applications to nonorthogonal problems, Technometrics, vol.12, issue.1, pp.69-82, 1970.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, vol.58, issue.1, pp.267-288, 1996.

M. Kowalski, Sparse regression using mixed norms, Applied and Computational Harmonic Analysis, vol.27, issue.3, pp.303-324, 2009.
DOI : 10.1016/j.acha.2009.05.006
URL : https://hal.archives-ouvertes.fr/hal-00202904

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Foundations and Trends in Machine Learning, vol.4, issue.1, pp.1-106, 2012.

R. Jenatton, G. Obozinski, and F. Bach, Active set algorithm for structured sparsity-inducing norms, OPT 2009: 2nd NIPS Workshop on Optimization for Machine Learning, 2009.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

R. Gribonval, V. Cevher, and M. E. Davies, Compressible Distributions for High-dimensional Statistics, IEEE Transactions on Information Theory, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00563207

R. Gribonval, Should Penalized Least Squares Regression be Interpreted as Maximum A Posteriori Estimation?, IEEE Transactions on Signal Processing, vol.59, issue.5, pp.2405-2410, 2011.
DOI : 10.1109/TSP.2011.2107908
URL : https://hal.archives-ouvertes.fr/inria-00486840

Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Papers, Center for Operations Research and Econometrics (CORE), 2010.

C. Hsieh, K. Chang, C. Lin, S. S. Keerthi, and S. Sundararajan, A dual coordinate descent method for large-scale linear SVM, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.408-415, 2008.
DOI : 10.1145/1390156.1390208

P. Machart, T. Peel, L. Ralaivola, S. Anthoine, and H. Glotin, Stochastic low-rank kernel learning for regression, 28th International Conference on Machine Learning, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00657837

M. Raphan and E. P. Simoncelli, Learning to be Bayesian without supervision, Adv. Neural Information Processing Systems (NIPS*06), 2007.

R. Gribonval and P. Machart, Reconciling "priors" & "priors" without prejudice?, Research report RR-8366, 2013.