F. Bach, High-dimensional non-linear variable selection through hierarchical kernel learning, tech. report, 2009.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Convex optimization with sparsity-inducing norms, Optimization for Machine Learning, 2011.
DOI : 10.1561/2200000015

URL : https://hal.archives-ouvertes.fr/hal-00937150

R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, Model-based compressive sensing, IEEE Transactions on Information Theory, vol.56, issue.4, pp.1982-2001, 2010.

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing Sparsity by Reweighted ℓ1 Minimization, Journal of Fourier Analysis and Applications, vol.14, issue.5-6, pp.877-905, 2008.
DOI : 10.1007/s00041-008-9045-x

M. K. Carroll, G. A. Cecchi, I. Rish, R. Garg, and A. R. Rao, Prediction and interpretation of distributed neural activity with sparse models, NeuroImage, vol.44, issue.1, pp.112-122, 2009.
DOI : 10.1016/j.neuroimage.2008.08.020

V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk, Sparse signal recovery using Markov random fields, Advances in Neural Information Processing Systems, 2008.

G. H. Chen and R. T. Rockafellar, Convergence Rates in Forward--Backward Splitting, SIAM Journal on Optimization, vol.7, issue.2, pp.421-444, 1997.
DOI : 10.1137/S1052623495290179

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.8438

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1998.
DOI : 10.1137/S1064827596304010

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.7694

D. B. Chklovskii and A. A. Koulakov, Maps in the Brain: What Can We Learn from Them?, Annual Review of Neuroscience, vol.27, issue.1, pp.369-392, 2004.
DOI : 10.1146/annurev.neuro.27.070203.144226

P. L. Combettes and J. Pesquet, Proximal Splitting Methods in Signal Processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2010.
DOI : 10.1007/978-1-4419-9569-8_10

URL : https://hal.archives-ouvertes.fr/hal-00643807

P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation, pp.1168-1200, 2006.
DOI : 10.1137/050626090

D. D. Cox and R. L. Savoy, Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex, NeuroImage, vol.19, issue.2, pp.261-270, 2003.

S. Dehaene, G. Le Clec'H, L. Cohen, J. Poline, P. van de Moortele et al., Inferring behavior from functional brain images, Nature Neuroscience, vol.1, issue.7, p.549, 1998.
DOI : 10.1038/2785

URL : https://hal.archives-ouvertes.fr/hal-00349936

D. L. Donoho and I. M. Johnstone, Adapting to Unknown Smoothness via Wavelet Shrinkage, Journal of the American Statistical Association, vol.90, issue.432, pp.1200-1224, 1995.
DOI : 10.1080/01621459.1979.10481038

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.8697

M. F. Duarte and Y. C. Eldar, Structured compressed sensing: from theory to applications, tech. report, 2011.
DOI : 10.1109/tsp.2011.2161982

URL : http://arxiv.org/abs/1106.6224

E. Eger, C. Kell, and A. Kleinschmidt, Graded Size Sensitivity of Object-Exemplar-Evoked Activity Patterns Within Human LOC Subregions, Journal of Neurophysiology, vol.100, issue.4, pp.2038-2085, 2008.
DOI : 10.1152/jn.90305.2008

G. Flandin, F. Kherif, X. Pennec, G. Malandain, N. Ayache et al., Improved Detection Sensitivity in Functional MRI Data Using a Brain Parcelling Technique, Medical Image Computing and Computer-Assisted Intervention (MICCAI'02), pp.467-474, 2002.
DOI : 10.1007/3-540-45786-0_58

URL : https://hal.archives-ouvertes.fr/inria-00615921

K. J. Friston, A. P. Holmes, K. J. Worsley, J. B. Poline, C. Frith et al., Statistical parametric maps in functional imaging: A general linear approach, Human Brain Mapping, vol.2, issue.4, pp.189-210, 1995.
DOI : 10.1002/hbm.460020402

L. Grosenick, S. Greer, and B. Knutson, Interpretable Classifiers for fMRI Improve Prediction of Purchases, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol.16, issue.6, pp.539-548, 2009.
DOI : 10.1109/TNSRE.2008.926701

I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning, pp.389-422, 2002.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2009.

J. Huang and T. Zhang, The benefit of group sparsity, The Annals of Statistics, vol.38, issue.4, pp.1978-2004, 2010.
DOI : 10.1214/09-AOS778

J. Huang, T. Zhang, and D. Metaxas, Learning with structured sparsity, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553429

URL : http://arxiv.org/abs/0903.3002

L. Jacob, G. Obozinski, and J.-P. Vert, Group lasso with overlap and graph lasso, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553431

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.7108

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, Journal of Machine Learning Research, vol.12, pp.2777-2824, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00377732

R. Jenatton, A. Gramfort, V. Michel, G. Obozinski, F. Bach et al., Multi-scale Mining of fMRI Data with Hierarchical Structured Sparsity, 2011 International Workshop on Pattern Recognition in NeuroImaging, 2011.
DOI : 10.1109/PRNI.2011.15

URL : https://hal.archives-ouvertes.fr/inria-00589785

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for hierarchical sparse coding, Journal of Machine Learning Research, vol.12, pp.2297-2334, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00516723

S. C. Johnson, Hierarchical clustering schemes, Psychometrika, vol.32, issue.3, pp.241-254, 1967.
DOI : 10.1007/BF02289588

S. Kim and E. P. Xing, Tree-guided group Lasso for multi-task regression with structured sparsity, Proceedings of the International Conference on Machine Learning (ICML), 2010.
DOI : 10.1214/12-aoas549

URL : http://arxiv.org/abs/0909.1373

M. Kowalski, Sparse regression using mixed norms, Applied and Computational Harmonic Analysis, vol.27, issue.3, pp.303-324, 2009.
DOI : 10.1016/j.acha.2009.05.006

URL : https://hal.archives-ouvertes.fr/hal-00202904

S. LaConte, S. Strother, V. Cherkassky, J. Anderson, and X. Hu, Support vector machines for temporal classification of block design fMRI data, NeuroImage, vol.26, issue.2, pp.317-329, 2005.
DOI : 10.1016/j.neuroimage.2005.01.048

P. L. Lions and B. Mercier, Splitting Algorithms for the Sum of Two Nonlinear Operators, SIAM Journal on Numerical Analysis, vol.16, issue.6, pp.964-979, 1979.
DOI : 10.1137/0716071

J. Liu, S. Ji, and J. Ye, Multi-task feature learning via efficient ℓ2,1-norm minimization, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), 2009.

J. Liu and J. Ye, Fast overlapping group lasso, tech. report, 2010.

A. Marquand, M. Howard, M. Brammer, C. Chu, S. Coen et al., Quantitative prediction of subjective pain intensity from whole-brain fMRI data using Gaussian processes, NeuroImage, vol.49, issue.3, pp.2178-2189, 2010.
DOI : 10.1016/j.neuroimage.2009.10.072

B. Martinet, Régularisation d'inéquations variationnelles par approximations successives, Revue française d'informatique et de recherche opérationnelle, série rouge, 1970.
DOI : 10.1051/m2an/197004r301541

URL : http://archive.numdam.org/article/M2AN_1970__4_3_154_0.pdf

F. De Martino, G. Valente, N. Staeren, J. Ashburner, R. Goebel et al., Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns, NeuroImage, vol.43, issue.1, pp.44-58, 2008.
DOI : 10.1016/j.neuroimage.2008.06.037

A. F. Martins, N. A. Smith, P. M. Aguiar, and M. A. T. Figueiredo, Structured sparsity in structured prediction, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011.

C. A. Micchelli, J. M. Morales, and M. Pontil, A family of penalty functions for structured sparsity, Advances in Neural Information Processing Systems, 2010.

V. Michel, E. Eger, C. Keribin, J. Poline, and B. Thirion, A supervised clustering approach for extracting predictive information from brain activation images, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Workshops, 2010.
DOI : 10.1109/CVPRW.2010.5543435

URL : https://hal.archives-ouvertes.fr/hal-00504094

V. Michel, A. Gramfort, G. Varoquaux, E. Eger, and B. Thirion, Total variation regularization for fMRI-based prediction of behaviour, IEEE Transactions on Medical Imaging, p.1, 2011.

J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, C. R. Acad. Sci. Paris Sér. A Math, vol.255, pp.2897-2899, 1962.

F. Murtagh, A Survey of Algorithms for Contiguity-constrained Clustering and Related Problems, The Computer Journal, vol.28, issue.1, pp.82-88, 1985.
DOI : 10.1093/comjnl/28.1.82

Y. Nesterov, Gradient methods for minimizing composite objective function, tech. report, Center for Operations Research and Econometrics (CORE), 2007.
DOI : 10.1007/s10107-012-0629-5

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.8, issue.68, pp.1-22, 2009.
DOI : 10.1007/s11222-008-9111-x

P. Ramachandran and G. Varoquaux, Mayavi: 3D Visualization of Scientific Data, Computing in Science & Engineering, vol.13, issue.2, pp.40-51, 2011.
DOI : 10.1109/MCSE.2011.35

URL : https://hal.archives-ouvertes.fr/inria-00528985

R. Rifkin and A. Klautau, In defense of one-vs-all classification, Journal of Machine Learning Research, vol.5, pp.101-141, 2004.

J. Rissman, H. T. Greely, and A. D. Wagner, Detecting individual memories through the neural decoding of memory states and past experience, Proceedings of the National Academy of Sciences, pp.9849-9854, 2010.
DOI : 10.1073/pnas.1001028107

S. Ryali, K. Supekar, D. A. Abrams, and V. Menon, Sparse logistic regression for whole-brain classification of fMRI data, NeuroImage, vol.51, issue.2, pp.752-764, 2010.
DOI : 10.1016/j.neuroimage.2010.02.040

M. Schmidt and K. Murphy, Convex structure learning in log-linear models: Beyond pairwise potentials, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002.

P. Sprechmann, I. Ramirez, P. Cancela, and G. Sapiro, Collaborative sources identification in mixed signals via hierarchical sparse modeling, tech. report, 2010.
DOI : 10.1109/icassp.2011.5947683

URL : http://arxiv.org/abs/1010.4893

M. Stojnic, F. Parvaresh, and B. Hassibi, On the Reconstruction of Block-Sparse Signals With an Optimal Number of Measurements, IEEE Transactions on Signal Processing, vol.57, issue.8, pp.3075-3085, 2009.
DOI : 10.1109/TSP.2009.2020754

B. Thirion, G. Flandin, P. Pinel, A. Roche, P. Ciuciu et al., Dealing with the shortcomings of spatial normalization: Multi-subject parcellation of fMRI datasets, Human Brain Mapping, vol.27, issue.8, pp.678-693, 2006.
DOI : 10.1002/hbm.20210

R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society. Series B, pp.267-288, 1996.
DOI : 10.1111/j.1467-9868.2011.00771.x

P. Tseng, Applications of a Splitting Algorithm to Decomposition in Convex Programming and Variational Inequalities, SIAM Journal on Control and Optimization, vol.29, issue.1, p.119, 1991.
DOI : 10.1137/0329006

B. A. Turlach, W. N. Venables, and S. J. Wright, Simultaneous Variable Selection, Technometrics, vol.47, issue.3, pp.349-363, 2005.
DOI : 10.1198/004017005000000139

K. Ugurbil, L. Toth, and D. Kim, How accurate is magnetic resonance imaging of brain function?, Trends in Neurosciences, vol.26, issue.2, pp.108-114, 2003.
DOI : 10.1016/S0166-2236(02)00039-5

J. H. Ward, Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963.
DOI : 10.1007/BF02289263

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009.
DOI : 10.1109/TSP.2009.2016892

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.9334

O. Yamashita, M. Sato, T. Yoshioka, F. Tong, and Y. Kamitani, Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns, NeuroImage, vol.42, issue.4, pp.1414-1429, 2008.
DOI : 10.1016/j.neuroimage.2008.05.050

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.
DOI : 10.1198/016214502753479356

P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, vol.37, issue.6A, pp.3468-3497, 2009.
DOI : 10.1214/07-AOS584

URL : http://arxiv.org/abs/0909.0411

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.
DOI : 10.1073/pnas.201162998