A. Andoni, R. Panigrahy, G. Valiant, and L. Zhang, Learning polynomials with neural networks, Proceedings of the International Conference on Machine Learning, 2014.

S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra et al., VQA: Visual question answering, Proceedings of the IEEE International Conference on Computer Vision, pp.2425-2433, 2015.

A. Argyriou, R. Foygel, and N. Srebro, Sparse prediction with the k-support norm, Advances in Neural Information Processing Systems, vol.25, pp.1457-1465, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00858954

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Foundations and Trends in Machine Learning, vol.4, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Structured sparsity through convex optimization, Statistical Science, vol.27, issue.4, pp.450-468, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00621245

A. Backus, O. Jensen, E. Meeuwissen, M. Van-gerven, and S. Dumoulin, Investigating the temporal dynamics of long term memory representation retrieval using multivariate pattern analyses on magnetoencephalography data, 2011.

A. K. Balan, V. Rathod, K. Murphy, and M. Welling, Bayesian dark knowledge, Advances in Neural Information Processing Systems, 2015.

L. Baldassarre, J. Morales, A. Argyriou, and M. Pontil, A general framework for structured sparsity via proximal optimization, Proceedings of the International Conference on Artificial Intelligence and Statistics, pp.82-90, 2012.

L. Baldassarre, J. Mourao-miranda, and M. Pontil, Structured sparsity models for brain decoding from fMRI data, Proceedings of The International Workshop on Pattern Recognition in Neuroimaging, 2012.

S. Balmand and A. S. Dalalyan, On estimation of the diagonal elements of a sparse precision matrix, Electronic Journal of Statistics, vol.10, issue.1, pp.1551-1579, 2016.

A. Bartels and S. Zeki, The chronoarchitecture of the human brain-natural viewing conditions reveal a time-based anatomy of the brain, NeuroImage, vol.22, issue.1, pp.419-433, 2004.

A. Bartels and S. Zeki, Brain dynamics during natural viewing conditions-a new guide for mapping connectivity in vivo, NeuroImage, vol.24, issue.2, pp.339-349, 2005.

A. Bartels, S. Zeki, and N. Logothetis, Natural vision reveals regional specialization to local motion and to contrast-invariant, global flow in the human brain, Cerebral Cortex, vol.18, issue.3, pp.705-717, 2008.

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, CMS Books in mathematics, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643354

A. Beck and M. Teboulle, Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems, IEEE Transactions on Image Processing, vol.18, issue.11, pp.2419-2434, 2009.

E. Belilovsky, A. Argyriou, G. Varoquaux, and M. Blaschko, Convex relaxations of penalties for sparse correlated variables with bounded total variation, Machine Learning, vol.100, pp.533-553, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01167861

E. Belilovsky, K. Gkirtzou, M. Misyrlis, A. B. Konova, J. Honorio et al., Predictive sparse modeling of fMRI data for improved classification, regression, and visualization using the k-support norm, Computerized Medical Imaging and Graphics, vol.46, pp.40-46, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01141082

E. Belilovsky, W. Bounliphone, M. B. Blaschko, I. Antonoglou, and A. Gretton, A test of relative similarity for model selection in generative models, International Conference on Learning Representations, 2016.

E. Belilovsky, G. Varoquaux, and M. B. Blaschko, Testing for differences in Gaussian graphical models: Applications to brain connectivity, Advances in Neural Information Processing Systems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01248844

E. Belilovsky, M. Blaschko, J. R. Kiros, R. Urtasun, and R. Zemel, Joint embeddings of scene graphs and images, International Conference on Learning Representations (ICLR) Workshop Track, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01667777

E. Belilovsky, K. Kastner, G. Varoquaux, and M. Blaschko, Learning to discover graphical model structures, International Conference on Machine Learning, 2017.

I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, Neural combinatorial optimization with reinforcement learning, International Conference on Learning Representations Workshop Track, 2017.

Y. Bengio, I. J. Goodfellow, and A. Courville, Deep learning, 2015.

R. Bhatia, Matrix Analysis, Graduate Texts in Mathematics, vol.521, pp.436-444, 1997.
URL : https://hal.archives-ouvertes.fr/hal-01820431

M. Blaschko, J. Shelton, A. Bartels, C. Lampert, and A. Gretton, Semi-supervised kernel canonical correlation analysis with application to human fMRI, Pattern Recognition Letters, vol.32, issue.11, 2011.

M. B. Blaschko, A note on k-support norm regularized risk minimization, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00804592

M. B. Blaschko, J. A. Shelton, and A. Bartels, Augmenting feature-driven fMRI analyses: Semi-supervised learning and resting state activity, Advances in Neural Information Processing Systems, vol.22, pp.126-134, 2009.

K. M. Borgwardt, Graph kernels, 2007.

D. Borsook, E. A. Moulton, K. F. Schmidt, and L. R. Becerra, Neuroimaging revolutionizes therapeutic approaches to chronic pain, Molecular Pain, vol.3, issue.1, p.25, 2007.

P. Bühlmann and S. Van-de-geer, Statistics for High-Dimensional Data, 2011.

P. Bühlmann, M. Kalisch, and L. Meier, High-dimensional statistics with a view toward applications in biology, Annual Review of Statistics and Its Application, vol.1, pp.255-278, 2014.

K. Button, Power failure: Why small sample size undermines the reliability of neuroscience, Nature Reviews Neuroscience, vol.14, p.365, 2013.

T. Cai, W. Liu, and X. Luo, A constrained ℓ1 minimization approach to sparse precision matrix estimation, Journal of the American Statistical Association, vol.106, issue.494, pp.594-607, 2011.

F. X. Castellanos, Clinical applications of the functional connectome, NeuroImage, vol.80, p.527, 2013.

H. W. Chase, S. B. Eickhoff, A. R. Laird, and L. Hogarth, The neural basis of drug stimulus processing and craving: An activation likelihood estimation meta-analysis, Biological Psychiatry, vol.70, issue.8, pp.785-793, 2011.

S. Chatterjee, S. Chen, and A. Banerjee, Generalized Dantzig selector: Application to the k-support norm, Advances in Neural Information Processing Systems, pp.1934-1942, 2014.

S. Chen, Y. Xing, and J. Kang, Latent and abnormal functional connectivity circuits in autism spectrum disorder, Frontiers in Neuroscience, vol.11, 2017.

X. Chen, Q. Lin, S. Kim, J. Carbonell, and E. P. Xing, Smoothing proximal gradient method for general structured sparse learning, Conference on Uncertainty in Artificial Intelligence, 2011.

N. Cohen, O. Sharir, and A. Shashua, On the expressive power of deep learning: a tensor analysis, Conference on Learning Theory, 2016.

C. S. Culbertson, J. Bramen, M. S. Cohen, E. D. London, R. E. Olmstead et al., Effect of bupropion treatment on brain activation induced by cigarette-related cues in smokers, Archives of General Psychiatry, vol.68, issue.5, pp.505-515, 2011.

B. Da-mota, V. Fritsch, G. Varoquaux, T. Banaschewski, G. J. Barker et al., Randomized parcellation based inference, NeuroImage, vol.89, pp.203-215, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00915243

P. Danaher, P. Wang, and D. Witten, The joint graphical lasso for inverse covariance estimation across multiple classes, Journal of the Royal Statistical Society (B), vol.76, issue.2, pp.373-397, 2014.

A. P. Dawid, Conditional independence in statistical theory, Journal of the Royal Statistical Society, Series B, 1979.

A. P. Dempster, Covariance selection, Biometrics, pp.157-175, 1972.

E. L. Denton, W. Zaremba, J. Bruna, Y. Lecun, and R. Fergus, Exploiting linear structure within convolutional networks for efficient evaluation, Advances in Neural Information Processing Systems, 2014.

R. Dezeure, P. Bühlmann, L. Meier, and N. Meinshausen, High-dimensional inference: Confidence intervals, p-values and R-software hdi, Statistical Science, vol.30, issue.4, 2015.

A. Di Martino, The autism brain imaging data exchange: Towards a large-scale evaluation of the intrinsic brain architecture in autism, Molecular Psychiatry, vol.19, p.659, 2014.

E. Dohmatob, A. Gramfort, B. Thirion, and G. Varoquaux, Benchmarking solvers for TV-l1 least-squares and logistic regression in brain imaging, Proceedings of The International Workshop on Pattern Recognition in Neuroimaging, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00991743

M. Dubois, F. Hadj-selem, T. Lofstedt, M. Perrot, C. Fischer et al., Predictive support recovery with TV-elastic net penalty and logistic regression: An application to structural MRI, Proceedings of The International Workshop on Pattern Recognition in Neuroimaging, 2014.
URL : https://hal.archives-ouvertes.fr/cea-01016145

K. Duvenaud, Convolutional networks on graphs for learning molecular fingerprints, Advances in Neural Information Processing Systems, 2015.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, The Annals of Statistics, vol.32, pp.407-499, 2004.

M. Fisher, M. Savva, and P. Hanrahan, Characterizing structural relationships in scenes using graph kernels, ACM Transactions on Graphics, vol.30, p.34, 2011.

R. A. Fisher, The distribution of the partial correlation coefficient, Metron, vol.3, pp.329-332, 1924.

T. R. Franklin, Z. Wang, Y. Li, J. J. Suh, M. Goldman et al., Dopamine transporter genotype modulation of neural responses to smoking cues: Confirmation in a new cohort, Addiction Biology, vol.16, issue.2, pp.308-322, 2011.

J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol.9, issue.3, pp.432-441, 2008.

J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, vol.33, issue.1, p.1, 2010.

A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean et al., DeViSE: A deep visual-semantic embedding model, Advances in Neural Information Processing Systems, pp.2121-2129, 2013.

A. Ganguly and W. Polonik, Local neighborhood fusion in locally constant Gaussian graphical models, 2014.

J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, Neural message passing for quantum chemistry, International Conference on Machine Learning, 2017.

K. Gkirtzou, J. Honorio, D. Samaras, R. Z. Goldstein, and M. B. Blaschko, fMRI analysis of cocaine addiction using k-support sparsity, IEEE International Symposium on Biomedical Imaging, pp.1078-1081, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00784386

R. Goldstein and N. Volkow, Drug addiction and its underlying neurobiological basis: Neuroimaging evidence for the involvement of the frontal cortex, The American Journal of Psychiatry, vol.159, issue.10, p.1642, 2002.

R. Goldstein, D. Tomasi, S. Rajaram, L. Cottone, L. Zhang et al., Role of the anterior cingulate and medial orbitofrontal cortex in processing drug cues in cocaine addiction, Neuroscience, vol.144, issue.4, pp.1153-1159, 2007.

R. Goldstein, N. Alia-klein, D. Tomasi, J. Carrillo, T. Maloney et al., Anterior cingulate cortex hypoactivations to an emotionally salient task in cocaine addiction, Proceedings of the National Academy of Sciences, vol.106, p.9453, 2009.

R. Z. Goldstein, P. A. Woicik, T. Maloney, D. Tomasi, N. Alia-klein et al., Oral methylphenidate normalizes cingulate activity in cocaine addiction during a salient cognitive task, Proceedings of the National Academy of Sciences, vol.107, pp.16667-16672, 2010.

E. Gómez, A multivariate generalization of the power exponential family of distributions, Communications in Statistics - Theory and Methods, vol.27, issue.3, 1998.

A. R. Goncalves, P. Das, S. Chatterjee, V. Sivakumar, F. J. Von-zuben et al., Multi-task sparse structure learning, Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM '14, pp.451-460, 2014.

A. Gramfort, B. Thirion, and G. Varoquaux, Identifying predictive regions from fMRI with TV-L1 prior, Proceedings of The International Workshop on Pattern Recognition in Neuroimaging, pp.17-20, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00839984

A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, 2012.

K. Gregor and Y. Lecun, Learning fast approximations of sparse coding, Proceedings of the International Conference on Machine Learning, 2010.

M. G'Sell, J. Taylor, and R. Tibshirani, Adaptive testing for the graphical lasso, 2013.

H. Hara and A. Takemura, A localization approach to improve iterative proportional scaling in Gaussian graphical models, Communications in Statistics - Theory and Methods, vol.39, issue.8-9, pp.1643-1654, 2010.

D. R. Hardoon, J. Miranda, M. Brammer, and J. Shawe-taylor, Unsupervised analysis of fMRI data using kernel canonical correlation, NeuroImage, vol.37, issue.4, pp.1250-1259, 2007.

J. Haxby, M. Gobbini, M. Furey, A. Ishai, J. Schouten et al., Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex, Science, vol.293, issue.5539, pp.2425-2430, 2001.

M. Hebiri and S. Van-de-geer, The smooth-lasso and other ℓ1+ℓ2-penalized methods, Electronic Journal of Statistics, vol.5, pp.1184-1226, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00462882

M. Henaff, J. Bruna, and Y. Lecun, Deep convolutional networks on graph-structured data, 2015.

C. Honey, O. Sporns, L. Cammoun, and X. Gigandet, Predicting human resting-state functional connectivity from structural connectivity, Proceedings of the National Academy of Sciences, vol.106, p.2035, 2009.

J. Honorio and D. Samaras, Multi-task learning of Gaussian graphical models, Proceedings of the International Conference on Machine Learning, 2010.

J. Honorio, D. Samaras, N. Paragios, R. Goldstein, and L. E. Ortiz, Sparse and locally constant Gaussian graphical models, Advances in Neural Information Processing Systems, pp.745-753, 2009.

J. Honorio, T. Jaakkola, and D. Samaras, On the statistical efficiency of ℓ1,p multi-task learning of Gaussian graphical models, 2012.

J. Honorio, D. Tomasi, R. Z. Goldstein, H. Leung, and D. Samaras, Can a single brain region predict a disorder?, IEEE Transactions on Medical Imaging, vol.31, issue.11, pp.2062-2072, 2012.

J. J. Hopfield and D. W. Tank, "Neural" computation of decisions in optimization problems, Biological Cybernetics, vol.52, issue.3, pp.141-152, 1985.

C. Hsieh, M. A. Sustik, I. S. Dhillon, P. K. Ravikumar, and R. Poldrack, BIG & QUIC: Sparse inverse covariance estimation for a million variables, Advances in Neural Information Processing Systems, pp.3165-3173, 2013.

J. Huang, T. Zhang, and D. Metaxas, Learning with structured sparsity, Proceedings of the International Conference on Machine Learning, pp.417-424, 2009.

J. Huang, S. Zhang, H. Li, and D. Metaxas, Composite splitting algorithms for convex optimization, Computer Vision and Image Understanding, vol.115, issue.12, pp.1610-1622, 2011.

J. Huang, S. Zhang, and D. Metaxas, Efficient MR image reconstruction for compressed MR imaging, Medical Image Analysis, vol.15, issue.5, pp.670-679, 2011.

L. Jacob, G. Obozinski, and J. Vert, Group lasso with overlap and graph lasso, Proceedings of the International Conference on Machine Learning, pp.433-440, 2009.

J. Janková and S. Van-de-geer, Confidence intervals for high-dimensional inverse covariance estimation, Electronic Journal of Statistics, vol.9, issue.1, pp.1205-1229, 2015.

A. Javanmard and A. Montanari, Confidence intervals and hypothesis testing for high-dimensional regression, The Journal of Machine Learning Research, vol.15, issue.1, pp.2869-2909, 2014.

R. Jenatton, A. Gramfort, V. Michel, G. Obozinski, E. Eger et al., Multiscale mining of fMRI data with hierarchical structured sparsity, SIAM Journal on Imaging Sciences, vol.5, issue.3, pp.835-856, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00589785

J. Johnson, R. Krishna, M. Stark, L. Li, D. A. Shamma et al., Image retrieval using scene graphs, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3668-3678, 2015.

A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, Bag of tricks for efficient text classification, Proceedings of the 13th Conference of the European Chapter, 2016.

C. Kelly, B. B. Biswal, R. C. Craddock, F. X. Castellanos, and M. P. Milham, Characterizing variation in the functional connectome: Promise and pitfalls, Trends in Cognitive Sciences, vol.16, p.181, 2012.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

R. Kiros, R. Salakhutdinov, and R. S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, Neural Information Processing Deep Learning Workshop, 2014.

D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques, 2009.

R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata et al., Visual genome: Connecting language and vision using crowdsourced dense image annotations, International Journal of Computer Vision, vol.123, issue.1, pp.32-73, 2017.

A. Kyrillidis, G. Puy, and V. Cevher, Hard thresholding with norm constraints, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3645-3648, 2012.

S. L. Lauritzen, Graphical models, 1996.

Y. Lecun and C. Cortes, MNIST handwritten digit database.

O. Ledoit and M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis, vol.88, issue.2, pp.365-411, 2004.

A. Lenkoski, A direct sampler for G-Wishart variates, Stat, vol.2, issue.1, pp.119-128, 2013.

G. Letac and H. Massam, Wishart distributions for decomposable graphs, The Annals of Statistics, vol.35, issue.3, pp.1278-1323, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00635771

Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, Gated graph sequence neural networks, International Conference on Learning Representations, 2016.

M. A. Lindquist, The statistical analysis of fMRI data, Statistical Science, vol.23, issue.4, pp.439-464, 2008.

X. Liu, J. Hairston, M. Schrier, and J. Fan, Common and distinct networks underlying reward valence and processing stages: A meta-analysis of functional neuroimaging studies, Neuroscience & Biobehavioral Reviews, vol.35, issue.5, pp.1219-1236, 2011.

R. Lockhart, J. Taylor, R. J. Tibshirani, and R. Tibshirani, A significance test for the lasso, The Annals of Statistics, vol.42, p.413, 2014.

P. Loh and M. J. Wainwright, Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses, The Annals of Statistics, vol.41, issue.6, pp.3022-3049, 2013.

D. Lopez-paz, K. Muandet, B. Schölkopf, and I. Tolstikhin, Towards a learning theory of cause-effect inference, Proceedings of the International Conference on Machine Learning, 2015.

C. Lu, R. Krishna, M. Bernstein, and L. Fei-fei, Visual relationship detection with language priors, European Conference on Computer Vision, pp.852-869, 2016.

W. Luo, Y. Li, R. Urtasun, and R. Zemel, Understanding the effective receptive field in deep convolutional neural networks, Advances in Neural Information Processing Systems, 2016.

J. Mairal and B. Yu, Supervised feature selection in graphs with path coding penalties and network flows, Journal of Machine Learning Research, vol.14, issue.1, pp.2449-2485, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00806372

N. T. Markov, M. Ercsey-ravasz, D. C. Van-essen, K. Knoblauch, Z. Toroczkai et al., Cortical high-density counterstream architectures, Science, vol.342, issue.6158, p.1238406, 2013.
URL : https://hal.archives-ouvertes.fr/inserm-00879494

G. Marsaglia, Conditional means and covariances of normal variables with singular covariance matrix, Journal of the American Statistical Association, vol.59, issue.308, pp.1203-1204, 1964.

A. M. Mcdonald, M. Pontil, and D. Stamos, New perspectives on k-support and cluster norms, 2014.

N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, pp.1436-1462, 2006.

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.

V. Michel, A. Gramfort, G. Varoquaux, E. Eger, and B. Thirion, Total variation regularization for fMRI-based prediction of behavior, IEEE Transactions on Medical Imaging, vol.30, issue.7, pp.1328-1340, 2011.

M. Misyrlis, A. B. Konova, M. B. Blaschko, J. Honorio, N. Alia-klein et al., Predicting cross-task behavioral variables from fMRI data using the k-support norm, Sparsity Techniques in Medical Imaging, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01026303

S. J. Moeller, D. Tomasi, J. Honorio, N. D. Volkow, and R. Z. Goldstein, Dopaminergic involvement during mental fatigue in health and cocaine addiction, Translational Psychiatry, vol.2, issue.10, p.176, 2012.

B. Moghaddam, E. Khan, K. P. Murphy, and B. M. Marlin, Accelerating Bayesian structural inference for non-decomposable Gaussian graphical models, Advances in Neural Information Processing Systems, 2009.

A. Mohammadi and E. C. Wit, Bayesian structure learning in sparse Gaussian graphical models, Bayesian Analysis, vol.10, issue.1, pp.109-138, 2015.

K. Mohan, M. Chung, S. Han, D. Witten, S. Lee et al., Structured learning of Gaussian graphical models, Advances in Neural Information Processing Systems, pp.620-628, 2012.

T. Moreau and J. Bruna, Understanding trainable sparse coding via matrix factorization, International Conference on Learning Representations, 2017.

K. P. Murphy, Machine learning: a probabilistic perspective, 2012.

M. Narayan and G. I. Allen, Mixed effects models to find differences in multi-subject functional connectivity, p.27516, 2015.

Y. Nesterov, Introductory lectures on convex optimization, 2004.

Y. Nesterov, Excessive gap technique in nonsmooth convex minimization, SIAM Journal on Optimization, vol.16, issue.1, pp.235-249, 2005.

T. E. Nichols and A. P. Holmes, Nonparametric permutation tests for functional neuroimaging: A primer with examples, Human Brain Mapping, vol.15, issue.1, pp.1-25, 2002.

E. Oyallon, E. Belilovsky, and S. Zagoruyko, Scaling the scattering transform: Deep hybrid networks, International Conference on Computer Vision, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01495734

E. Oyallon, S. Zagoruyko, G. Huang, N. Komodakis, S. Lacoste-julien et al., Scattering networks for hybrid representation learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.41, issue.9, pp.2208-2221, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01837587

N. Parikh and S. Boyd, Proximal algorithms, Foundations and Trends in Optimization, vol.1, issue.3, 2014.

F. Pedregosa et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

C. Peeters, A. Bilgrau, and W. Van-wieringen, rags2ridges: Ridge estimation of precision matrices from high-dimensional data. R package, 2015.

R. A. Poldrack, J. A. Mumford, and T. E. Nichols, Handbook of functional MRI data analysis, 2011.

P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu, High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence, Electronic Journal of Statistics, vol.5, pp.935-980, 2011.

J. Richiardi, H. Eryilmaz, S. Schwartz, P. Vuilleumier, D. Van-de et al., Decoding brain states from fMRI connectivity graphs, NeuroImage, vol.56, pp.616-626, 2011.

P. Rigollet, 18.S997: High dimensional statistics, Lecture Notes, 2015.

A. Roverato, Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models, Scandinavian Journal of Statistics, vol.29, issue.3, pp.391-411, 2002.

L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, vol.60, issue.1-4, pp.259-268, 1992.

S. Ryali, T. Chen, K. Supekar, and V. Menon, Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty, NeuroImage, vol.59, issue.4, pp.3852-3861, 2012.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.

W. R. Shirer, S. Ryali, E. Rykhlevskaia, V. Menon, and M. D. Greicius, Decoding subject-driven cognitive states with whole-brain connectivity patterns, Cerebral Cortex, vol.22, issue.1, pp.158-165, 2012.

K. A. Smith, Neural networks for combinatorial optimization: a review of more than a decade of research, INFORMS Journal on Computing, vol.11, issue.1, pp.15-34, 1999.

S. M. Smith, K. L. Miller, G. Salimi-khorshidi, M. Webster, C. F. Beckmann et al., Network modelling methods for fMRI, NeuroImage, vol.54, p.875, 2011.

S. Song, Z. Zhan, Z. Long, J. Zhang, and L. Yao, Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data, PLoS One, vol.6, issue.2, p.17191, 2011.

S. J. Swamidass, J. Chen, J. Bruand, P. Phung, L. Ralaivola et al., Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity, Bioinformatics, vol.21, issue.1, pp.359-368, 2005.

D. Teney, L. Liu, and A. van den Hengel, Graph-structured representations for visual question answering, Computer Vision and Pattern Recognition, 2016.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B, vol.58, pp.267-288, 1996.

R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, Sparsity and smoothness via the fused lasso, Journal of the Royal Statistical Society Series B, 2005.

S. Van-de-geer, P. Bühlmann, Y. Ritov, and R. Dezeure, On asymptotically optimal confidence regions and tests for high-dimensional models, The Annals of Statistics, vol.42, issue.3, pp.1166-1202, 2014.

G. Varoquaux and R. C. Craddock, Learning and comparing functional connectomes across subjects, NeuroImage, vol.80, pp.405-415, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00812911

G. Varoquaux, A. Gramfort, J. Poline, and B. Thirion, Brain covariance selection: Better individual functional connectivity models using population prior, Advances in Neural Information Processing Systems, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00512451

G. Varoquaux, A. Gramfort, F. Pedregosa, V. Michel, and B. Thirion, Multi-subject dictionary learning to segment an atlas of brain spontaneous activity, Information Processing in Medical Imaging, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00588898

V. Vazirani, Approximation Algorithms, 2001.

I. Vendrov, R. Kiros, S. Fidler, and R. Urtasun, Order-embeddings of images and language, International Conference on Learning Representations, 2016.

O. Vinyals, M. Fortunato, and N. Jaitly, Pointer networks, Advances in Neural Information Processing Systems, 2015.

O. Vinyals, Order matters: Sequence to sequence for sets, International Conference on Learning Representations, 2016.

L. Waldorp, Testing for graph differences using the desparsified lasso in high-dimensional data, Statistics Survey, 2014.

H. Wang and S. Z. Li, Efficient Gaussian graphical model determination under G-Wishart prior distributions, Electronic Journal of Statistics, vol.6, pp.168-198, 2012.

W. Wang, M. J. Wainwright, and K. Ramchandran, Information-theoretic bounds on model selection for Gaussian Markov random fields, IEEE International Symposium on Information Theory, pp.1373-1377, 2010.

Y. Wang, L. Wang, Y. Li, D. He, T. Liu et al., A theoretical analysis of normalized discounted cumulative gain (NDCG) ranking measures, Conference on Learning Theory, 2013.

D. J. Watts and S. H. Strogatz, Collective dynamics of 'small-world' networks, Nature, vol.393, issue.6684, pp.440-442, 1998.

J. Whittaker, Graphical Models in Applied Multivariate Statistics, 2009.

B. Xin, Y. Wang, W. Gao, and D. Wipf, Maximal sparsity with deep networks?, Advances in Neural Information Processing Systems, 2016.

S. Yan, X. Yang, C. Wu, Z. Zheng, and Y. Guo, Balancing the stability and predictive performance for multivariate voxel selection in fMRI study, Brain Informatics and Health, pp.90-99, 2014.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, International Conference on Learning Representations, 2016.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

W. Zaremba, M. P. Kumar, A. Gramfort, and M. B. Blaschko, Learning from M/EEG data with variable brain activation delays, Information Processing in Medical Imaging, pp.414-425, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00803981

S. D. Zhao, T. T. Cai, and H. Li, Direct estimation of differential networks, Biometrika, vol.101, issue.2, pp.253-268, 2014.

H. Zou and T. Hastie, Regularization and variable selection via the Elastic Net, Journal of the Royal Statistical Society, Series B, vol.67, pp.301-320, 2005.