A. Abraham, M. P. Milham, D. Martino, A. Craddock, R. C. Samaras et al., Deriving reproducible biomarkers from multi-site resting-state data: an Autism-based example, NeuroImage, vol.147, pp.736-745, 2017.
DOI : 10.1016/j.neuroimage.2016.10.045

URL : https://hal.archives-ouvertes.fr/hal-01398867

A. Abraham, F. Pedregosa, M. Eickenberg, P. Gervais, A. Mueller et al., Machine learning for neuroimaging with Scikit-Learn, Frontiers in Neuroinformatics, vol.8, p.14, 2014.
DOI : 10.3389/fninf.2014.00014

URL : https://hal.archives-ouvertes.fr/hal-01093971

S. M. Aji and R. J. McEliece, The generalized distributive law, IEEE Transactions on Information Theory, vol.46, issue.2, pp.325-343, 2000.
DOI : 10.1109/18.825794

URL : https://authors.library.caltech.edu/1541/1/AJIieeetit00.pdf

R. P. Alvarez, G. Jasdzewski, and R. A. Poldrack, Building memories in two languages: an fMRI study of episodic encoding in bilinguals, Society for Neuroscience Abstracts, 2002.

M. Amalric and S. Dehaene, Origins of the brain networks for advanced mathematics in expert mathematicians, Proceedings of the National Academy of Sciences, vol.113, issue.18, pp.4909-4917, 2016.

B. Amos and J. Z. Kolter, OptNet: differentiable optimization as a layer in neural networks, Proceedings of the International Conference on Machine Learning, pp.136-145, 2017.

R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeled data, Journal of Machine Learning Research, vol.6, pp.1817-1853, 2005.

A. R. Aron, T. E. Behrens, S. Smith, M. J. Frank, and R. Poldrack, Triangulating a cognitive control network using diffusion-weighted magnetic resonance imaging (MRI) and functional MRI, The Journal of Neuroscience, vol.27, pp.3743-3752, 2007.
DOI : 10.1523/jneurosci.0519-07.2007

URL : http://www.jneurosci.org/content/jneuro/27/14/3743.full.pdf

A. R. Aron, M. Gluck, and R. A. Poldrack, Long-term test-retest reliability of functional MRI in a classification learning task, NeuroImage, vol.29, pp.1000-1006, 2006.

D. Bahdanau, K. Cho, and Y. Bengio, Neural Machine Translation by jointly learning to align and translate, Proceedings of the International Conference on Learning Representations, 2015.

G. Bakır, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar et al., Predicting structured data, 2007.

C. Banderier and S. Schwer, Why Delannoy numbers?, Journal of Statistical Planning and Inference, vol.135, issue.1, pp.40-54, 2005.
DOI : 10.1016/j.jspi.2005.02.004

URL : https://hal.archives-ouvertes.fr/hal-00085552

D. M. Barch, G. C. Burgess, M. P. Harms, S. E. Petersen, B. L. Schlaggar et al., Function in the human connectome: task-fMRI and individual differences in behavior, NeuroImage, vol.80, pp.169-189, 2013.

L. F. Barrett, The future of psychology: connecting mind to brain, Perspectives on Psychological Science, vol.4, issue.4, pp.326-339, 2009.
DOI : 10.1111/j.1745-6924.2009.01134.x

URL : http://europepmc.org/articles/pmc2763392?pdf=render

L. E. Baum and T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, The Annals of Mathematical Statistics, vol.37, issue.6, pp.1554-1563, 1966.
DOI : 10.1214/aoms/1177699147

URL : http://doi.org/10.1214/aoms/1177699147

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf

A. Beck and M. Teboulle, Smoothing and first order methods: a unified framework, SIAM Journal on Optimization, vol.22, issue.2, pp.557-580, 2012.
DOI : 10.1137/100818327

URL : http://ie.technion.ac.il/Home/Users/becka/smoothing.pdf

A. Beck and L. Tetruashvili, On the convergence of block coordinate descent type methods, SIAM Journal on Optimization, vol.23, issue.4, pp.2037-2060, 2013.

S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn et al., Cython: the best of both worlds, Computing in Science & Engineering, vol.13, issue.2, pp.31-39, 2011.
DOI : 10.1109/mcse.2010.118

A. J. Bell and T. J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation, vol.7, issue.6, pp.1129-1159, 1995.
DOI : 10.1162/neco.1995.7.6.1129

URL : http://bme.iust.ac.ir/courses/adsp/adsp_ref16.pdf

R. M. Bell and Y. Koren, Lessons from the Netflix prize challenge, ACM SIGKDD Explorations Newsletter, vol.9, issue.2, pp.75-79, 2007.

R. Bellman, On the theory of dynamic programming, Proceedings of the National Academy of Sciences, vol.38, pp.716-719, 1952.

D. P. Bertsekas, Control of uncertain systems with a set-membership description of the uncertainty (Doctoral dissertation), 1971.

E. Bingham and H. Mannila, Random projection in dimensionality reduction: applications to image and text data, Proceedings of the SIGKDD Conference, pp.245-250, 2001.

C. M. Bishop, B. Biswal, F. Zerrin-yetkin, V. M. Haughton, and J. S. Hyde, Functional connectivity in the motor cortex of resting human brain using echo-planar MRI, Magnetic Resonance in Medicine, vol.34, issue.4, pp.537-541, 1995.

M. Blondel, V. Seguy, and A. Rolet, Smooth and sparse optimal transport, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018.

J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization: theory and examples, 2010.

L. Bottou, On-line learning and stochastic approximations, pp.9-42, 1999.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT, pp.177-186, 2010.

L. Bottou, Y. Bengio, and Y. LeCun, Global training of document processing systems using graph transformer networks, Proceedings of the Conference on Computer Vision and Pattern Recognition, pp.489-494, 1997.

L. Bottou, F. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

L. Breiman, Bagging predictors, Machine Learning, vol.24, pp.123-140, 1996.

S. Burer and R. D. Monteiro, Local minima and convergence in low-rank semidefinite programming, Mathematical Programming, vol.103, issue.3, pp.427-444, 2004.

K. S. Button, J. P. Ioannidis, C. Mokrysz, B. A. Nosek, J. Flint et al., Power failure: why small sample size undermines the reliability of neuroscience, Nature Reviews Neuroscience, vol.14, issue.5, pp.365-376, 2013.

V. Calhoun, T. Adali, G. Pearlson, and J. Pekar, A method for making group inferences from functional MRI data using independent component analysis, Human Brain Mapping, 2001.

E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, vol.9, issue.6, pp.717-772, 2009.

E. J. Candès and T. Tao, Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Transactions on Information Theory, vol.52, issue.12, pp.5406-5425, 2006.

O. Cappé and E. Moulines, Online EM algorithm for latent data models, Journal of the Royal Statistical Society: Series B, vol.71, issue.3, pp.593-613, 2009.

L. A. Cauchy, Méthode générale pour la résolution des systèmes d'équations simultanées, Compte Rendu à l'Académie des Sciences de, 1847.

E. Cauvet, Traitement des structures syntaxiques dans le langage et dans la musique (Doctoral dissertation), vol.6, 2012.

Y. Chen, N. M. Nasrabadi, and T. D. Tran, Hyperspectral image classification using dictionary-based sparse representation, IEEE Transactions on Geoscience and Remote Sensing, vol.49, issue.10, pp.3973-3985, 2011.

J. R. Cohen, The development and generality of self-control (Doctoral dissertation), 2009.

A. K. Collier, D. H. Wolf, J. N. Valdez, B. I. Turetsky, M. A. Elliott et al., Comparison of auditory and visual oddball fMRI in schizophrenia, Schizophrenia Research, vol.158, pp.183-188, 2014.

M. Collins, Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms, Proceedings of ACL, pp.1-8, 2002.

R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, Proceedings of the International Conference on Machine Learning, pp.160-167, 2008.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, The Journal of Machine Learning Research, vol.12, pp.2493-2537, 2011.

D. D. Cox and R. L. Savoy, Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex, NeuroImage, vol.19, issue.2, pp.261-270, 2003.

M. Cuturi and M. Blondel, Soft-DTW: a differentiable loss function for time-series, Proceedings of the International Conference on Machine Learning, pp.894-903, 2017.

J. M. Danskin, The theory of max-min, with applications, SIAM Journal on Applied Mathematics, vol.14, issue.4, pp.641-664, 1966.

R. S. Desikan, F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson et al., An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest, NeuroImage, vol.31, issue.3, pp.968-980, 2006.

A. D. Devauchelle, C. Oppenheim, L. Rizzi, S. Dehaene, and C. Pallier, Sentence syntax and content in the human temporal lobe: an fMRI adaptation study in auditory and visual modalities, Journal of Cognitive Neuroscience, vol.21, issue.5, pp.1000-1012, 2009.

J. T. Devlin and R. A. Poldrack, In praise of tedious anatomy, NeuroImage, vol.37, pp.1050-1058, 2007.

J. Djolonga and A. Krause, Differentiable learning of submodular functions, Advances in Neural Information Processing Systems, pp.1014-1024, 2017.

E. Dohmatob, A. Mensch, G. Varoquaux, and B. Thirion, Learning brain regions via large-scale online structured sparse dictionary learning, Advances in Neural Information Processing Systems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01369134

J. L. Doob, Stochastic processes, 1990.

Z. Duan and B. Pardo, Soundprism: an online system for score-informed source separation of music audio, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.6, pp.1205-1215, 2011.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, Efficient projections onto the ℓ1-ball for learning in high dimensions, Proceedings of the International Conference on Machine Learning, pp.272-279, 2008.

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, vol.10, pp.2899-2934, 2009.

K. Duncan, C. Pattamadilok, I. Knierim, and J. Devlin, Consistency and variability in functional localisers, NeuroImage, vol.46, pp.1018-1026, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01728487

C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, vol.1, issue.3, pp.211-218, 1936.

J. Eisner, Inside-outside and forward-backward algorithms are just backprop (tutorial paper), Proceedings of the Workshop on Structured Prediction for NLP, pp.1-17, 2016.

J. A. Etzel, V. Gazzola, and C. Keysers, An introduction to anatomical ROI-based fMRI classification analysis, Brain Research, vol.1282, pp.114-125, 2009.

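A. Evans, D. Collins, S. Mills, E. Brown, R. Kelly et al., 3D statistical neuroanatomical models from 305 MRI volumes, Proceedings of the IEEE Nuclear Science Symposium and Medical Imaging Conference, pp.1813-1817, 1993.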

M. Fazel, H. Hindi, and S. Boyd, A rank minimization heuristic with application to minimum order system approximation, Proceedings of the American Control Conference, vol.6, pp.4734-4739, 2001.

Z. Yu-feng, H. Yong, Z. Chao-zhe, C. Qing-jiu, S. Man-qiu et al., Altered baseline brain activity in children with ADHD revealed by resting-state functional MRI, Brain and Development, vol.29, issue.2, pp.83-91, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00123309

R. A. Fisher, The statistical utilization of multiple measurements, Annals of Human Genetics, vol.8, issue.4, pp.376-386, 1938.

K. Foerde, B. Knowlton, and R. A. Poldrack, Modulation of competing memory systems by distraction, Proceedings of the National Academy of Sciences, vol.103, pp.11778-11783, 2006.

M. D. Fox and M. E. Raichle, Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging, Nature Reviews Neuroscience, vol.8, issue.9, p.700, 2007.

J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, Pathwise coordinate optimization, The Annals of Applied Statistics, vol.1, issue.2, pp.302-332, 2007.

K. J. Friston, A. P. Holmes, K. J. Worsley, J. Poline, C. D. Frith et al., Statistical parametric maps in functional imaging: a general linear approach, Human Brain Mapping, vol.2, issue.4, pp.189-210, 1994.

D. Garreau, R. Lajugie, S. Arlot, and F. Bach, Metric learning for temporal sequence alignment, Advances in Neural Information Processing Systems, pp.1817-1825, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01062130

B. Gauthier, E. Eger, G. Hesselmann, A. Giraud, and A. Kleinschmidt, Temporal tuning properties along the human ventral visual stream, The Journal of Neuroscience, vol.32, pp.14433-14441, 2012.

K. J. Gorgolewski, A. Storkey, M. E. Bastin, I. R. Whittle, J. M. Wardlaw et al., A test-retest fMRI dataset for motor, language and spatial attention functions, GigaScience, vol.2, p.6, 2013.

K. Goyal, G. Neubig, C. Dyer, and T. Berg-Kirkpatrick, A continuous relaxation of beam search for end-to-end training of neural sequence models, 2017.

M. D. Greicius, Resting-state functional connectivity in neuropsychiatric disorders, Current Opinion in Neurology, vol.21, issue.4, pp.424-430, 2008.

M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, Functional connectivity in the resting brain: a network analysis of the default mode hypothesis, Proceedings of the National Academy of Sciences, vol.100, issue.1, pp.253-258, 2003.

L. Grosenick, B. Klingenberg, K. Katovich, B. Knutson, and J. E. Taylor, Interpretable whole-brain prediction analysis with GraphNet, NeuroImage, vol.72, pp.304-321, 2013.

E. Gselmann, Entropy functions and functional equations, Mathematical Communications, issue.16, pp.347-357, 2011.

N. Halko, P. Martinsson, and J. A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, vol.53, issue.2, pp.217-288, 2011.

N. Hara, E. Cauvet, A. D. Devauchelle, S. Dehaene, and C. Pallier, Neural correlates of constituent structure in language and music, NeuroImage, vol.47, p.143, 2009.

A. Kelly, L. Q. Uddin, B. B. Biswal, F. Castellanos, and M. Milham, Competition between functional brain networks mediates behavioral variability, NeuroImage, p.527, 2008.
DOI : 10.1016/j.neuroimage.2007.08.008

H. Kim and H. Park, Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis, Bioinformatics, vol.23, issue.12, pp.1495-1502, 2007.
DOI : 10.1093/bioinformatics/btm134

URL : https://academic.oup.com/bioinformatics/article-pdf/23/12/1495/605053/btm134.pdf

Y. Kim, C. Denton, L. Hoang, and A. M. Rush, Structured attention networks, Proceedings of the International Conference on Learning Representations, 2017.

D. P. Kingma and J. Ba, Adam: a method for stochastic optimization, Proceedings of the International Conference on Learning Representations, 2015.

D. P. Kingma, T. Salimans, and M. Welling, Variational Dropout and the local reparameterization trick, Advances in Neural Information Processing Systems, pp.2575-2583, 2015.

A. Knops, B. Thirion, E. M. Hubbard, V. Michel, and S. Dehaene, Recruitment of an area involved in eye movements during mental arithmetic, Science, vol.324, pp.1583-1585, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00504101

O. Koyejo and R. A. Poldrack, Decoding cognitive processes from functional MRI, NIPS Workshop on Machine Learning for Interpretable Neuroimaging, pp.5-10, 2013.

N. Kriegeskorte, R. Goebel, and P. Bandettini, Information-based functional brain mapping, Proceedings of the National Academy of Sciences, vol.103, issue.10, pp.3863-3868, 2006.
DOI : 10.1073/pnas.0600244103

URL : http://www.pnas.org/content/103/10/3863.full.pdf

H. W. Kuhn, Variants of the Hungarian method for assignment problems, Naval Research Logistics, vol.3, issue.4, pp.253-258, 1956.

J. Lafferty, A. McCallum, and F. C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, Proceedings of the International Conference on Machine Learning, pp.282-289, 2001.

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, Neural architectures for named entity recognition, Proceedings of NAACL, pp.260-270, 2016.
DOI : 10.18653/v1/n16-1030

URL : https://doi.org/10.18653/v1/n16-1030

P. Lauterbur, Image formation by induced local interactions: examples employing nuclear magnetic resonance, Nature, p.190, 1973.

R. Leblond, F. Pedregosa, and S. Lacoste-Julien, ASAGA: asynchronous parallel SAGA, Proceedings of the International Conference on Artificial Intelligence and Statistics, pp.46-54, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01407833

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization, Advances in Neural Information Processing Systems, pp.2177-2185, 2014.

M. A. Lindquist, J. M. Loh, L. Y. Atlas, and T. D. Wager, Modeling the hemodynamic response function in fMRI: efficiency, bias and mis-modeling, NeuroImage, vol.45, issue.1, pp.187-198, 2009.
DOI : 10.1016/j.neuroimage.2008.10.065

URL : http://europepmc.org/articles/pmc3318970?pdf=render

S. Linnainmaa, The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors (Doctoral dissertation), 1970.

J. Loula, G. Varoquaux, and B. Thirion, Decoding fMRI activity in the time domain improves classification performance, NeuroImage, 2017.
DOI : 10.1016/j.neuroimage.2017.08.018

URL : https://hal.archives-ouvertes.fr/hal-01576641

Y. Lu, P. Dhillon, D. P. Foster, and L. Ungar, Faster ridge regression via the subsampled randomized Hadamard transform, Advances in Neural Information Processing Systems, pp.369-377, 2013.

T. Luong, H. Pham, and C. D. Manning, Effective approaches to attention-based neural machine translation, Proceedings of EMNLP, pp.1412-1421, 2015.
DOI : 10.18653/v1/d15-1166

URL : https://doi.org/10.18653/v1/d15-1166

X. Ma and E. Hovy, End-to-end sequence labeling via bidirectional LSTM-CNNs-CRF, Proceedings of ACL, pp.1064-1074, 2016.
DOI : 10.18653/v1/p16-1101

URL : https://doi.org/10.18653/v1/p16-1101

M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi, Nonlocal transform-domain filter for volumetric data denoising and reconstruction, IEEE Transactions on Image Processing, vol.22, issue.1, pp.119-133, 2013.
DOI : 10.1109/tip.2012.2210725

J. Mairal, Optimization with first-order surrogate functions, Proceedings of the International Conference on Machine Learning, pp.783-791, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00822229

J. Mairal, Stochastic majorization-minimization algorithms for large-scale optimization, Advances in Neural Information Processing Systems, pp.2283-2291, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00835840

J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing, Foundations and Trends in Computer Graphics and Vision, vol.8, issue.2-3, pp.85-283, 2014.
DOI : 10.1561/0600000058

URL : https://hal.archives-ouvertes.fr/hal-01081139

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research, vol.11, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, and F. R. Bach, Supervised Dictionary Learning, Advances in Neural Information Processing Systems, pp.1033-1040, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00322431

S. Makni, J. Idier, T. Vincent, B. Thirion, G. Dehaene-Lambertz et al., A fully Bayesian approach to the parcel-based detection-estimation of brain activity in fMRI, NeuroImage, vol.41, issue.3, pp.941-969, 2008.
URL : https://hal.archives-ouvertes.fr/cea-00333624

M. Mardani, G. Mateos, and G. B. Giannakis, Subspace Learning and Imputation for Streaming Big Data Matrices and Tensors, IEEE Transactions on Signal Processing, vol.63, issue.10, pp.2663-2677, 2015.
DOI : 10.1109/tsp.2015.2417491

URL : http://arxiv.org/pdf/1404.4667

A. F. Martins and R. F. Astudillo, From softmax to sparsemax: a sparse model of attention and multi-label classification, Proceedings of the International Conference on Machine Learning, pp.1614-1623, 2016.

M. J. McKeown, S. Makeig, G. G. Brown, T. P. Jung, S. S. Kindermann et al., Analysis of fMRI data by blind separation into independent spatial components, Human Brain Mapping, vol.6, issue.3, pp.160-188, 1998.
DOI : 10.1002/(sici)1097-0193(1998)6:3<160::aid-hbm5>3.3.co;2-r

URL : http://papers.cnl.salk.edu/PDFs/Analysis%20of%20FMRI%20Data%20by%20Blind%20Separation%20Into%20Independent%20Spatial%20Components%201998-3633.pdf

A. Mensch and M. Blondel, Differentiable dynamic programming for structured prediction and attention, Proceedings of the International Conference on Machine Learning (ICML), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01809550

A. Mensch, J. Mairal, D. Bzdok, B. Thirion, and G. Varoquaux, Learning neural representations of human cognition across many fMRI studies, Advances in Neural Information Processing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01626823

A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux, Dictionary learning for massive matrix factorization, Proceedings of the International Conference on Machine Learning (ICML), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01308934

A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux, Extracting universal representations of cognition across brain-imaging studies, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01874713

A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux, Stochastic Subsampling for factorizing huge matrices, IEEE Transactions on Signal Processing, vol.66, issue.1, pp.113-128, 2018.
DOI : 10.1109/tsp.2017.2752697

URL : https://hal.archives-ouvertes.fr/hal-01431618

A. Mensch, G. Varoquaux, and B. Thirion, Compressed online dictionary learning for fast fMRI decomposition, Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), 2016.
DOI : 10.1109/isbi.2016.7493501

URL : https://hal.archives-ouvertes.fr/hal-01271033

O. Meshi, M. Mahdavi, and A. G. Schwing, Smooth and strong: MAP inference with linear convergence, Advances in Neural Information Processing Systems, 2015.

M. Métivier, Semimartingales: a course on stochastic processes, 1982.

V. Michel, A. Gramfort, G. Varoquaux, E. Eger, and B. Thirion, Total Variation regularization for fMRI-based prediction of behavior, IEEE Transactions on Medical Imaging, vol.30, issue.7, pp.1328-1340, 2011.

C. Michelot, A finite algorithm for finding the projection of a point onto the canonical simplex of ℝⁿ, Journal of Optimization Theory and Applications, vol.50, issue.1, pp.195-200, 1986.

M. P. Milham, D. P. Fair, M. P. Mennes, and S. H. Mostofsky, The ADHD-200 consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience, Frontiers in Systems Neuroscience, p.6, 2012.

D. Molchanov, A. Ashukha, and D. Vetrov, Variational Dropout Sparsifies Deep Neural Networks, Proceedings of the International Conference on Machine Learning, pp.2498-2507, 2017.

J. M. Moran, E. Jolly, and J. P. Mitchell, Social-cognitive deficits in normal aging, The Journal of Neuroscience, vol.32, pp.5553-5561, 2012.

J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France, vol.93, pp.273-299, 1965.

J. Mourão-Miranda, A. L. Bokde, C. Born, H. Hampel, and M. Stetter, Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data, NeuroImage, vol.28, issue.4, pp.980-995, 2005.

J. A. Mumford, B. O. Turner, F. G. Ashby, and R. A. Poldrack, Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses, NeuroImage, vol.59, issue.3, pp.2636-2643, 2012.

Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, vol.103, issue.1, pp.127-152, 2005.

A. Newell, You can't play 20 questions with nature and win: Projective comments on the papers of this symposium, Visual Information Processing, pp.1-26, 1973.

B. Neyshabur, Implicit Regularization in Deep Learning (Doctoral dissertation), 2017.

B. Ng and R. Abugharbieh, Generalized sparse regularization with application to fMRI brain decoding, Proceedings of the International Conference on Information Processing in Medical Imaging, pp.612-623, 2011.

V. Niculae and M. Blondel, A regularized framework for sparse and structured neural attention, Advances in Neural Information Processing Systems, pp.3340-3350, 2017.

V. Niculae, A. F. Martins, M. Blondel, and C. Cardie, SparseMAP: differentiable sparse structured inference, Proceedings of the International Conference on Machine Learning, 2018.

K. B. Nooner, S. J. Colcombe, R. H. Tobe, M. Mennes, M. M. Benedict et al., The NKI Rockland Sample: a Model for Accelerating the Pace of Discovery Science in Psychiatry, Frontiers in Neuroscience, vol.6, p.152, 2012.

A. Nowak, D. Folqué, and J. Bruna, Divide and conquer networks, Proceedings of the International Conference on Learning Representations, 2018.

S. Ogawa, T. Lee, A. R. Kay, and D. W. Tank, Brain magnetic resonance imaging with contrast dependent on blood oxygenation, Proceedings of the National Academy of Sciences, vol.87, issue.24, pp.9868-9872, 1990.

B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, pp.3311-3325, 1997.

J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, 1970.

S. J. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, pp.1345-1359, 2010.

P. Orfanos, D. Michel, V. Schwartz, Y. Pinel, P. Moreno et al., The Brainomics/Localizer database, NeuroImage, vol.144, pp.309-314, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01213448

A. Paszke, S. Gross, S. Chintala, and G. Chanan, PyTorch: tensors and dynamic neural networks in Python with strong GPU acceleration, 2017.

J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, 1988.

B. A. Pearlmutter, Fast exact multiplication by the Hessian, Neural Computation, vol.6, issue.1, pp.147-160, 1994.

F. Pedregosa, Feature extraction and supervised learning on fMRI: from practice to theory (Doctoral dissertation), 2015.

F. Pedregosa, M. Eickenberg, P. Ciuciu, B. Thirion, and A. Gramfort, Data-driven HRF estimation for encoding and decoding models, NeuroImage, vol.104, pp.209-220, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00952554

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang et al., Decomposable nonlocal tensor dictionary learning for multispectral image denoising, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2949-2956, 2014.

J. Pennington, R. Socher, and C. D. Manning, GloVe: global vectors for word representation, Proceedings of EMNLP, pp.1532-1575, 2014.

F. Pereira, T. Mitchell, and M. Botvinick, Machine learning classifiers and fMRI: a tutorial overview, NeuroImage, vol.45, issue.1, pp.199-209, 2009.

S. E. Petersen, P. T. Fox, M. I. Posner, M. Mintun, and M. E. Raichle, Positron emission tomographic studies of the processing of single words, Journal of Cognitive Neuroscience, vol.1, issue.2, pp.153-170, 1989.

M. Pilanci and M. Wainwright, Iterative Hessian sketch: fast and accurate solution approximation for constrained least squares, Journal of Machine Learning Research, vol.17, pp.1-33, 2015.

P. Pinel and S. Dehaene, Genetic and environmental contributions to brain activation during calculation, NeuroImage, vol.81, pp.306-316, 2013.
URL : https://hal.archives-ouvertes.fr/inserm-00832572

P. Pinel, B. Thirion, S. Meriaux, A. Jobert, J. Serres et al., Fast reproducible identification and large-scale databasing of individual functional cognitive networks, BMC Neuroscience, vol.8, p.91, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00784462

A. L. Pinho, A. Amadon, T. Ruest, M. Fabre, E. Dohmatob et al., Individual Brain Charting, a high resolution fMRI dataset for cognitive mapping, Scientific Data, vol.5, p.180105, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01817528

R. A. Poldrack, C. I. Baker, J. Durnez, K. J. Gorgolewski, P. M. Matthews et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews Neuroscience, vol.18, issue.2, pp.115-126, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01896468

R. A. Poldrack, D. M. Barch, J. Mitchell, T. D. Wager, A. D. Wagner et al., Toward open sharing of task-based fMRI data: the OpenfMRI project, Frontiers in Neuroinformatics, vol.7, p.12, 2013.

R. A. Poldrack, J. Clark, E. Pare-blagoev, D. Shohamy, J. Creso-moyano et al., Interactive memory systems in the human brain, Nature, vol.414, issue.6863, pp.546-550, 2001.

R. A. Poldrack, E. Congdon, W. Triplett, K. J. Gorgolewski, K. Karlsgodt et al., A phenome-wide examination of neural and cognitive function, Scientific Data, vol.3, p.160110, 2016.

R. A. Poldrack, Y. O. Halchenko, and S. J. Hanson, Decoding the large-scale structure of brain function by classifying mental states across individuals, Psychological Science, vol.20, issue.11, pp.1364-1372, 2009.

R. A. Poldrack, T. Nichols, and J. A. Mumford, Handbook of functional MRI data analysis, 2011.

R. A. Poldrack and T. Yarkoni, From brain maps to cognitive ontologies: informatics and the search for mental structure, Annual Review of Psychology, vol.67, issue.1, pp.587-612, 2016.

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, pp.257-286, 1989.

M. Rahim, B. Thirion, and G. Varoquaux, Population-shrinkage of covariance to estimate better brain functional connectivity, International Conference on Medical Image Computing and Computer-Assisted Intervention, pp.460-468, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01547612

M. E. Raichle and M. A. Mintun, Brain work and brain imaging, Annual Review of Neuroscience, vol.29, pp.449-476, 2006.

G. Raskutti and M. Mahoney, Statistical and algorithmic perspectives on randomized sketching for ordinary least-squares, Proceedings of the International Conference on Machine Learning, pp.617-625, 2015.

M. Razaviyayn, M. Hong, and Z. Luo, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM Journal on Optimization, vol.23, issue.2, pp.1126-1153, 2013.

B. Recht and C. Ré, Parallel stochastic gradient algorithms for large-scale matrix completion, Mathematical Programming Computation, vol.5, issue.2, pp.201-226, 2013.

J. D. Rennie and N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, Proceedings of the International Conference on Machine Learning, pp.713-719, 2005.

A. Rizk-jackson, A. R. Aron, and R. Poldrack, Classification learning and stop-signal (one year test-retest

V. Rokhlin, A. Szlam, and M. Tygert, A randomized algorithm for principal component analysis, SIAM Journal on Matrix Analysis and Applications, vol.31, issue.3, pp.1100-1124, 2009.

A. K. Roy, Z. Shehzad, D. S. Margulies, A. C. Kelly, L. Q. Uddin et al., Functional connectivity of the human amygdala using resting state fMRI, NeuroImage, vol.45, issue.2, pp.614-626, 2009.

M. R. Sabuncu, B. D. Singer, B. Conroy, R. E. Bryan, P. J. Ramadge et al., Function-based intersubject alignment of human cortical anatomy, Cerebral Cortex, vol.20, issue.1, pp.130-140, 2009.

H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.26, pp.43-49, 1978.

G. Salimi-Khorshidi, S. M. Smith, J. R. Keltner, T. D. Wager, and T. E. Nichols, Meta-analysis of neuroimaging data: a comparison of image-based and coordinate-based pooling of studies, NeuroImage, vol.45, issue.3, pp.810-823, 2009.

T. Sarlos, Improved approximation algorithms for large matrices via random projections, Proceedings of the IEEE Symposium on Foundations of Computer Science, pp.143-152, 2006.

R. Saxe, M. Brett, and N. Kanwisher, Divide and conquer: a defense of functional localizers, NeuroImage, vol.30, issue.4, pp.1097-1099, 2006.

T. Schonberg, C. Fox, J. A. Mumford, C. Congdon, C. Trepel et al., Decreasing ventromedial prefrontal cortex activity during sequential risk-taking: an fMRI investigation of the balloon analog risk task, Frontiers in Neuroscience, vol.6, p.80, 2012.

Y. Schwartz, B. Thirion, and G. Varoquaux, Mapping paradigm ontologies to and from the brain, Advances in Neural Information Processing Systems, pp.1673-1681, 2013.

M. A. Shafto, L. K. Tyler, M. Dixon, J. R. Taylor, J. B. Rowe et al., The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing, BMC Neurology, vol.14, p.204, 2014.

D. A. Smith and J. Eisner, Minimum risk annealing for training log-linear models, Proceedings of COLING/ACL, pp.787-794, 2006.

S. M. Smith, P. T. Fox, K. L. Miller, D. C. Glahn, P. M. Fox et al., Correspondence of the brain's functional architecture during activation and rest, Proceedings of the National Academy of Sciences, vol.106, issue.31, pp.13040-13045, 2009.

S. M. Smith, A. Hyvärinen, G. Varoquaux, K. L. Miller, and C. F. Beckmann, Group-PCA for very large fMRI datasets, NeuroImage, vol.101, p.738, 2014.

S. M. Smith, T. E. Nichols, D. Vidaurre, A. M. Winkler, T. E. Behrens et al., A positive-negative mode of population covariation links brain connectivity, demographics and behavior, Nature Neuroscience, vol.18, issue.11, p.1565, 2015.

A. Soltani-Farani, H. R. Rabiee, and S. A. Hosseini, Spatial-aware dictionary learning for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, vol.53, issue.1, pp.527-541, 2015.

N. Srebro, J. Rennie, and T. S. Jaakkola, Maximum-margin matrix factorization, Advances in Neural Information Processing Systems, pp.1329-1336, 2004.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

V. Stoyanov and J. Eisner, Minimum-risk training of approximate CRF-based NLP systems, Proceedings of NAACL, pp.120-130, 2012.

C. Sudlow, J. Gallacher, N. Allen, V. Beral, P. Burton, M. Landray et al., UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age, PLoS Medicine, vol.12, issue.3, pp.1-10, 2015.

R. A. Sulanke, Objects counted by the central Delannoy numbers, Journal of Integer Sequences, vol.6, issue.1, p.3, 2003.

C. Sutton and A. McCallum, An introduction to conditional random fields, Foundations and Trends in Machine Learning, vol.4, pp.267-373, 2012.

Z. Szabó, B. Póczos, and A. Lorincz, Online group-structured dictionary learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2865-2872, 2011.

G. Takács, I. Pilászy, B. Németh, and D. Tikk, Scalable collaborative filtering approaches for large recommender systems, Journal of Machine Learning Research, vol.10, pp.623-656, 2009.

I. Tavor, O. P. Jones, R. B. Mars, S. M. Smith, T. E. Behrens et al., Task-free MRI predicts individual differences in brain activity during task performance, Science, vol.352, issue.6282, pp.216-220, 2016.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), vol.58, issue.1, pp.267-288, 1996.

S. M. Tom, C. R. Fox, C. Trepel, and R. A. Poldrack, The neural basis of loss aversion in decision-making under risk, Science, issue.5811, pp.515-518, 2007.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.

J. A. Turner and A. R. Laird, The cognitive paradigm ontology: design and application, Neuroinformatics, vol.10, issue.1, pp.57-66, 2012.

M. R. Uncapher, J. B. Hutchinson, and A. D. Wagner, Dissociable effects of top-down and bottom-up attention during episodic encoding, The Journal of Neuroscience, issue.35, pp.12613-12628, 2011.

W. R. Uttal, The new phrenology: the limits of localizing cognitive processes in the brain, 2001.

L. Vagharchakian, G. Dehaene-Lambertz, C. Pallier, and S. Dehaene, A temporal bottleneck in the language comprehension network, The Journal of Neuroscience, vol.32, pp.9089-9102, 2012.

A. W. van der Vaart, Asymptotic statistics, 2000.

D. C. Van-essen, K. Ugurbil, E. Auerbach, D. Barch, T. E. Behrens et al., The Human Connectome Project: a data acquisition perspective, NeuroImage, vol.62, issue.4, pp.2222-2231, 2012.

G. Vane, First results from the airborne visible/infrared imaging spectrometer (AVIRIS), Annual Technical Symposium of the International Society for Optics and Photonics, pp.166-175, 1987.

G. Varoquaux, A. Gramfort, F. Pedregosa, V. Michel, and B. Thirion, Multi-subject dictionary learning to segment an atlas of brain spontaneous activity, Proceedings of the International Conference on Information Processing in Medical Imaging, vol.22, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00588898

S. Verdu and H. V. Poor, Abstract dynamic programming models under commutativity conditions, SIAM Journal on Control and Optimization, vol.25, issue.4, pp.990-1006, 1987.

A. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, issue.2, pp.260-269, 1967.

S. Wager, S. Wang, and P. S. Liang, Dropout training as adaptive regularization, Advances in Neural Information Processing Systems, pp.351-359, 2013.

T. D. Wager, L. Y. Atlas, M. A. Lindquist, M. Roy, C. Woo et al., An fMRI-based neurologic signature of physical pain, New England Journal of Medicine, vol.368, issue.15, pp.1388-1397, 2013.

T. D. Wager, M. L. Davidson, B. L. Hughes, M. A. Lindquist, and K. N. Ochsner, Prefrontal-subcortical pathways mediating successful emotion regulation, Neuron, vol.59, pp.1037-1050, 2008.

M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, vol.1, issue.1-2, pp.1-305, 2008.

S. Weng, J. L. Wiggins, S. J. Peltier, M. Carrasco, S. Risi et al., Alterations of resting state functional connectivity in the default network in adolescents with autism spectrum disorders, Brain Research, vol.1313, pp.202-214, 2010.

S. J. Wright, Coordinate descent algorithms, Mathematical Programming, vol.151, pp.3-34, 2015.

T. Wu, L. Wang, Y. Chen, C. Zhao, K. Li et al., Changes of functional connectivity of the motor network in the resting state in Parkinson's disease, Neuroscience Letters, vol.460, issue.1, pp.6-10, 2009.

G. Xue, A. R. Aron, and R. A. Poldrack, Common neural substrates for inhibition of spoken and manual responses, Cerebral Cortex, vol.18, pp.1923-1932, 2008.

G. Xue and R. A. Poldrack, The neural substrates of visual perceptual learning of words: implications for the visual word form area hypothesis, Journal of Cognitive Neuroscience, vol.19, pp.1643-1655, 2007.

Y. Xue, X. Liao, L. Carin, and B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, Journal of Machine Learning Research, vol.8, pp.35-63, 2007.

O. Yamashita, M. Sato, T. Yoshioka, F. Tong, and Y. Kamitani, Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns, NeuroImage, vol.42, issue.4, pp.1414-1429, 2008.

T. Yarkoni, R. A. Poldrack, T. E. Nichols, D. C. Van-essen, and T. D. Wager, Large-scale automated synthesis of human functional neuroimaging data, Nature Methods, vol.8, issue.8, pp.665-670, 2011.
DOI : 10.1038/nmeth.1635

URL : http://europepmc.org/articles/pmc3146590?pdf=render

T. B. Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari et al., The organization of the human cerebral cortex estimated by intrinsic functional connectivity, Journal of Neurophysiology, vol.106, issue.3, pp.1125-1165, 2011.

H. Yu, C. Hsieh, and I. Dhillon, Scalable coordinate descent approaches to parallel matrix factorization for recommender

The value $\mathrm{DTW}_\Omega(\theta) \triangleq v_{N_A,N_B}(\theta)$ can be computed in $O(N_A N_B)$ time. Applying the derivations of Section 8.3.3 and Section 8.3.4 to this specific DAG, we can compute $\nabla \mathrm{DTW}_\Omega(\theta)$, $\langle \nabla \mathrm{DTW}_\Omega(\theta), Z \rangle$ and $\nabla^2 \mathrm{DTW}_\Omega(\theta) Z$ with the same complexity. The procedures, with appropriate handling of the edge cases, are summarized in Algorithms 9 and 10, respectively. Note that when $\Omega$ is the negative entropy, $\mathrm{DTW}_\Omega(\theta)$ is known as soft-DTW (Cuturi and Blondel, 2017).
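As an illustration of the quadratic-time value computation in the negative-entropy (soft-DTW) case, the following is a minimal sketch rather than the procedure of Algorithms 9 and 10. It assumes that the smoothed operator is the entropy-regularized minimum with temperature `gamma`, and that `theta` is an $N_A \times N_B$ matrix of pairwise alignment costs. The function names `soft_min` and `soft_dtw_value`, and the squared-difference cost used in the usage lines, are hypothetical choices made for this example only.

```python
import numpy as np


def soft_min(values, gamma):
    """Entropy-smoothed minimum: -gamma * log(sum(exp(-v / gamma))).

    Computed with the usual max-shift so that the exponentials stay bounded.
    """
    z = -np.asarray(values, dtype=float) / gamma
    z_max = z.max()
    return -gamma * (z_max + np.log(np.exp(z - z_max).sum()))


def soft_dtw_value(theta, gamma=1.0):
    """Value of the smoothed DTW recursion for a cost matrix theta.

    Assumption: v[i, j] = theta[i, j] + soft_min of the three predecessor
    cells (left, diagonal, up). One pass over the table, so the cost is
    O(N_A * N_B) time and memory.
    """
    n_a, n_b = theta.shape
    # Padded table: the border is +inf except the origin, which handles
    # the edge cases of the first row and first column.
    v = np.full((n_a + 1, n_b + 1), np.inf)
    v[0, 0] = 0.0
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            prev = [v[i, j - 1], v[i - 1, j - 1], v[i - 1, j]]
            v[i, j] = theta[i - 1, j - 1] + soft_min(prev, gamma)
    return v[n_a, n_b]


# Usage: squared-difference costs between two short series.
a = np.array([0.0, 1.0, 2.0])
b = np.array([0.0, 2.0])
theta = (a[:, None] - b[None, :]) ** 2
value = soft_dtw_value(theta, gamma=1e-3)
```

With a small `gamma` the value approaches the classical (hard) DTW cost from below, since the smoothed minimum never exceeds the true minimum. Gradients and Hessian-vector products would require the additional backward passes described in the text and are not covered by this sketch.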