A. Agarwal and L. Bottou, A lower bound for the optimization of finite sums, Proceedings of the International Conference on Machine Learning (ICML), 2015.

M. Aharon, M. Elad, and A. Bruckstein, K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Transactions on Signal Processing, vol.54, issue.11, pp.4311-4322, 2006.
DOI : 10.1109/TSP.2006.881199

S. Ahn, J. A. Fessler, D. Blatt, and A. O. Hero, Convergent incremental optimization transfer algorithms: Application to tomography, IEEE Transactions on Medical Imaging, vol.25, issue.3, pp.283-296, 2006.

R. K. Ahuja, T. L. Magnanti, and J. Orlin, Network Flows, 1993.
DOI : 10.21236/ADA594171

H. Akaike, Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, pp.267-281, 1973.

F. Anselmi, L. Rosasco, C. Tan, and T. Poggio, Deep convolutional networks are hierarchical kernel machines, 2015.

F. Bach, Bolasso, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390161
URL : https://hal.archives-ouvertes.fr/hal-00271289

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, Machine Learning, pp.1-106, 2012.
DOI : 10.1561/2200000015
URL : https://hal.archives-ouvertes.fr/hal-00613125

F. Bach, Exploring large feature spaces with hierarchical multiple kernel learning, Advances in Neural Information Processing Systems (NIPS), 2008.
URL : https://hal.archives-ouvertes.fr/hal-00319660

F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research (JMLR), vol.18, issue.19, pp.1-53, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01098505

F. Bach and M. I. Jordan, Predictive low-rank decomposition for kernel methods, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102356
URL : http://www.cs.berkeley.edu/~jordan/papers/bach-jordan-icml05.pdf

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542
URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf

A. Beck and L. Tetruashvili, On the Convergence of Block Coordinate Descent Type Methods, SIAM Journal on Optimization, vol.23, issue.4, pp.2037-2060, 2013.
DOI : 10.1137/120887679

A. J. Bell and T. J. Sejnowski, The "independent components" of natural scenes are edge filters, Vision Research, vol.37, issue.23, pp.3327-3338, 1997.
DOI : 10.1016/S0042-6989(97)00121-1

E. Bernard, L. Jacob, J. Mairal, and J. Vert, Efficient RNA isoform identification and quantification from RNA-Seq data with network flows, Bioinformatics, vol.30, issue.17, p.2447, 2014.
DOI : 10.1093/bioinformatics/btu317
URL : https://hal.archives-ouvertes.fr/hal-00803134

E. Bernard, L. Jacob, J. Mairal, E. Viara, and J. Vert, A convex formulation for joint RNA isoform detection and quantification from multiple RNA-seq samples, BMC Bioinformatics, vol.31, issue.1, p.262, 2015.
DOI : 10.1038/nbt.2450
URL : https://hal.archives-ouvertes.fr/hal-01123141

D. P. Bertsekas, Network Optimization: Continuous and Discrete Models, Athena Scientific, 1998.

D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.

A. Bietti and J. Mairal, Group invariance and stability to deformations of deep convolutional representations, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01536004

A. Bietti and J. Mairal, Stochastic optimization with variance reduction for infinite datasets with finite-sum structure, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01375816

D. Blatt, A. O. Hero, and H. Gauchman, A Convergent Incremental Gradient Method with a Constant Step Size, SIAM Journal on Optimization, vol.18, issue.1, pp.29-51, 2007.
DOI : 10.1137/040615961
URL : http://www.eecs.umich.edu/~hero/Preprints/AveragedGradientVer5.pdf

L. Bo, K. Lai, X. Ren, and D. Fox, Object recognition with hierarchical kernel descriptors, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995719
URL : http://www.cs.washington.edu/homes/lfb/paper/cvpr11.pdf

D. Böhning and B. G. Lindsay, Monotonicity of quadratic-approximation algorithms, Annals of the Institute of Statistical Mathematics, vol.11, issue.4, pp.641-663, 1988.
DOI : 10.1007/BF00049423

K. Borgwardt and H. Kriegel, Shortest-Path Kernels on Graphs, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005.
DOI : 10.1109/ICDM.2005.132
URL : http://cbio.ensmp.fr/~jvert/svn/bibli/local/Borgwardt2005Shortest-Path.pdf

J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization: Theory and examples, 2006.

L. Bottou, Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning.

O. Bousquet and L. Bottou, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems (NIPS), 2008.

Y. Boykov, O. Veksler, and R. Zabih, Efficient approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.20, issue.12, pp.1222-1239, 2001.
DOI : 10.1109/34.969114
URL : http://www.csd.uwo.ca/~yuri/Papers/iccv99.pdf

D. S. Broomhead and D. Lowe, Radial basis functions, multi-variable functional interpolation and adaptive networks, 1988.

J. Bruna and S. Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/pdf/1203.1513

A. Buades, B. Coll, and J. Morel, A Review of Image Denoising Algorithms, with a New One, Multiscale Modeling & Simulation, vol.4, issue.2, pp.490-530, 2005.
DOI : 10.1137/040616024
URL : https://hal.archives-ouvertes.fr/hal-00271141

R. H. Byrd, J. Nocedal, and F. Oztoprak, An inexact successive quadratic approximation method for L-1 regularized optimization, Mathematical Programming, pp.375-396, 2015.
DOI : 10.1198/tech.2006.s352

E. J. Candès, M. Wakin, and S. P. Boyd, Enhancing Sparsity by Reweighted ℓ1 Minimization, Journal of Fourier Analysis and Applications, vol.7, issue.3, pp.877-905, 2008.
DOI : 10.1007/978-1-4757-4182-7

O. Cappé and E. Moulines, On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.11, issue.3, pp.593-613, 2009.
DOI : 10.1007/978-1-4684-0192-9

C. M. Carvalho, J. Chang, J. E. Lucas, J. R. Nevins, Q. Wang et al., High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics, Journal of the American Statistical Association, vol.103, issue.484, pp.1438-1456, 2008.
DOI : 10.1198/016214508000000869
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017385/pdf

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1999.
DOI : 10.1137/S1064827596304010
URL : http://www-stat.stanford.edu/~donoho/Reports/1995/30401.pdf

X. Chen and M. Fukushima, Proximal quasi-Newton methods for nondifferentiable convex optimization, Mathematical Programming, vol.85, issue.2, pp.313-334, 1999.
DOI : 10.1007/s101070050059
URL : http://halo.kuamp.kyoto-u.ac.jp/zagato/member/staff/fuku/./papers/proxNewton.ps.Z

Y. Cho and L. K. Saul, Large-Margin Classification in Infinite Neural Networks, Neural Computation, vol.10, issue.10, pp.2678-2697, 2010.
DOI : 10.1109/TIT.2002.808136
URL : http://www.cse.ucsd.edu/users/yoc002/paper/neco_arccos.pdf

A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, and Y. LeCun, The loss surface of multilayer networks, International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.

D. Ciresan, U. Meier, and J. Schmidhuber, Multi-column deep neural networks for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248110
URL : http://www.idsia.ch/idsiareport/IDSIA-04-12.pdf

J. F. Claerbout and F. Muir, Robust Modeling with Erratic Data, Geophysics, vol.38, issue.5, pp.826-844, 1973.
DOI : 10.1190/1.1440378

M. Collins, R. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, Machine Learning, pp.253-285, 2002.

M. Cuturi and J. Vert, The context-tree kernel for strings, Neural Networks, vol.18, issue.8, pp.1111-1123, 2005.
DOI : 10.1016/j.neunet.2005.07.010
URL : https://hal.archives-ouvertes.fr/hal-00433583

M. Cuturi, J. Vert, O. Birkenes, and T. Matsui, A Kernel for Time Series Based on Global Alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007.
DOI : 10.1109/ICASSP.2007.366260

A. Damianou and N. Lawrence, Deep Gaussian processes, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2013.

A. Daniely, R. Frostig, and Y. Singer, Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.

G. B. Dantzig, Maximization of a linear function of variables subject to linear inequalities.

I. Daubechies, Orthonormal bases of compactly supported wavelets, Communications on Pure and Applied Mathematics, vol.34, issue.7, pp.909-996, 1988.
DOI : 10.1007/978-3-642-61987-8

I. Daubechies, M. Defrise, and C. Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics, vol.58, issue.11, pp.1413-1457, 2004.
DOI : 10.1002/0471221317
URL : http://onlinelibrary.wiley.com/doi/10.1002/cpa.20042/pdf

S. V. David and J. L. Gallant, Predicting neuronal responses during natural vision, Network: Computation in Neural Systems, vol.19, issue.4, pp.239-260, 2005.
DOI : 10.1016/0042-6989(93)90248-U
URL : http://www.ece.umd.edu/~svd/pdf/david_gallant_prediction_2005.pdf

A. J. Defazio, T. S. Caetano, and J. Domke, Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

S. Della Pietra, V. Della Pietra, and J. Lafferty, Duality and auxiliary functions for Bregman distances, 2001.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B, vol.39, issue.1, pp.1-38, 1977.

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.110, issue.3, pp.37-75, 2014.
DOI : 10.1007/978-3-642-82118-9
URL : http://www.ecore.be/DPs/dp_1297333979.pdf

C. Dong, C. C. Loy, K. He, and X. Tang, Learning a Deep Convolutional Network for Image Super-Resolution, Proceedings of the European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10593-2_13
URL : http://www.eecs.qmul.ac.uk/~ccloy/files/eccv_2014_deepresolution.pdf

C. Dong, C. C. Loy, K. He, and X. Tang, Image Super-Resolution Using Deep Convolutional Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.2, pp.295-307, 2016.
DOI : 10.1109/TPAMI.2015.2439281
URL : http://arxiv.org/pdf/1501.00092

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Annals of Statistics, vol.32, issue.2, pp.407-499, 2004.

M. A. Efroymson, Multiple regression analysis, Mathematical methods for digital computers, pp.191-203, 1960.

M. Elad, Sparse and Redundant Representations, 2010.
DOI : 10.1007/978-1-4419-7011-4
URL : https://hal.archives-ouvertes.fr/inria-00568893

M. Elad and M. Aharon, Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries, IEEE Transactions on Image Processing, vol.15, issue.12, pp.3736-3745, 2006.
DOI : 10.1109/TIP.2006.881969
URL : http://www.cs.technion.ac.il/~elad/publications/journals/2005/KSVD_Denoising_IEEE_TIP.pdf

H. Erdogan and J. A. Fessler, Ordered subsets algorithms for transmission tomography, Physics in Medicine and Biology, vol.44, issue.11, pp.2835-2851, 1999.
DOI : 10.1088/0031-9155/44/11/311

M. A. Figueiredo and R. D. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Transactions on Image Processing, vol.12, issue.8, pp.906-916, 2003.
DOI : 10.1109/TIP.2003.814255
URL : http://cmc.rice.edu/docs/docs/Fig2002May1ImageResto.ps

S. Fine and K. Scheinberg, Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research (JMLR), vol.2, pp.243-264, 2001.

L. R. Ford and D. R. Fulkerson, Maximal flow through a network, Journal canadien de mathématiques, vol.8, issue.0, pp.399-404, 1956.
DOI : 10.4153/CJM-1956-045-5

M. P. Friedlander and M. Schmidt, Hybrid Deterministic-Stochastic Methods for Data Fitting, SIAM Journal on Scientific Computing, vol.34, issue.3, pp.1380-1405, 2012.
DOI : 10.1137/110830629
URL : https://hal.archives-ouvertes.fr/inria-00626571

J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, 2001.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.40, issue.4, pp.62-658, 1979.
DOI : 10.1007/BF00344251

M. Fukushima and L. Qi, A Globally and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization, SIAM Journal on Optimization, vol.6, issue.4, pp.1106-1120, 1996.
DOI : 10.1137/S1052623494278839

B. Gärtner, M. Jaggi, and C. Maria, An exponential lower bound on the complexity of regularization paths, Journal of Computational Geometry (JoCG), vol.3, issue.1, pp.168-195, 2012.

G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, vol.57, issue.12, pp.4686-4698, 2009.
DOI : 10.1109/TSP.2009.2026004
URL : https://hal.archives-ouvertes.fr/hal-00439453

S. Ghadimi and G. Lan, Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, SIAM Journal on Optimization, vol.23, issue.4, 2013.
DOI : 10.1137/120880811
URL : http://arxiv.org/pdf/1309.5549

J. Giesen, M. Jaggi, and S. Laue, Approximating parameterized convex optimization problems, Algorithms - ESA, Lecture Notes in Computer Science, 2010.
DOI : 10.1145/2390176.2390186
URL : http://www.m8j.net/math/approxPaths.pdf

A. V. Goldberg, An Efficient Implementation of a Scaling Minimum-Cost Flow Algorithm, Journal of Algorithms, vol.22, issue.1, pp.1-29, 1997.
DOI : 10.1006/jagm.1995.0805

A. V. Goldberg and R. E. Tarjan, A new approach to the maximum flow problem, Proc. of ACM Symposium on Theory of Computing, pp.136-146, 1986.

I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, Maxout networks, Proceedings of the International Conference on Machine Learning (ICML), 2013.

A. Gordo, J. Almazan, J. Revaud, and D. Larlus, Deep Image Retrieval: Learning Global Representations for Image Search, Proceedings of the European Conference on Computer Vision (ECCV), 2016.
DOI : 10.1109/CVPR.2014.180
URL : http://arxiv.org/pdf/1604.01325

R. M. Gower, D. Goldfarb, and P. Richtárik, Stochastic block BFGS: Squeezing more curvature out of data, Proceedings of the International Conference on Machine Learning (ICML), 2016.

Y. Grandvalet and S. Canu, Outcomes of the equivalence of adaptive ridge with least absolute shrinkage, Advances in Neural Information Processing Systems (NIPS), 1999.

M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.1, 2014.
DOI : 10.1007/0-387-30528-9_7
URL : http://www.stanford.edu/~boyd/papers/pdf/disc_cvx_prog.pdf

O. Güler, New Proximal Point Algorithms for Convex Minimization, SIAM Journal on Optimization, vol.2, issue.4, pp.649-664, 1992.
DOI : 10.1137/0802032

Z. Harchaoui, A. Juditsky, and A. Nemirovski, Conditional gradient algorithms for norm-regularized smooth convex optimization, Mathematical Programming, vol.82, issue.281, pp.75-112, 2015.
DOI : 10.1090/S0025-5718-2012-02598-1
URL : https://hal.archives-ouvertes.fr/hal-00978368

T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, The entire regularization path for the support vector machine, Journal of Machine Learning Research (JMLR), vol.5, pp.1391-1415, 2004.

E. Hazan, A. Agarwal, and S. Kale, Logarithmic regret algorithms for online convex optimization, Machine Learning, pp.169-192, 2007.
DOI : 10.1007/11776420_37
URL : http://www.cs.princeton.edu/~satyen/papers/HKKA2006.pdf

T. Hazan and T. Jaakkola, Steps toward deep kernel methods from infinite neural networks, 2015.

S. Heber, Splicing graphs and EST assembly problem, Bioinformatics, vol.18, issue.Suppl 1, pp.181-188, 2002.
DOI : 10.1093/bioinformatics/18.suppl_1.S181

G. E. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/jmlr.2003.4.7-8.1235
URL : http://www.cs.berkeley.edu/~ywteh/research/ebm/nc2006.pdf

J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, 1996.
DOI : 10.1007/978-3-662-02796-7

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.4, issue.8, pp.1735-1780, 1997.
DOI : 10.1016/0893-6080(88)90007-X

R. R. Hocking, A Biometrics Invited Paper. The Analysis and Selection of Variables in Linear Regression, Biometrics, vol.32, issue.1, pp.1-49, 1976.
DOI : 10.2307/2529336

R. Horst and N. V. Thoai, DC Programming: Overview, Journal of Optimization Theory and Applications, vol.1, issue.1, pp.1-43, 1999.
DOI : 10.1287/moor.1.3.251

J. Huang, Z. Zhang, and D. Metaxas, Learning with structured sparsity, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553429
URL : http://www.cs.mcgill.ca/~icml2009/papers/452.pdf

A. Hyvärinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, 2009.
DOI : 10.1007/978-1-84882-491-1

L. Jacob, G. Obozinski, and J. Vert, Group lasso with overlap and graph lasso, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553431
URL : http://www.cs.mcgill.ca/~icml2009/papers/471.pdf

M. Jaggi, Sparse Convex Optimization Methods for Machine Learning, 2011.

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, Proceedings of the International Conference on Machine Learning (ICML), 2013.

T. Jebara and A. Choromanska, Majorization for CRFs and latent likelihoods, Advances in Neural Information Processing Systems (NIPS), 2012.

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for sparse hierarchical dictionary learning, Proceedings of the International Conference on Machine Learning (ICML), 2010.

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, Journal of Machine Learning Research (JMLR), vol.12, pp.2777-2824, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00377732

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for hierarchical sparse coding, Journal of Machine Learning Research (JMLR), vol.12, pp.2297-2334, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00516723

H. Jiang and W. H. Wong, Statistical inferences for isoform expression in RNA-Seq, Bioinformatics, vol.25, issue.8, pp.1026-1032, 2009.

A. Juditsky and A. Nemirovski, First order methods for nonsmooth convex large-scale optimization, in Optimization for Machine Learning, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00981863

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-scale video classification with convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
DOI : 10.1109/cvpr.2014.223
URL : http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf

J. E. Kelley, Jr., The Cutting-Plane Method for Solving Convex Programs, Journal of the Society for Industrial and Applied Mathematics, vol.8, issue.4, pp.703-712, 1960.
DOI : 10.1137/0108053

E. Khan, B. Marlin, G. Bouchard, and K. Murphy, Variational bounds for mixed-data factor analysis, Advances in Neural Information Processing Systems (NIPS), 2010.

J. Kim, J. K. Lee, and K. M. Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.181

V. Klee and G. J. Minty, How good is the simplex algorithm?, pp.159-175, 1972.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.
DOI : 10.1162/neco.2009.10-08-881
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, pp.365-397, 2012.
DOI : 10.1023/A:1021814225969
URL : http://www.optimization-online.org/DB_FILE/2008/08/2061.pdf

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, Mathematical Programming.
DOI : 10.1007/s10107-014-0839-0

K. Lange, D. R. Hunter, and I. Yang, Optimization transfer using surrogate objective functions, Journal of computational and graphical statistics, vol.9, issue.1, pp.1-20, 2000.
DOI : 10.2307/1390605

J. Langford, L. Li, and T. Zhang, Sparse online learning via truncated gradient, Journal of Machine Learning Research (JMLR), vol.10, pp.777-801, 2009.

Q. Le, T. Sarlós, and A. Smola, Fastfood - approximating kernel expansions in loglinear time, Proceedings of the International Conference on Machine Learning (ICML), 2013.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.
DOI : 10.1109/5.726791
URL : http://www.cs.berkeley.edu/~daf/appsem/Handwriting/papers/00726791.pdf

Y. LeCun, L. Bottou, G. B. Orr, and K. Müller, Efficient backprop, Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science (LNCS 1524), 1998.

C. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu, Deeply-supervised nets, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.

C. Lee, P. W. Gallagher, and Z. Tu, Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.
DOI : 10.1109/tpami.2017.2703082

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems (NIPS), 2001.

J. Lee, Y. Sun, and M. Saunders, Proximal Newton-type methods for convex optimization, Advances in Neural Information Processing Systems (NIPS), pp.836-844, 2012.

C. Lemaréchal and C. Sagastizábal, Practical Aspects of the Moreau--Yosida Regularization: Theoretical Preliminaries, SIAM Journal on Optimization, vol.7, issue.2, pp.367-385, 1997.
DOI : 10.1137/S1052623494267127

C. S. Leslie, E. Eskin, and W. S. Noble, The Spectrum Kernel: A String Kernel for SVM Protein Classification, Biocomputing 2002, pp.566-575, 2002.
DOI : 10.1142/9789812799623_0053

W. Li, IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly, Journal of Computational Biology, vol.18, issue.11, pp.1693-1707, 2011.
DOI : 10.1089/cmb.2011.0171
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3216102/pdf

H. Lin, J. Mairal, and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728

H. Lin, J. Mairal, and Z. Harchaoui, Quickening: A generic Quasi-Newton algorithm for faster gradient-based optimization, 2017.

M. Lin, Q. Chen, and S. Yan, Network in network, Proceedings of the International Conference on Learning Representations (ICLR), 2013.
URL : https://hal.archives-ouvertes.fr/hal-01551350

D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming, vol.32, issue.2, pp.503-528, 1989.
DOI : 10.1007/BF01589116

R. Livni, S. Shalev-Shwartz, and O. Shamir, On the computational efficiency of training neural networks, Advances in Neural Information Processing Systems (NIPS), 2014.

J. Mairal and B. Yu, Complexity analysis of the Lasso regularization path, Proceedings of the International Conference on Machine Learning (ICML), 2012.

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Discriminative learned dictionaries for local image analysis, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587652
URL : http://mplab.ucsd.edu/wp-content/uploads/cvpr2008/conference/data/papers/312.pdf

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Supervised dictionary learning, Advances in Neural Information Processing Systems (NIPS), 2008.
URL : https://hal.archives-ouvertes.fr/inria-00322431

J. Mairal, M. Elad, and G. Sapiro, Sparse Representation for Color Image Restoration, IEEE Transactions on Image Processing, vol.17, issue.1, pp.53-69, 2008.
DOI : 10.1109/TIP.2007.911828
URL : http://www.ima.umn.edu/preprints/oct2006/2139.pdf

J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce, Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation, Proceedings of the European Conference on Computer Vision (ECCV), 2008.
DOI : 10.1109/ICCV.2005.171

J. Mairal, G. Sapiro, and M. Elad, Learning Multiscale Sparse Representations for Image and Video Restoration, Multiscale Modeling & Simulation, vol.7, issue.1, pp.214-241, 2008.
DOI : 10.1137/070697653
URL : http://www.di.ens.fr/~mairal/resources/pdf/KSVDMultiScale.pdf

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553463
URL : http://www.ima.umn.edu/preprints/apr2009/2249.pdf

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Non-local sparse models for image restoration, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459452

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research (JMLR), vol.11, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, Network flow algorithms for structured sparsity, Advances in Neural Information Processing Systems (NIPS), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00512556

J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, Convex and network flow optimization for structured sparsity, Journal of Machine Learning Research (JMLR), vol.12, pp.2681-2720, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00584817

J. Mairal, F. Bach, and J. Ponce, Task-Driven Dictionary Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.4, pp.791-804, 2012.
DOI : 10.1109/TPAMI.2011.156
URL : https://hal.archives-ouvertes.fr/inria-00521534

J. Mairal and B. Yu, Supervised feature selection in graphs with path coding penalties and network flows, Journal of Machine Learning Research, vol.14, issue.1, pp.2449-2485, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00806372

J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing, Foundations and Trends in Computer Graphics and Vision, pp.2-385, 2014.
DOI : 10.1561/0600000058
URL : https://hal.archives-ouvertes.fr/hal-01081139

S. G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.11, issue.7, pp.674-693, 1989.
DOI : 10.1109/34.192463

C. L. Mallows, Choosing variables in a linear regression: A graphical aid, 1964.

C. L. Mallows, Choosing a subset regression, unpublished paper presented at the Joint Statistical Meeting, 1966.

H. Markowitz, Portfolio selection, Journal of Finance, vol.7, issue.1, pp.77-91, 1952.

J. A. Mazer, W. Vinje, J. McDermott, P. Schiller, and J. Gallant, Spatial frequency and orientation tuning dynamics in area V1, Proceedings of the National Academy of Sciences USA, pp.1645-1650, 2002.
DOI : 10.1017/S095252380017107X
URL : http://www.pnas.org/content/99/3/1645.full.pdf

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.7, issue.4, pp.417-473, 2010.
DOI : 10.1186/1471-2105-9-307
URL : http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2010.00740.x/pdf

A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux, Dictionary learning for massive matrix factorization, Proceedings of the International Conference on Machine Learning (ICML), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01308934

A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux, Stochastic Subsampling for Factorizing Huge Matrices, IEEE Transactions on Signal Processing, 2017.
DOI : 10.1109/TSP.2017.2752697
URL : https://hal.archives-ouvertes.fr/hal-01431618

R. Mifflin, A quasi-second-order proximal bundle algorithm, Mathematical Programming, pp.51-72, 1996.
DOI : 10.1007/978-3-642-82450-0_12

G. Montavon, M. L. Braun, and K. Müller, Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011.

R. M. Neal, Bayesian Learning for Neural Networks, 1994.
DOI : 10.1007/978-1-4612-0745-0

R. M. Neal and G. E. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in graphical models, 1998.

A. Nelakanti, C. Archambeau, J. Mairal, F. Bach, and G. Bouchard, Structured penalties for log-linear language models, Empirical Methods in Natural Language Processing (EMNLP), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00904820

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, issue.2, pp.372-376, 1983.

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, pp.125-161, 2013.
DOI : 10.1109/TIT.2005.864420

Y. Nesterov and B. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.99, issue.1, pp.177-205, 2006.
DOI : 10.1007/s10107-006-0706-8

Y. Nesterov and J. Vial, Confidence level solutions for stochastic programming, Automatica, vol.44, issue.6, pp.1559-1568, 2008.
DOI : 10.1016/j.automatica.2008.01.017
URL : http://ecolu-info.unige.ch/~logilab/reports/GradStoc.ps

Y. Nesterov, Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, SIAM Journal on Optimization, vol.22, issue.2, pp.341-362, 2012.
DOI : 10.1137/100802001

Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu et al., Reading digits in natural images with unsupervised feature learning, NIPS workshop on deep learning and unsupervised feature learning, 2011.

S. Nishimoto, A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu et al., Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies, Current Biology, vol.21, issue.19, pp.1641-1646, 2011.
DOI : 10.1016/j.cub.2011.08.031
URL : https://doi.org/10.1016/j.cub.2011.08.031

B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, issue.6583, pp.607-609, 1996.
DOI : 10.1038/381607a0

M. R. Osborne, B. Presnell, and B. A. Turlach, A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, vol.20, issue.3, pp.389-403, 2000.
DOI : 10.1093/imanum/20.3.389

P. Paatero and U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, vol.18, issue.2, pp.111-126, 1994.
DOI : 10.1007/978-3-642-93295-3_112

Q. Pan, O. Shai, L. J. Lee, B. J. Frey, and B. J. Blencowe, Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing, Nature Genetics, vol.40, issue.12, pp.1413-1415, 2008.
DOI : 10.1016/j.molcel.2004.12.004

C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui, 4wd-catalyst acceleration for gradient-based non-convex optimization, 2017.

M. Paulin, M. Douze, Z. Harchaoui, J. Mairal, F. Perronnin et al., Local Convolutional Features with Unsupervised Training for Image Retrieval, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.19
URL : https://hal.archives-ouvertes.fr/hal-01207966

M. Paulin, J. Mairal, M. Douze, Z. Harchaoui, F. Perronnin et al., Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach, International Journal of Computer Vision, vol.34, issue.3, 2016.
DOI : 10.1109/CVPR.2015.7298767
URL : https://hal.archives-ouvertes.fr/hal-01277109

K. Popper, Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft, 1934. Translated into French under the title "La logique de la découverte scientifique", Payot, 1973.

F. Radenović, G. Tolias, and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, Proceedings of the European Conference on Computer Vision (ECCV), 2016.
DOI : 10.1007/978-3-540-88682-2_24

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007.

M. Razaviyayn, M. Sanjabi, and Z. Luo, A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization with Applications to Transceiver Design in Wireless Communication Networks, Mathematical Programming, pp.515-545, 2016.
DOI : 10.1007/978-1-4612-1394-9

P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, pp.1-38, 2014.
DOI : 10.1111/j.1467-9868.2005.00503.x

K. Ritter, Ein Verfahren zur Lösung parameterabhängiger, nichtlinearer Maximum-Probleme, Unternehmensforschung Operations Research - Recherche Opérationnelle, vol.27, issue.4, pp.149-166, 1962.
DOI : 10.1007/BF01920852

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, pp.400-407, 1951.

R. T. Rockafellar, Monotone Operators and the Proximal Point Algorithm, SIAM Journal on Control and Optimization, vol.14, issue.5, pp.877-898, 1976.
DOI : 10.1137/0314056
URL : http://www.math.washington.edu/~rtr/papers/rtr-MonoOpProxPoint.pdf

A. W. Roe, L. Chelazzi, C. E. Connor, B. R. Conway, I. Fujita et al., Toward a Unified Theory of Visual Area V4, Neuron, vol.74, issue.1, pp.12-29, 2012.
DOI : 10.1016/j.neuron.2012.03.011

S. Rosset and J. Zhu, Piecewise linear regularized solution paths, The Annals of Statistics, vol.35, issue.3, pp.1012-1030, 2007.
DOI : 10.1214/009053606000001370
URL : http://doi.org/10.1214/009053606000001370

S. Salzo and S. Villa, Inexact and accelerated proximal point algorithms, Journal of Convex Analysis, vol.19, issue.4, pp.1167-1192, 2012.

K. Scheinberg and X. Tang, Practical inexact proximal quasi-Newton method with global complexity analysis, Mathematical Programming, pp.1-35, 2014.
DOI : 10.1109/TSP.2009.2016892
URL : http://arxiv.org/pdf/1311.6547

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, 2016.
DOI : 10.1007/s10107-016-1030-6
URL : https://hal.archives-ouvertes.fr/hal-00860051

B. Schölkopf, Support Vector Learning, 1997.

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol.20, issue.5, pp.1299-1319, 1998.
DOI : 10.1007/BF02281970

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013.

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, Mathematical Programming, pp.105-145, 2016.
DOI : 10.1023/A:1012498226479
URL : http://arxiv.org/pdf/1309.2375

S. Shalev-Shwartz, SDCA without Duality, Regularization, and Individual Convexity, Proceedings of the International Conference on Machine Learning (ICML), 2016.

S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, Pegasos, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.3-30, 2011.
DOI : 10.1145/1273496.1273598

J. Shawe-Taylor and N. Cristianini, An introduction to support vector machines and other kernel-based learning methods, 2004.

P. Simard, B. Victorri, Y. LeCun, and J. Denker, Tangent prop - a formalism for specifying selected invariances in an adaptive network, Advances in Neural Information Processing Systems (NIPS), 1992.

P. Y. Simard, Y. A. LeCun, J. S. Denker, and B. Victorri, Transformation invariance in pattern recognition: Tangent distance and propagation, Neural Networks: Tricks of the Trade, number 1524 in Lecture Notes in Computer Science, pp.239-274, 1998.
DOI : 10.1007/978-3-642-88163-3
URL : https://hal.archives-ouvertes.fr/halshs-00009505

E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, Shiftable multiscale transforms, IEEE Transactions on Information Theory, vol.38, issue.2, pp.587-607, 1992.
DOI : 10.1109/18.119725
URL : http://www.cns.nyu.edu/pub/eero/simoncelli91.ps.gz

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems (NIPS), 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2015.

S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio, Mathematics of the Neural Response, Foundations of Computational Mathematics, vol.15, issue.7, pp.67-91, 2010.
DOI : 10.1017/CBO9780511809682

A. J. Smola and B. Schölkopf, Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000.

P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp.194-281, 1986.

J. Snoek, H. Larochelle, and R. P. Adams, Practical bayesian optimization of machine learning algorithms, Advances in Neural Information Processing Systems (NIPS), 2012.

S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, Large scale multiple kernel learning, Journal of Machine Learning Research (JMLR), vol.7, pp.1531-1565, 2006.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research (JMLR), vol.15, pp.1929-1958, 2014.

I. Steinwart, P. Thomann, and N. Schmid, Learning with hierarchical Gaussian kernels, 2016.

V. Sydorov, M. Sakurada, and C. Lampert, Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.182

M. Szafranski, Y. Grandvalet, and P. Morizet-Mahoudeaux, Hierarchical penalization, Advances in Neural Information Processing Systems (NIPS), 2007.
URL : https://hal.archives-ouvertes.fr/hal-00267338

M. Szafranski, Y. Grandvalet, and A. Rakotomamonjy, Composite kernel learning, Machine Learning, vol.37, issue.6A, pp.73-103, 2010.
DOI : 10.1007/978-1-4757-2440-0
URL : https://hal.archives-ouvertes.fr/hal-00316016

R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society: Series B, vol.58, issue.1, pp.267-288, 1996.
DOI : 10.1111/j.1467-9868.2011.00771.x

A. M. Tillmann, Y. C. Eldar, and J. Mairal, DOLPHIn - Dictionary Learning for Phase Retrieval, IEEE Transactions on Signal Processing, vol.64, issue.24, pp.6485-6500, 2016.
DOI : 10.1109/TSP.2016.2607180
URL : https://hal.archives-ouvertes.fr/hal-01387428

R. Timofte, V. De Smet, and L. Van Gool, Anchored Neighborhood Regression for Fast Example-Based Super-Resolution, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.241

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.510
URL : http://arxiv.org/pdf/1412.0767

P. Tseng, Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization, Journal of Optimization Theory and Applications, vol.109, issue.3, pp.475-494, 2001.
DOI : 10.1023/A:1017501703105

B. A. Turlach, W. N. Venables, and S. J. Wright, Simultaneous Variable Selection, Technometrics, vol.47, issue.3, pp.349-363, 2005.
DOI : 10.1198/004017005000000139

V. Vapnik, The nature of statistical learning theory, 2000.

G. Varoquaux, A. Gramfort, F. Pedregosa, V. Michel, and B. Thirion, Multi-subject Dictionary Learning to Segment an Atlas of Brain Spontaneous Activity, Biennial International Conference on Information Processing in Medical Imaging, 2011.
DOI : 10.1007/978-3-642-22092-0_46
URL : https://hal.archives-ouvertes.fr/inria-00588898

A. Vedaldi and A. Zisserman, Efficient Additive Kernels via Explicit Feature Maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.480-492, 2012.
DOI : 10.1109/TPAMI.2011.153
URL : http://eprints.pascal-network.org/archive/00006964/01/vedaldi10.pdf

S. Wager, W. Fithian, S. Wang, and P. Liang, Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.

M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends® in Machine Learning, vol.1, issue.1-2, pp.1-305, 2008.
DOI : 10.1561/2200000001
URL : http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf

L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, Regularization of neural networks using dropconnect, Proceedings of the International Conference on Machine Learning (ICML), 2013.

L. Wang, Y. Qiao, and X. Tang, Action recognition with trajectory-pooled deep-convolutional descriptors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/cvpr.2015.7299059
URL : http://arxiv.org/abs/1505.04868

Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, Deep networks for image super-resolution with sparse prior, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/iccv.2015.50
URL : http://arxiv.org/pdf/1507.08905

B. Widrow and M. E. Hoff, Adaptive switching circuits, IRE WESCON Convention Record, pp.96-104, 1960.
DOI : 10.21236/AD0241531

C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.

R. J. Williams and D. Zipser, A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, vol.11, issue.2, pp.270-280, 1989.
DOI : 10.1016/0885-064X(88)90021-0
URL : ftp://ftp.ccs.neu.edu/pub/people/rjw/rtrl-nc-89.ps

B. D. Willmore, R. J. Prenger, and J. L. Gallant, Neural Representation of Natural Images in Visual Area V2, Journal of Neuroscience, vol.30, issue.6, pp.2102-2114, 2010.
DOI : 10.1523/JNEUROSCI.4099-09.2010

S. J. Wright, R. D. Nowak, and M. A. Figueiredo, Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009.
DOI : 10.1109/TSP.2009.2016892
URL : http://www.cs.wisc.edu/~swright/papers/Wright_Nowak_Figueiredo_2007_submitted.pdf

Z. Xia, NSMAP: A method for spliced isoforms identification and quantification from RNA-Seq, BMC Bioinformatics, vol.12, issue.1, p.162, 2011.
DOI : 10.1214/009053604000000067
URL : https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-12-162?site=bmcbioinformatics.biomedcentral.com

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010.

L. Xiao and T. Zhang, A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.
DOI : 10.1137/140961791
URL : http://arxiv.org/pdf/1403.4699

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.

F. Yger, M. Berar, G. Gasso, and A. Rakotomamonjy, A supervised strategy for deep kernel machine, Proceedings of ESANN, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00668302

J. Yu, S. Vishwanathan, S. Günter, and N. N. Schraudolph, A quasi-Newton approach to non-smooth convex optimization, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390309
URL : http://icml2008.cs.helsinki.fi/papers/461.pdf

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.
DOI : 10.1198/016214502753479356
URL : http://www2.isye.gatech.edu/~myuan/papers/glasso.final.pdf

M. D. Zeiler and R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2013.

M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, Proceedings of the European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10590-1_53
URL : http://cs.nyu.edu/%7Efergus/papers/zeilerECCV2014.pdf

R. Zeyde, M. Elad, and M. Protter, On Single Image Scale-Up Using Sparse-Representations, Curves and Surfaces, pp.711-730, 2010.
DOI : 10.1109/ICCV.2009.5459271

K. Zhang and J. T. Kwok, Clustered Nyström Method for Large Scale Manifold Learning and Dimension Reduction, IEEE Transactions on Neural Networks, vol.21, issue.10, pp.1576-1587, 2010.
DOI : 10.1109/TNN.2010.2064786

Y. Zhang, P. Liang, and M. J. Wainwright, Convexified convolutional neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2016.

P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, vol.37, issue.6A, pp.3468-3497, 2009.
DOI : 10.1214/07-AOS584
URL : http://doi.org/10.1214/07-aos584

H. Zou, T. Hastie, and R. Tibshirani, Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics, vol.15, issue.2, pp.265-286, 2006.
DOI : 10.1198/106186006X113430