A lower bound for the optimization of finite sums, Proceedings of the International Conference on Machine Learning (ICML), 2015. ,
K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Transactions on Signal Processing, vol.54, issue.11, pp.4311-4322, 2006. ,
DOI : 10.1109/TSP.2006.881199
Convergent incremental optimization transfer algorithms: Application to tomography, IEEE Transactions on Medical Imaging, vol.25, issue.3, pp.283-296, 2006. ,
Network Flows, 1993. ,
DOI : 10.21236/ADA594171
Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, pp.267-281, 1973. ,
Deep convolutional networks are hierarchical kernel machines, 2015. ,
Bolasso, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390161
URL : https://hal.archives-ouvertes.fr/hal-00271289
Optimization with Sparsity-Inducing Penalties, Machine Learning, pp.1-106, 2012. ,
DOI : 10.1561/2200000015
URL : https://hal.archives-ouvertes.fr/hal-00613125
Exploring large feature spaces with hierarchical multiple kernel learning, Advances in Neural Information Processing Systems (NIPS), 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00319660
Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research (JMLR), vol.18, issue.19, pp.1-53, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01098505
Predictive low-rank decomposition for kernel methods, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005. ,
DOI : 10.1145/1102351.1102356
URL : http://www.cs.berkeley.edu/~jordan/papers/bach-jordan-icml05.pdf
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009. ,
DOI : 10.1137/080716542
URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf
On the Convergence of Block Coordinate Descent Type Methods, SIAM Journal on Optimization, vol.23, issue.4, pp.2037-2060, 2013. ,
DOI : 10.1137/120887679
The "independent components" of natural scenes are edge filters, Vision Research, vol.37, issue.23, pp.3327-3338, 1997. ,
DOI : 10.1016/S0042-6989(97)00121-1
Efficient RNA isoform identification and quantification from RNA-Seq data with network flows, Bioinformatics, vol.30, issue.17, pp.30-2447, 2014. ,
DOI : 10.1093/bioinformatics/btu317
URL : https://hal.archives-ouvertes.fr/hal-00803134
A convex formulation for joint RNA isoform detection and quantification from multiple RNA-seq samples, BMC Bioinformatics, vol.31, issue.1, p.262, 2015. ,
DOI : 10.1038/nbt.2450
URL : https://hal.archives-ouvertes.fr/hal-01123141
Network Optimization: Continuous and Discrete Models, Athena Scientific, 1998. ,
Nonlinear Programming, Athena Scientific, 1999. ,
Group invariance and stability to deformations of deep convolutional representations, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01536004
Stochastic optimization with variance reduction for infinite datasets with finite-sum structure, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01375816
A Convergent Incremental Gradient Method with a Constant Step Size, SIAM Journal on Optimization, vol.18, issue.1, pp.29-51, 2007. ,
DOI : 10.1137/040615961
URL : http://www.eecs.umich.edu/~hero/Preprints/AveragedGradientVer5.pdf
Object recognition with hierarchical kernel descriptors, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995719
URL : http://www.cs.washington.edu/homes/lfb/paper/cvpr11.pdf
Monotonicity of quadratic-approximation algorithms, Annals of the Institute of Statistical Mathematics, vol.11, issue.4, pp.641-663, 1988. ,
DOI : 10.1007/BF00049423
Shortest-Path Kernels on Graphs, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005. ,
DOI : 10.1109/ICDM.2005.132
URL : http://cbio.ensmp.fr/~jvert/svn/bibli/local/Borgwardt2005Shortest-Path.pdf
Convex analysis and nonlinear optimization: Theory and examples, 2006. ,
Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998. ,
Optimization methods for large-scale machine learning ,
The tradeoffs of large scale learning, Advances in Neural Information Processing Systems (NIPS), 2008. ,
Efficient approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.20, issue.12, pp.1222-1239, 2001. ,
DOI : 10.1109/34.969114
URL : http://www.csd.uwo.ca/~yuri/Papers/iccv99.pdf
Radial basis functions, multi-variable functional interpolation and adaptive networks, 1988. ,
Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013. ,
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/pdf/1203.1513
A Review of Image Denoising Algorithms, with a New One, Multiscale Modeling & Simulation, vol.4, issue.2, pp.490-530, 2005. ,
DOI : 10.1137/040616024
URL : https://hal.archives-ouvertes.fr/hal-00271141
An inexact successive quadratic approximation method for L-1 regularized optimization, Mathematical Programming, pp.375-396, 2015. ,
DOI : 10.1198/tech.2006.s352
Enhancing Sparsity by Reweighted ℓ1 Minimization, Journal of Fourier Analysis and Applications, vol.7, issue.3, pp.877-905, 2008. ,
DOI : 10.1007/978-1-4757-4182-7
On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.11, issue.3, pp.593-613, 2009. ,
DOI : 10.1007/978-1-4684-0192-9
High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics, Journal of the American Statistical Association, vol.103, issue.484, pp.1438-1456, 2008. ,
DOI : 10.1198/016214508000000869
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017385/pdf
Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1999. ,
DOI : 10.1137/S1064827596304010
URL : http://www-stat.stanford.edu/~donoho/Reports/1995/30401.pdf
Proximal quasi-Newton methods for nondifferentiable convex optimization, Mathematical Programming, vol.85, issue.2, pp.313-334, 1999. ,
DOI : 10.1007/s101070050059
URL : http://halo.kuamp.kyoto-u.ac.jp/zagato/member/staff/fuku/./papers/proxNewton.ps.Z
Large-Margin Classification in Infinite Neural Networks, Neural Computation, vol.10, issue.10, pp.2678-2697, 2010. ,
DOI : 10.1109/TIT.2002.808136
URL : http://www.cse.ucsd.edu/users/yoc002/paper/neco_arccos.pdf
The loss surface of multilayer networks, International Conference on Artificial Intelligence and Statistics (AISTATS), 2015. ,
Multi-column deep neural networks for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012. ,
DOI : 10.1109/CVPR.2012.6248110
URL : http://www.idsia.ch/idsiareport/IDSIA-04-12.pdf
ROBUST MODELING WITH ERRATIC DATA, GEOPHYSICS, vol.38, issue.5, pp.826-844, 1973. ,
DOI : 10.1190/1.1440378
Logistic regression, AdaBoost and Bregman distances, Machine Learning, pp.253-285, 2002. ,
The context-tree kernel for strings, Neural Networks, vol.18, issue.8, pp.1111-1123, 2005. ,
DOI : 10.1016/j.neunet.2005.07.010
URL : https://hal.archives-ouvertes.fr/hal-00433583
A Kernel for Time Series Based on Global Alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007. ,
DOI : 10.1109/ICASSP.2007.366260
Deep Gaussian processes, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2013. ,
Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016. ,
Maximization of a linear function of variables subject to linear inequalities ,
Orthonormal bases of compactly supported wavelets, Communications on Pure and Applied Mathematics, vol.34, issue.7, pp.909-996, 1988. ,
DOI : 10.1007/978-3-642-61987-8
An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics, vol.58, issue.11, pp.1413-1457, 2004. ,
DOI : 10.1002/0471221317
URL : http://onlinelibrary.wiley.com/doi/10.1002/cpa.20042/pdf
Predicting neuronal responses during natural vision, Network: Computation in Neural Systems, vol.19, issue.4, pp.239-260, 2005. ,
DOI : 10.1016/0042-6989(93)90248-U
URL : http://www.ece.umd.edu/~svd/pdf/david_gallant_prediction_2005.pdf
Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014. ,
Saga: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016843
Duality and auxiliary functions for Bregman distances, 2001. ,
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B, vol.39, issue.1, pp.1-38, 1977. ,
First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.110, issue.3, pp.37-75, 2014. ,
DOI : 10.1007/978-3-642-82118-9
URL : http://www.ecore.be/DPs/dp_1297333979.pdf
Learning a Deep Convolutional Network for Image Super-Resolution, Proceedings of the European Conference on Computer Vision (ECCV), 2014. ,
DOI : 10.1007/978-3-319-10593-2_13
URL : http://www.eecs.qmul.ac.uk/~ccloy/files/eccv_2014_deepresolution.pdf
Image Super-Resolution Using Deep Convolutional Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.2, pp.295-307, 2016. ,
DOI : 10.1109/TPAMI.2015.2439281
URL : http://arxiv.org/pdf/1501.00092
Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009. ,
Least angle regression, Annals of Statistics, vol.32, issue.2, pp.407-499, 2004. ,
Multiple regression analysis, Mathematical methods for digital computers, pp.191-203, 1960. ,
Sparse and Redundant Representations, 2010. ,
DOI : 10.1007/978-1-4419-7011-4
URL : https://hal.archives-ouvertes.fr/inria-00568893
Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries, IEEE Transactions on Image Processing, vol.15, issue.12, pp.3736-3745, 2006. ,
DOI : 10.1109/TIP.2006.881969
URL : http://www.cs.technion.ac.il/~elad/publications/journals/2005/KSVD_Denoising_IEEE_TIP.pdf
Ordered subsets algorithms for transmission tomography, Physics in Medicine and Biology, vol.44, issue.11, pp.2835-2851, 1999. ,
DOI : 10.1088/0031-9155/44/11/311
An EM algorithm for wavelet-based image restoration, IEEE Transactions on Image Processing, vol.12, issue.8, pp.906-916, 2003. ,
DOI : 10.1109/TIP.2003.814255
URL : http://cmc.rice.edu/docs/docs/Fig2002May1ImageResto.ps
Efficient svm training using low-rank kernel representations, Journal of Machine Learning Research (JMLR), vol.2, pp.243-264, 2001. ,
Maximal flow through a network, Journal canadien de mathématiques, vol.8, issue.0, pp.399-404, 1956. ,
DOI : 10.4153/CJM-1956-045-5
Hybrid Deterministic-Stochastic Methods for Data Fitting, SIAM Journal on Scientific Computing, vol.34, issue.3, pp.1380-1405, 2012. ,
DOI : 10.1137/110830629
URL : https://hal.archives-ouvertes.fr/inria-00626571
The elements of statistical learning, 2001. ,
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.40, issue.4, pp.62-658, 1979. ,
DOI : 10.1007/BF00344251
A Globally and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization, SIAM Journal on Optimization, vol.6, issue.4, pp.1106-1120, 1996. ,
DOI : 10.1137/S1052623494278839
An exponential lower bound on the complexity of regularization paths, Journal of Computational Geometry (JoCG), vol.3, issue.1, pp.168-195, 2012. ,
Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, vol.57, issue.12, pp.4686-4698, 2009. ,
DOI : 10.1109/TSP.2009.2026004
URL : https://hal.archives-ouvertes.fr/hal-00439453
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, SIAM Journal on Optimization, vol.23, issue.4, 2013. ,
DOI : 10.1137/120880811
URL : http://arxiv.org/pdf/1309.5549
Approximating parameterized convex optimization problems, Algorithms - ESA, Lecture Notes Comp. Sci, 2010. ,
DOI : 10.1145/2390176.2390186
URL : http://www.m8j.net/math/approxPaths.pdf
An Efficient Implementation of a Scaling Minimum-Cost Flow Algorithm, Journal of Algorithms, vol.22, issue.1, pp.1-29, 1997. ,
DOI : 10.1006/jagm.1995.0805
A new approach to the maximum flow problem, Proc. of ACM Symposium on Theory of Computing, pp.136-146, 1986. ,
Maxout networks, Proceedings of the International Conference on Machine Learning (ICML), 2013. ,
Deep Image Retrieval: Learning Global Representations for Image Search, Proceedings of the European Conference on Computer Vision (ECCV), 2016. ,
DOI : 10.1109/CVPR.2014.180
URL : http://arxiv.org/pdf/1604.01325
Stochastic block BFGS: Squeezing more curvature out of data, Proceedings of the International Conference on Machine Learning (ICML), 2016. ,
Outcomes of the equivalence of adaptive ridge with least absolute shrinkage, Advances in Neural Information Processing Systems (NIPS), 1999. ,
CVX: Matlab software for disciplined convex programming, version 2.1, 2014. ,
DOI : 10.1007/0-387-30528-9_7
URL : http://www.stanford.edu/~boyd/papers/pdf/disc_cvx_prog.pdf
New Proximal Point Algorithms for Convex Minimization, SIAM Journal on Optimization, vol.2, issue.4, pp.649-664, 1992. ,
DOI : 10.1137/0802032
Conditional gradient algorithms for norm-regularized smooth convex optimization, Mathematical Programming, vol.82, issue.281, pp.75-112, 2015. ,
DOI : 10.1090/S0025-5718-2012-02598-1
URL : https://hal.archives-ouvertes.fr/hal-00978368
The entire regularization path for the support vector machine, Journal of Machine Learning Research (JMLR), vol.5, pp.1391-1415, 2004. ,
Logarithmic regret algorithms for online convex optimization, Machine Learning, pp.169-192, 2007. ,
DOI : 10.1007/11776420_37
URL : http://www.cs.princeton.edu/~satyen/papers/HKKA2006.pdf
Steps toward deep kernel methods from infinite neural networks, 2015. ,
Splicing graphs and EST assembly problem, Bioinformatics, vol.18, issue.Suppl 1, pp.181-188, 2002. ,
DOI : 10.1093/bioinformatics/18.suppl_1.S181
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/jmlr.2003.4.7-8.1235
URL : http://www.cs.berkeley.edu/~ywteh/research/ebm/nc2006.pdf
Convex Analysis and Minimization Algorithms I, 1996. ,
DOI : 10.1007/978-3-662-02796-7
Long Short-Term Memory, Neural Computation, vol.4, issue.8, pp.1735-1780, 1997. ,
DOI : 10.1016/0893-6080(88)90007-X
A Biometrics Invited Paper. The Analysis and Selection of Variables in Linear Regression, Biometrics, vol.32, issue.1, pp.1-49, 1976. ,
DOI : 10.2307/2529336
DC Programming: Overview, Journal of Optimization Theory and Applications, vol.1, issue.1, pp.1-43, 1999. ,
DOI : 10.1287/moor.1.3.251
Learning with structured sparsity, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553429
URL : http://www.cs.mcgill.ca/~icml2009/papers/452.pdf
Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, 2009. ,
DOI : 10.1007/978-1-84882-491-1
Group lasso with overlap and graph lasso, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553431
URL : http://www.cs.mcgill.ca/~icml2009/papers/471.pdf
Sparse Convex Optimization Methods for Machine Learning, 2011. ,
Revisiting Frank-Wolfe: Projection-free sparse convex optimization, Proceedings of the International Conference on Machine Learning (ICML), 2013. ,
Majorization for CRFs and latent likelihoods, Advances in Neural Information Processing Systems (NIPS), 2012. ,
Proximal methods for sparse hierarchical dictionary learning, Proceedings of the International Conference on Machine Learning (ICML), 2010. ,
Structured variable selection with sparsity-inducing norms, Journal of Machine Learning Research (JMLR), vol.12, pp.2777-2824, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00377732
Proximal methods for hierarchical sparse coding, Journal of Machine Learning Research (JMLR), vol.12, pp.2297-2334, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00516723
Statistical inferences for isoform expression in RNA-Seq, Bioinformatics, vol.25, issue.8, pp.1026-1032, 2009. ,
First order methods for nonsmooth convex large-scale optimization. In Optimization for Machine Learning, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00981863
Large-scale video classification with convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. ,
DOI : 10.1109/cvpr.2014.223
URL : http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf
The Cutting-Plane Method for Solving Convex Programs, Journal of the Society for Industrial and Applied Mathematics, vol.8, issue.4, pp.703-712, 1960. ,
DOI : 10.1137/0108053
Variational bounds for mixed-data factor analysis, Advances in Neural Information Processing Systems (NIPS), 2010. ,
Deeply-Recursive Convolutional Network for Image Super-Resolution, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. ,
DOI : 10.1109/CVPR.2016.181
How good is the simplex algorithm?, pp.159-175, 1972. ,
ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012. ,
DOI : 10.1162/neco.2009.10-08-881
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf
An optimal method for stochastic composite optimization, Mathematical Programming, pp.365-397, 2012. ,
DOI : 10.1023/A:1021814225969
URL : http://www.optimization-online.org/DB_FILE/2008/08/2061.pdf
An optimal randomized incremental gradient method, Mathematical Programming ,
DOI : 10.1007/s10107-014-0839-0
Optimization transfer using surrogate objective functions, Journal of computational and graphical statistics, vol.9, issue.1, pp.1-20, 2000. ,
DOI : 10.2307/1390605
Sparse online learning via truncated gradient, Journal of Machine Learning Research (JMLR), vol.10, pp.777-801, 2009. ,
Fastfood - approximating kernel expansions in loglinear time, Proceedings of the International Conference on Machine Learning (ICML), 2013. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998. ,
DOI : 10.1109/5.726791
URL : http://www.cs.berkeley.edu/~daf/appsem/Handwriting/papers/00726791.pdf
Efficient backprop, Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science LNCS 1524, 1998. ,
Deeply-supervised nets, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2015. ,
Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016. ,
DOI : 10.1109/tpami.2017.2703082
Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems (NIPS), 2001. ,
Proximal Newton-type methods for convex optimization, Advances in Neural Information Processing Systems (NIPS), pp.836-844, 2012. ,
Practical Aspects of the Moreau--Yosida Regularization: Theoretical Preliminaries, SIAM Journal on Optimization, vol.7, issue.2, pp.367-385, 1997. ,
DOI : 10.1137/S1052623494267127
THE SPECTRUM KERNEL: A STRING KERNEL FOR SVM PROTEIN CLASSIFICATION, Biocomputing 2002, pp.566-575, 2002. ,
DOI : 10.1142/9789812799623_0053
IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly, Journal of Computational Biology, vol.18, issue.11, pp.1693-1707, 2011. ,
DOI : 10.1089/cmb.2011.0171
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3216102/pdf
A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems (NIPS), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01160728
Quickening: A generic Quasi-Newton algorithm for faster gradient-based optimization, 2017. ,
Network in network, Proceedings of the International Conference on Learning Representations (ICLR), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01551350
On the limited memory BFGS method for large scale optimization, Mathematical Programming, vol.32, issue.2, pp.503-528, 1989. ,
DOI : 10.1007/BF01589116
On the computational efficiency of training neural networks, Advances in Neural Information Processing Systems (NIPS), 2014. ,
Complexity analysis of the Lasso regularization path, Proceedings of the International Conference on Machine Learning (ICML), 2012. ,
Discriminative learned dictionaries for local image analysis, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008. ,
DOI : 10.1109/CVPR.2008.4587652
URL : http://mplab.ucsd.edu/wp-content/uploads/cvpr2008/conference/data/papers/312.pdf
Supervised dictionary learning, Advances in Neural Information Processing Systems (NIPS), 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00322431
Sparse Representation for Color Image Restoration, IEEE Transactions on Image Processing, vol.17, issue.1, pp.53-69, 2008. ,
DOI : 10.1109/TIP.2007.911828
URL : http://www.ima.umn.edu/preprints/oct2006/2139.pdf
Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation, Proceedings of the European Conference on Computer Vision (ECCV), 2008. ,
DOI : 10.1109/ICCV.2005.171
Learning Multiscale Sparse Representations for Image and Video Restoration, Multiscale Modeling & Simulation, vol.7, issue.1, pp.214-241, 2008. ,
DOI : 10.1137/070697653
URL : http://www.di.ens.fr/~mairal/resources/pdf/KSVDMultiScale.pdf
Online dictionary learning for sparse coding, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553463
URL : http://www.ima.umn.edu/preprints/apr2009/2249.pdf
Non-local sparse models for image restoration, 2009 IEEE 12th International Conference on Computer Vision, 2009. ,
DOI : 10.1109/ICCV.2009.5459452
Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research (JMLR), vol.11, pp.19-60, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00408716
Network flow algorithms for structured sparsity, Advances in Neural Information Processing Systems (NIPS), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00512556
Convex and network flow optimization for structured sparsity, Journal of Machine Learning Research (JMLR), vol.12, pp.2681-2720, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00584817
Task-Driven Dictionary Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.4, pp.791-804, 2012. ,
DOI : 10.1109/TPAMI.2011.156
URL : https://hal.archives-ouvertes.fr/inria-00521534
Supervised feature selection in graphs with path coding penalties and network flows, Journal of Machine Learning Research, vol.14, issue.1, pp.2449-2485, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00806372
Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, pp.2-385, 2014. ,
DOI : 10.1561/0600000058
URL : https://hal.archives-ouvertes.fr/hal-01081139
A theory for multiresolution signal decomposition: the wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.11, issue.7, pp.674-693, 1989. ,
DOI : 10.1109/34.192463
Choosing variables in a linear regression: A graphical aid, 1964. ,
Choosing a subset regression. unpublished paper presented at the Joint Statistical Meeting, 1966. ,
Portfolio selection, Journal of Finance, vol.7, issue.1, pp.77-91, 1952. ,
Spatial frequency and orientation tuning dynamics in area V1, Proceedings of the National Academy of Sciences USA, pp.1645-1650, 2002. ,
DOI : 10.1017/S095252380017107X
URL : http://www.pnas.org/content/99/3/1645.full.pdf
Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.7, issue.4, pp.417-473, 2010. ,
DOI : 10.1186/1471-2105-9-307
URL : http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2010.00740.x/pdf
Dictionary learning for massive matrix factorization, Proceedings of the International Conference on Machine Learning (ICML), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01308934
Stochastic Subsampling for Factorizing Huge Matrices, IEEE Transactions on Signal Processing, 2017. ,
DOI : 10.1109/TSP.2017.2752697
URL : https://hal.archives-ouvertes.fr/hal-01431618
A quasi-second-order proximal bundle algorithm, Mathematical Programming, pp.51-72, 1996. ,
DOI : 10.1007/978-3-642-82450-0_12
Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011. ,
Bayesian Learning for Neural Networks, 1994. ,
DOI : 10.1007/978-1-4612-0745-0
A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in graphical models, 1998. ,
Structured penalties for log-linear language models, Empirical Methods in Natural Language Processing (EMNLP), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00904820
Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009. ,
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649
A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, issue.1 22, pp.372-376, 1983. ,
Introductory Lectures on Convex Optimization: A Basic Course, 2004. ,
DOI : 10.1007/978-1-4419-8853-9
Gradient methods for minimizing composite functions, Mathematical Programming, pp.125-161, 2013. ,
DOI : 10.1109/TIT.2005.864420
Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.99, issue.1, pp.177-205, 2006. ,
DOI : 10.1007/s10107-006-0706-8
Confidence level solutions for stochastic programming, Automatica, vol.44, issue.6, pp.1559-1568, 2008. ,
DOI : 10.1016/j.automatica.2008.01.017
URL : http://ecolu-info.unige.ch/~logilab/reports/GradStoc.ps
Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, SIAM Journal on Optimization, vol.22, issue.2, pp.341-362, 2012. ,
DOI : 10.1137/100802001
Reading digits in natural images with unsupervised feature learning, NIPS workshop on deep learning and unsupervised feature learning, 2011. ,
Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies, Current Biology, vol.21, issue.19, pp.1641-1646, 2011. ,
DOI : 10.1016/j.cub.2011.08.031
URL : https://doi.org/10.1016/j.cub.2011.08.031
Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, issue.6583, pp.607-609, 1996. ,
DOI : 10.1038/381607a0
A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, vol.20, issue.3, pp.389-403, 2000. ,
DOI : 10.1093/imanum/20.3.389
Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, vol.18, issue.2, pp.111-126, 1994. ,
DOI : 10.1007/978-3-642-93295-3_112
Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing, Nature Genetics, vol.76, issue.12, pp.401413-1415, 2008. ,
DOI : 10.1016/j.molcel.2004.12.004
4wd-catalyst acceleration for gradient-based non-convex optimization, 2017. ,
Local Convolutional Features with Unsupervised Training for Image Retrieval, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.19
URL : https://hal.archives-ouvertes.fr/hal-01207966
Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach, International Journal of Computer Vision, vol.34, issue.3, p.2016 ,
DOI : 10.1109/CVPR.2015.7298767
URL : https://hal.archives-ouvertes.fr/hal-01277109
Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft, 1934. Translated into French under the title "La logique de la découverte scientifique", Payot, 1973. ,
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, Proceedings of the European Conference on Computer Vision (ECCV), 2016. ,
DOI : 10.1007/978-3-540-88682-2_24
Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007. ,
A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization with Applications to Transceiver Design in Wireless Communication Networks, Mathematical Programming, pp.515-545, 2016. ,
DOI : 10.1007/978-1-4612-1394-9
Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, pp.1-38, 2014. ,
DOI : 10.1111/j.1467-9868.2005.00503.x
Ein Verfahren zur Lösung parameterabhängiger, nichtlinearer Maximum-Probleme, Unternehmensforschung Operations Research - Recherche Opérationnelle, vol.27, issue.4, pp.149-166, 1962. ,
DOI : 10.1007/BF01920852
A stochastic approximation method, The Annals of Mathematical Statistics, pp.400-407, 1951. ,
Monotone Operators and the Proximal Point Algorithm, SIAM Journal on Control and Optimization, vol.14, issue.5, pp.877-898, 1976. ,
DOI : 10.1137/0314056
URL : http://www.math.washington.edu/~rtr/papers/rtr-MonoOpProxPoint.pdf
Toward a Unified Theory of Visual Area V4, Neuron, vol.74, issue.1, pp.12-29, 2012. ,
DOI : 10.1016/j.neuron.2012.03.011
Piecewise linear regularized solution paths, The Annals of Statistics, vol.35, issue.3, pp.1012-1030, 2007. ,
DOI : 10.1214/009053606000001370
URL : http://doi.org/10.1214/009053606000001370
Inexact and accelerated proximal point algorithms, Journal of Convex Analysis, vol.19, issue.4, pp.1167-1192, 2012. ,
Practical inexact proximal quasi-Newton method with global complexity analysis, Mathematical Programming, pp.1-35, 2014. ,
DOI : 10.1109/TSP.2009.2016892
URL : http://arxiv.org/pdf/1311.6547
Minimizing finite sums with the stochastic average gradient, Mathematical Programming, 2016. ,
DOI : 10.1007/s10107-016-1030-6
URL : https://hal.archives-ouvertes.fr/hal-00860051
Support Vector Learning, 1997. ,
Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002. ,
Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol.20, issue.5, pp.1299-1319, 1998. ,
DOI : 10.1007/BF02281970
Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978. ,
DOI : 10.1214/aos/1176344136
Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013. ,
Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, Mathematical Programming, pp.105-145, 2016. ,
DOI : 10.1023/A:1012498226479
URL : http://arxiv.org/pdf/1309.2375
SDCA without Duality, Regularization, and Individual Convexity, Proceedings of the International Conference on Machine Learning (ICML), 2016. ,
Pegasos, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.3-30, 2011. ,
DOI : 10.1145/1273496.1273598
An introduction to support vector machines and other kernel-based learning methods, 2004. ,
Tangent prop - a formalism for specifying selected invariances in an adaptive network, Advances in Neural Information Processing Systems (NIPS), 1992. ,
Transformation invariance in pattern recognition: Tangent distance and propagation, Neural Networks: Tricks of the Trade, number 1524 in Lecture Notes in Computer Science, pp.239-274, 1998. ,
DOI : 10.1007/978-3-642-88163-3
URL : https://hal.archives-ouvertes.fr/halshs-00009505
Shiftable multiscale transforms, IEEE Transactions on Information Theory, vol.38, issue.2, pp.587-607, 1992. ,
DOI : 10.1109/18.119725
URL : http://www.cns.nyu.edu/pub/eero/simoncelli91.ps.gz
Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems (NIPS), 2014. ,
Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2015. ,
Mathematics of the Neural Response, Foundations of Computational Mathematics, vol.15, issue.7, pp.67-91, 2010. ,
DOI : 10.1017/CBO9780511809682
Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000. ,
Information processing in dynamical systems: foundations of harmony theory, Parallel distributed processing: explorations in the microstructure of cognition, pp.194-281, 1986. ,
Practical bayesian optimization of machine learning algorithms, Advances in Neural Information Processing Systems (NIPS), 2012. ,
Large scale multiple kernel learning, Journal of Machine Learning Research (JMLR), vol.7, pp.1531-1565, 2006. ,
Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research (JMLR), vol.15, pp.1929-1958, 2014. ,
Learning with hierarchical gaussian kernels, 2016. ,
Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. ,
DOI : 10.1109/CVPR.2014.182
Hierarchical penalization, Advances in Neural Information Processing Systems (NIPS), 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-00267338
Composite kernel learning, Machine Learning, vol.37, issue.6A, pp.73-103, 2010. ,
DOI : 10.1007/978-1-4757-2440-0
URL : https://hal.archives-ouvertes.fr/hal-00316016
Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society: Series B, vol.58, issue.1, pp.267-288, 1996. ,
DOI : 10.1111/j.1467-9868.2011.00771.x
DOLPHIn - Dictionary Learning for Phase Retrieval, IEEE Transactions on Signal Processing, vol.64, issue.24, pp.6485-6500, 2016. ,
DOI : 10.1109/TSP.2016.2607180
URL : https://hal.archives-ouvertes.fr/hal-01387428
Anchored Neighborhood Regression for Fast Example-Based Super-Resolution, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.241
Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.510
URL : http://arxiv.org/pdf/1412.0767
Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization, Journal of Optimization Theory and Applications, vol.109, issue.3, pp.475-494, 2001. ,
DOI : 10.1023/A:1017501703105
Simultaneous Variable Selection, Technometrics, vol.47, issue.3, pp.349-363, 2005. ,
DOI : 10.1198/004017005000000139
The nature of statistical learning theory, 2000. ,
Multi-subject Dictionary Learning to Segment an Atlas of Brain Spontaneous Activity, Biennial International Conference on Information Processing in Medical Imaging, 2011. ,
DOI : 10.1007/978-3-642-22092-0_46
URL : https://hal.archives-ouvertes.fr/inria-00588898
Efficient Additive Kernels via Explicit Feature Maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.3, pp.480-492, 2012. ,
DOI : 10.1109/TPAMI.2011.153
URL : http://eprints.pascal-network.org/archive/00006964/01/vedaldi10.pdf
Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014. ,
Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning, vol.1, issue.1-2, pp.1-305, 2008. ,
DOI : 10.1561/2200000001
URL : http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf
Regularization of neural networks using dropconnect, Proceedings of the International Conference on Machine Learning (ICML), 2013. ,
Action recognition with trajectory-pooled deep-convolutional descriptors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/cvpr.2015.7299059
URL : http://arxiv.org/abs/1505.04868
Deep networks for image super-resolution with sparse prior, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/iccv.2015.50
URL : http://arxiv.org/pdf/1507.08905
Adaptive switching circuits, IRE WESCON Convention Record, pp.96-104, 1960. ,
DOI : 10.21236/AD0241531
Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001. ,
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, vol.1, issue.2, pp.270-280, 1989. ,
DOI : 10.1016/0885-064X(88)90021-0
URL : ftp://ftp.ccs.neu.edu/pub/people/rjw/rtrl-nc-89.ps
Neural Representation of Natural Images in Visual Area V2, Journal of Neuroscience, vol.30, issue.6, pp.2102-2114, 2010. ,
DOI : 10.1523/JNEUROSCI.4099-09.2010
Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009. ,
DOI : 10.1109/TSP.2009.2016892
URL : http://www.cs.wisc.edu/~swright/papers/Wright_Nowak_Figueiredo_2007_submitted.pdf
NSMAP: A method for spliced isoforms identification and quantification from RNA-Seq, BMC Bioinformatics, vol.12, issue.1, p.162, 2011. ,
DOI : 10.1214/009053604000000067
URL : https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-12-162?site=bmcbioinformatics.biomedcentral.com
Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010. ,
A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014. ,
DOI : 10.1137/140961791
URL : http://arxiv.org/pdf/1403.4699
Linear spatial pyramid matching using sparse coding for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009. ,
A supervised strategy for deep kernel machine, Proceedings of ESANN, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00668302
A quasi-Newton approach to non-smooth convex optimization, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390309
URL : http://icml2008.cs.helsinki.fi/papers/461.pdf
Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006. ,
DOI : 10.1198/016214502753479356
URL : http://www2.isye.gatech.edu/~myuan/papers/glasso.final.pdf
Stochastic pooling for regularization of deep convolutional neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2013. ,
Visualizing and Understanding Convolutional Networks, Proceedings of the European Conference on Computer Vision (ECCV), 2014. ,
DOI : 10.1007/978-3-319-10590-1_53
URL : http://cs.nyu.edu/%7Efergus/papers/zeilerECCV2014.pdf
On Single Image Scale-Up Using Sparse-Representations, Curves and Surfaces, pp.711-730, 2010. ,
DOI : 10.1109/ICCV.2009.5459271
Clustered Nyström Method for Large Scale Manifold Learning and Dimension Reduction, IEEE Transactions on Neural Networks, vol.21, issue.10, pp.1576-1587, 2010. ,
DOI : 10.1109/TNN.2010.2064786
Convexified convolutional neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2016. ,
The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, vol.37, issue.6A, pp.3468-3497, 2009. ,
DOI : 10.1214/07-AOS584
URL : http://doi.org/10.1214/07-aos584
Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics, vol.15, issue.2, pp.265-286, 2006. ,
DOI : 10.1198/106186006X113430