For each configuration, we test several learning rates η ∈ {10^-2, 10^-3, 10^-4}, and we select the learning rate η* which led to the best accuracy.
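A minimal sketch of this selection protocol is given below, assuming a hypothetical train_and_evaluate routine; the real experiments train the full models studied in the text, while this standalone version mocks them with a small PyTorch model on synthetic data.

```python
# Sketch of the learning-rate grid search described above.
# `train_and_evaluate`, the model, the data, and the epoch count are
# illustrative stand-ins, not the actual experimental pipeline.
import torch
import torch.nn as nn

def train_and_evaluate(lr, epochs=100):
    """Train a small model with learning rate `lr` and return its accuracy."""
    torch.manual_seed(0)  # identical data and initialization for every lr
    X = torch.randn(256, 10)
    y = (X.sum(dim=1, keepdim=True) > 0).float()  # toy binary labels
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(X), y).backward()
        optimizer.step()
    with torch.no_grad():  # accuracy on the training set, for brevity
        return ((model(X) > 0).float() == y).float().mean().item()

# Test each candidate learning rate and keep the one with the best accuracy.
best_lr, best_acc = None, -float("inf")
for lr in (1e-2, 1e-3, 1e-4):
    acc = train_and_evaluate(lr)
    if acc > best_acc:
        best_lr, best_acc = lr, acc
print(f"selected learning rate: {best_lr} (accuracy {best_acc:.3f})")
```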
Learning the number of neurons in deep networks, Advances in Neural Information Processing Systems, pp.2270-2278, 2016.
Natural gradient works efficiently in learning, Neural Computation, vol.10, pp.251-276, 1998.
Do deep nets really need to be deep?, Advances in Neural Information Processing Systems, pp.2654-2662, 2014.
Designing neural network architectures using reinforcement learning, 2016.
Online learning rate adaptation with hypergradient descent, International Conference on Learning Representations, 2018.
Practical recommendations for gradient-based training of deep architectures, Neural Networks: Tricks of the Trade, pp.437-478, 2012.
Convex neural networks, Advances in Neural Information Processing Systems, vol.18, pp.123-130, 2006.
Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures, 2013.
SMASH: One-shot model architecture search through hypernetworks, 6th International Conference on Learning Representations, 2018.
Compressing neural networks with the hashing trick, International Conference on Machine Learning, pp.2285-2294, 2015.
Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol.2, issue.4, pp.303-314, 1989.
ImageNet: A Large-Scale Hierarchical Image Database, CVPR, 2009.
Predicting parameters in deep learning, Advances in Neural Information Processing Systems, vol.26, pp.2148-2156, 2013.
Stronger baselines for trustable results in neural machine translation, 2017.
Gradient descent provably optimizes over-parameterized neural networks, 2019.
Adaptive subgradient methods for online learning and stochastic optimization, JMLR, vol.12, pp.2121-2159, 2011.
Combining adaptive algorithms and hypergradient method: a performance and robustness study, 2018.
The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks, 2018.
The lottery ticket hypothesis at scale, 2019.
Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.249-256, 2010.
Compressing deep convolutional networks using vector quantization, 2014.
Practical variational inference for neural networks, Advances in Neural Information Processing Systems, pp.2348-2356, 2011.
A brief review of the ChaLearn AutoML challenge: any-time any-dataset learning without human intervention, Workshop on Automatic Machine Learning, pp.21-30, 2016. URL: https://hal.archives-ouvertes.fr/hal-01381145
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, 2015.
Learning both weights and connections for efficient neural networks, Advances in Neural Information Processing Systems, pp.1135-1143, 2015.
Second order derivatives for network pruning: Optimal brain surgeon, Advances in Neural Information Processing Systems, pp.164-171, 1993.
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision, pp.1026-1034, 2015.
Deep residual learning for image recognition, CVPR, pp.770-778, 2016.
Tracking the best expert, Machine Learning, vol.32, issue.2, pp.151-178, 1998.
Distilling the knowledge in a neural network, 2015.
Keeping neural networks simple, International Conference on Artificial Neural Networks, pp.11-18, 1993.
Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
MobileNets: Efficient convolutional neural networks for mobile vision applications, 2017.
Densely connected convolutional networks, CVPR, vol.1, p.3, 2017.
Data-driven sparse structure selection for deep neural networks, Proceedings of the European Conference on Computer Vision (ECCV), pp.304-320, 2018.
The Analysis of Linear Partial Differential Operators I, 1998.
Increased rates of convergence through learning rate adaptation, Neural Networks, vol.1, issue.4, pp.295-307, 1988.
Neural tangent kernel: Convergence and generalization in neural networks, Advances in Neural Information Processing Systems, pp.8571-8580, 2018. URL: https://hal.archives-ouvertes.fr/hal-01824549
Three factors influencing minima in SGD, 2017.
An introduction to variational methods for graphical models, Machine Learning, vol.37, issue.2, pp.183-233, 1999.
Exploring the limits of language modeling, 2016.
An empirical exploration of recurrent network architectures, International Conference on Machine Learning, pp.2342-2350, 2015.
Improving generalization performance by switching from Adam to SGD, 2017.
Kianglu. pytorch-cifar, 2018.
Adam: A Method for Stochastic Optimization, International Conference on Learning Representations, 2015.
Variational dropout and the local reparameterization trick, Advances in Neural Information Processing Systems, pp.2575-2583, 2015.
Learning Multiple Layers of Features from Tiny Images, 2009.
One weird trick for parallelizing convolutional neural networks, 2014.
Learning Rate Tuning in Deep Learning: A Practical Guide | Machine Learning Explained, 2018.
Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
Efficient backprop, Neural Networks: Tricks of the Trade, pp.9-50, 1998.
Optimal brain damage, Advances in Neural Information Processing Systems, pp.598-605, 1990.
Wide neural networks of any depth evolve as linear models under gradient descent, 2019.
Preconditioned stochastic gradient Langevin dynamics for deep neural networks, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp.1788-1794, 2016.
Hyperband: A novel bandit-based approach to hyperparameter optimization, JMLR, vol.18, issue.1, pp.6765-6816, 2017.
Continuous control with deep reinforcement learning, 2015.
Progressive neural architecture search, Proceedings of the European Conference on Computer Vision (ECCV), pp.19-34, 2018.
DARTS: Differentiable architecture search, 2018.
Learning efficient convolutional networks through network slimming, 2017 IEEE International Conference on Computer Vision (ICCV), pp.2755-2763, 2017.
Rethinking the value of network pruning, 2018.
Bayesian compression for deep learning, Advances in Neural Information Processing Systems, pp.3288-3298, 2017.
How to pick the best learning rate for your machine learning project, 2016.
Bayesian model comparison and backprop nets, Advances in Neural Information Processing Systems, pp.839-846, 1992.
A practical Bayesian framework for backpropagation networks, Neural Computation, vol.4, issue.3, pp.448-472, 1992.
Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks, Network: Computation in Neural Systems, vol.6, pp.469-505, 1995.
Information theory, inference and learning algorithms, 2003.
Gradient-based hyperparameter optimization through reversible learning, International Conference on Machine Learning, pp.2113-2122, 2015.
Tuning-free step-size adaptation, Acoustics, Speech and Signal Processing, pp.2121-2124, 2012.
Natural Langevin dynamics for neural networks, International Conference on Geometric Science of Information, pp.451-459, 2017. URL: https://hal.archives-ouvertes.fr/hal-01655949
Building a large annotated corpus of English: the Penn Treebank, Computational Linguistics, vol.19, issue.2, pp.313-330, 1993.
Speed learning on the fly, 2015.
Gaussian process behaviour in wide deep neural networks, 2018.
Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.
Variational dropout sparsifies deep neural networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2498-2507, 2017.
Auto-sizing neural networks: With applications to n-gram language models, 2015.
Bayesian learning for neural networks, 1995.
Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, pp.3311-3325, 1997.
Automatic differentiation in PyTorch, 2017.
Handbook of integral equations, 1998.
Large-scale evolution of image classifiers, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2902-2911, 2017.
A stochastic approximation method, Annals of Mathematical Statistics, vol.22, pp.400-407, 1951.
On random weights and unsupervised feature learning, ICML, pp.1089-1096, 2011.
Group sparse regularization for deep neural networks, Neurocomputing, vol.241, pp.81-89, 2017.
No more pesky learning rates, International Conference on Machine Learning, pp.343-351, 2013.
Local gain adaptation in stochastic gradient descent, 1999.
Compression of neural machine translation models via pruning, p.291, 2016.
Very deep convolutional networks for large-scale image recognition, 2014.
Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.
Evolving neural networks through augmenting topologies, Evolutionary Computation, vol.10, issue.2, pp.99-127, 2002.
Introduction to Fourier analysis on Euclidean spaces, vol.32, 2016.
Estimating an Optimal Learning Rate For a Deep Neural Network, 2017.
Going deeper with convolutions, CVPR, pp.1-9, 2015.
Machine learning: a Bayesian and optimization perspective, 2015.
Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), vol.58, issue.1, pp.267-288, 1996.
Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, vol.4, pp.26-31, 2012.
Catching up faster by switching sooner: A predictive approach to adaptive estimation with an application to the AIC-BIC dilemma, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.74, issue.3, pp.361-417, 2012.
Catching up faster in Bayesian model selection and model averaging, Advances in Neural Information Processing Systems, vol.20, pp.417-424, 2008.
Switching between two universal source coding algorithms, Proceedings of the Data Compression Conference (DCC'98), pp.491-500, 1998.
Bayesian Model Selection and Model Averaging, Journal of Mathematical Psychology, vol.44, 2000.
Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, vol.78, issue.10, pp.1550-1560, 1990.
The marginal value of adaptive gradient methods in machine learning, NIPS, pp.4148-4158, 2017.
Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.
Understanding deep learning requires rethinking generalization, 2017.
Are all layers created equal?, 2019.
Neural architecture search with reinforcement learning, 2016.