R. Anil, G. Pereyra, A. Passos, R. Ormandi, G. E. Dahl et al., Large scale distributed neural network training through online distillation, ICLR, 2018.

J. Ba and R. Caruana, Do deep nets really need to be deep?, NIPS, 2014.

T. Bolukbasi, J. Wang, O. Dekel, and V. Saligrama, Adaptive neural networks for efficient inference, ICML, 2017.

L. Breiman, Bagging predictors, Machine Learning, 1996.

H. Cai, L. Zhu, and S. Han, ProxylessNAS: Direct neural architecture search on target task and hardware, ICLR, 2019.

M. Elbayad, J. Gu, E. Grave, and M. Auli, Depth-adaptive transformer, ICLR, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02422914

T. Furlanello, Z. C. Lipton, M. Tschannen, L. Itti, and A. Anandkumar, Born again neural networks, ICML, 2018.

L. Hansen and P. Salamon, Neural network ensembles, PAMI, vol. 12, no. 10, 1990.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, CVPR, 2016.

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, 2015.

A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen et al., Searching for MobileNetV3, ICCV, 2019.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, 2017.

G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft et al., Snapshot ensembles: Train 1, get M for free, ICLR, 2017.

G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten et al., Multi-scale dense networks for resource efficient image classification, ICLR, 2018.

G. Huang, S. Liu, L. van der Maaten, and K. Weinberger, CondenseNet: An efficient DenseNet using learned group convolutions, CVPR, 2018.

Z. Huang and N. Wang, Data-driven sparse structure selection for deep neural networks, ECCV, 2018.

E. Ilg, O. Cicek, S. Galesso, A. Klein, O. Makansi et al., Uncertainty estimates and multi-hypotheses networks for optical flow, ECCV, 2018.

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

A. Krogh and J. Vedelsby, Neural network ensembles, cross validation, and active learning, NIPS, 1995.

X. Lan, X. Zhu, and S. Gong, Knowledge distillation by on-the-fly native ensemble, NIPS, 2018.

J. Lee and S. Chung, Robust training with ensemble consensus, ICLR, 2020.

S. Lee, S. Purushwalkam, M. Cogswell, D. Crandall, and D. Batra, Why M heads are better than one: Training a diverse ensemble of deep networks, 2015.

H. Li, H. Zhang, X. Qi, R. Yang, and G. Huang, Improved techniques for training adaptive deep networks, ICCV, 2019.

Y. Liu, J. Stehouwer, A. Jourabloo, and X. Liu, Deep tree learning for zero-shot face anti-spoofing, CVPR, 2019.

Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan et al., Learning efficient convolutional networks through network slimming, ICCV, 2017.

I. Loshchilov and F. Hutter, SGDR: Stochastic gradient descent with warm restarts, ICLR, 2017.

N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, ShuffleNet V2: Practical guidelines for efficient CNN architecture design, ECCV, 2018.

A. Malinin, B. Mlodozeniec, and M. Gales, Ensemble distribution distillation, ICLR, 2020.

S. Mehta, M. Rastegari, L. Shapiro, and H. Hajishirzi, ESPNetv2: A light-weight, power efficient, and general purpose convolutional neural network, CVPR, 2019.

R. Minetto, M. Segundo, and S. Sarkar, Hydra: An ensemble of convolutional neural networks for geospatial land classification, IEEE Transactions on Geoscience and Remote Sensing, 2019.

U. Naftaly, N. Intrator, and D. Horn, Optimal ensemble averaging of neural networks, Network: Computation in Neural Systems, 1997.

B. Neal, S. Mittal, A. Baratin, V. Tantia, M. Scicluna et al., A modern take on the bias-variance tradeoff in neural networks, 2018.

L. Rokach, Ensemble-based classifiers, Artificial Intelligence Review, 2010.

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta et al., FitNets: Hints for thin deep nets, ICLR, 2015.

D. Roy, P. Panda, and K. Roy, Tree-CNN: A hierarchical deep convolutional neural network for incremental learning, Neural Networks, 2020.

A. Ruiz and J. Verbeek, Adaptative inference cost with convolutional neural mixture models, ICCV, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02267564

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, IJCV, 2015.

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, CVPR, 2018.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

G. Song and W. Chai, Collaborative learning for deep neural networks, NIPS, 2018.

M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler et al., MnasNet: Platform-aware neural architecture search for mobile, CVPR, 2019.

R. Tanno, K. Arulkumaran, D. Alexander, A. Criminisi, and A. Nori, Adaptive neural trees, ICML, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02465552

A. Veit and S. Belongie, Convolutional networks with adaptive inference graphs, ECCV, 2018.

T. Véniat and L. Denoyer, Learning time/memory-efficient deep architectures with budgeted super networks, CVPR, 2018.

X. Wang, F. Yu, Z. Dou, T. Darrell, and J. E. Gonzalez, SkipNet: Learning dynamic routing in convolutional networks, ECCV, 2018.

B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun et al., FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search, CVPR, 2019.

Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. Davis et al., BlockDrop: Dynamic inference paths in residual networks, CVPR, 2018.

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, CVPR, 2017.

J. Yu and T. Huang, Universally slimmable networks and improved training techniques, ICCV, 2019.

J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang, Slimmable neural networks, ICLR, 2019.

C. Zhang, M. Ren, and R. Urtasun, Graph hypernetworks for neural architecture search, ICLR, 2019.

Y. Zhang, T. Xiang, T. M. Hospedales, and H. Lu, Deep mutual learning, CVPR, 2018.

Z. Zhou, J. Wu, and W. Tang, Ensembling neural networks: Many could be better than all, Artificial Intelligence, 2002.