M. Cettolo, J. Niehues, S. Stüker, L. Bentivogli, and M. Federico, Report on the 11th IWSLT evaluation campaign, Proc. of IWSLT, 2014.

M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, and L. Kaiser, Universal transformers, Proc. of ICLR, 2018.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, Proc. of NAACL, 2019.

S. Edunov, M. Ott, M. Auli, D. Grangier, and M. Ranzato, Classical structured prediction losses for sequence to sequence learning, Proc. of NAACL, 2018.

M. Figurnov, A. Sobolev, and D. P. Vetrov, Probabilistic adaptive computation time, ArXiv preprint, 2017.

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. Dauphin, Convolutional sequence to sequence learning, Proc. of ICML, 2017.

A. Graves, Adaptive computation time for recurrent neural networks, ArXiv preprint, 2016.

G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten et al., Multi-scale dense networks for resource efficient image classification, Proc. of ICLR, 2017.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, Proc. of ICLR, 2015.

N. Ng, K. Yee, A. Baevski, M. Ott, M. Auli et al., Facebook FAIR's WMT19 news translation task submission, Proc. of WMT, 2019.

M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross et al., Fairseq: A fast, extensible toolkit for sequence modeling, Proc. of NAACL, 2019.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: a method for automatic evaluation of machine translation, Proc. of ACL, 2002.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei et al., Language models are unsupervised multitask learners, OpenAI, 2019.

R. Sennrich, B. Haddow, and A. Birch, Neural machine translation of rare words with subword units, Proc. of ACL, 2016.

S. Teerapittayanon, B. McDanel, and H. Kung, BranchyNet: Fast inference via early exiting from deep neural networks, Proc. of ICPR, 2016.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, Proc. of NeurIPS, 2017.

X. Wang, F. Yu, Z. Dou, T. Darrell, and J. E. Gonzalez, SkipNet: Learning dynamic routing in convolutional networks, Proc. of ECCV, 2018.