Report on the 11th IWSLT evaluation campaign, IWSLT, 2014.
Universal transformers, Proc. of ICLR, 2018.
BERT: Pre-training of deep bidirectional transformers for language understanding, Proc. of NAACL, 2019.
Classical structured prediction losses for sequence to sequence learning, Proc. of NAACL, 2018.
Probabilistic adaptive computation time, arXiv preprint, 2017.
Convolutional sequence to sequence learning, Proc. of ICML, 2017.
Adaptive computation time for recurrent neural networks, 2016.
Multi-scale dense networks for resource efficient image classification, Proc. of ICLR, 2017.
Adam: A method for stochastic optimization, Proc. of ICLR, 2015.
Facebook FAIR's WMT19 news translation task submission, Proc. of WMT, 2019.
Fairseq: A fast, extensible toolkit for sequence modeling, Proc. of NAACL, 2019.
BLEU: a method for automatic evaluation of machine translation, Proc. of ACL, 2002.
Language models are unsupervised multitask learners, OpenAI, 2019.
Neural machine translation of rare words with subword units, Proc. of ACL, 2016.
BranchyNet: Fast inference via early exiting from deep neural networks, Proc. of ICPR, 2016.
Attention is all you need, Proc. of NeurIPS, 2017.
SkipNet: Learning dynamic routing in convolutional networks, Proc. of ECCV, 2018.