N. Bogoychev, K. Heafield, A. F. Aji, and M. Junczys-Dowmunt, Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.2991-2996, 2018.

O. Bojar, O. Dušek, T. Kocmi, J. Libovický, M. Novák et al., CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered, Text, Speech, and Dialogue, pp.231-238, 2016.

O. Bojar, C. Federmann, M. Fishel, Y. Graham, B. Haddow et al., Findings of the 2018 Conference on Machine Translation (WMT18), Proceedings of the Third Conference on Machine Translation: Shared Task Papers, vol.2, pp.272-307, 2018.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT'19, 2019.

S. Edunov, M. Ott, M. Auli, and D. Grangier, Understanding Back-Translation at Scale, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.489-500, 2018.

T. Emerson, The Second International Chinese Word Segmentation Bakeoff, Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, 2005.

M. Amin Farajian, M. Turchi, M. Negri, and M. Federico, Multi-Domain Neural Machine Translation through Unsupervised Adaptation, Proceedings of the 2nd Conference on Machine Translation, pp.127-137, 2017.

Y. Gal and Z. Ghahramani, A Theoretically Grounded Application of Dropout in Recurrent Neural Networks, Advances in Neural Information Processing Systems 29, pp.1019-1027, 2016.

B. Haddow, N. Bogoychev, D. Emelin, U. Germann, R. Grundkiewicz et al., The University of Edinburgh's Submissions to the WMT18 News Translation Task, Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp.399-409, 2018.

V. C. Hoang, P. Koehn, G. Haffari, and T. Cohn, Iterative Back-Translation for Neural Machine Translation, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pp.18-24, 2018.

M. Junczys-Dowmunt, Microsoft's Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data, Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp.425-430, 2018.

M. Junczys-Dowmunt, R. Grundkiewicz, T. Dwojak, H. Hoang, K. Heafield et al., Marian: Fast Neural Machine Translation in C++, Proceedings of ACL 2018, System Demonstrations, pp.116-121, 2018.

D. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, Proceedings of the 3rd International Conference on Learning Representations, ICLR'15, 2015.

P. Koehn, R. Zens, C. Dyer, O. Bojar, A. Constantin et al., Moses: Open Source Toolkit for Statistical Machine Translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions - ACL '07, pp.177-180, 2007.

T. Kudo, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol.1, pp.66-75, 2018.

T. Kudo and J. Richardson, SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp.66-71, 2018.

G. Lample and A. Conneau, Cross-lingual Language Model Pretraining, 2019.

A. V. Miceli Barone, J. Helcl, R. Sennrich, B. Haddow, and A. Birch, Deep Architectures for Neural Machine Translation, Proceedings of the 2nd Conference on Machine Translation, vol.1, 2017.

M. Popel, CUNI Transformer Neural MT System for WMT18, Proceedings of the Third Conference on Machine Translation: Shared Task Papers, vol.2, pp.486-491, 2018.

M. Post, A Call for Clarity in Reporting BLEU Scores, Proceedings of the Third Conference on Machine Translation: Research Papers, pp.186-191, 2018.

R. Sennrich, A. Birch, A. Currey, U. Germann, B. Haddow et al., The University of Edinburgh's Neural MT Systems for WMT17, Proceedings of the Second Conference on Machine Translation, vol.2, pp.389-399, 2017.

R. Sennrich, B. Haddow, and A. Birch, Improving Neural Machine Translation Models with Monolingual Data, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.86-96, 2016.

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.1715-1725, 2016.

S. L. Smith, P.-J. Kindermans, C. Ying, and Q. V. Le, Don't Decay the Learning Rate, Increase the Batch Size, Proceedings of the 6th International Conference on Learning Representations, ICLR'18, 2018.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention Is All You Need, Advances in Neural Information Processing Systems, vol.30, pp.5998-6008, 2017.

Q. Wang, B. Li, J. Liu, B. Jiang, Z. Zhang et al., The NiuTrans Machine Translation System for WMT18, Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp.528-534, 2018.