Polyglot: Distributed word representations for multilingual NLP, Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pp.183-192, 2013.
Cloze-driven pretraining of self-attention networks, 2019.
Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics, vol.5, pp.135-146, 2017.
Findings of the 2018 conference on machine translation (WMT18), Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp.272-303, 2018.
ClueWeb09 data set, 2009.
Transformer-XL: Attentive language models beyond a fixed-length context, 2019.
BERT: Pre-training of deep bidirectional Transformers for language understanding, arXiv e-prints, 2018.
Learning word vectors for 157 languages, Proceedings of the 11th Language Resources and Evaluation Conference, 2018.
FastText.zip: Compressing text classification models, 2016.
Bag of tricks for efficient text classification, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol.2, pp.427-431, 2017.
Advances in pre-training distributed word representations, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018.
Distributed representations of words and phrases and their compositionality, Proceedings of the 26th International Conference on Neural Information Processing Systems, vol.2, pp.3111-3119, 2013.
English Gigaword Fifth Edition, Linguistic Data Consortium, 2011.
GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1532-1543, 2014.
Deep contextualized word representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.2227-2237, 2018.
Improving language understanding by generative pre-training, 2018.
Language models are unsupervised multitask learners, OpenAI Blog, vol.1, p.8, 2019.
Attention is all you need, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.6000-6010, 2017.
XLNet: Generalized autoregressive pretraining for language understanding, 2019.
Aligning books and movies: Towards story-like visual explanations by watching movies and reading books, 2015 IEEE International Conference on Computer Vision, ICCV 2015, pp.19-27, 2015.