D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

M. Ballesteros, C. Dyer, and N. A. Smith, Improved transition-based parsing by modeling characters instead of words with LSTMs, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.349-359, 2015.

P. Bojanowski, É. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics (TACL), vol.5, pp.135-146, 2017.

J. Buckman and G. Neubig, Neural lattice language models, 2018.

Y. Chu and T. Liu, On the shortest arborescence of a directed graph, Scientia Sinica, pp.1396-1400, 1967.

É. Villemonte de la Clergerie, B. Sagot, and D. Seddah, The ParisNLP entry at the CoNLL UD Shared Task 2017: A Tale of a #ParsingTragedy, Conference on Computational Natural Language Learning, pp.243-252, 2017.

C. Nogueira dos Santos and B. Zadrozny, Learning character-level representations for part-of-speech tagging, Proceedings of the 31st International Conference on Machine Learning, pp.1818-1826, 2014.

T. Dozat and C. D. Manning, Deep Biaffine Attention for Neural Dependency Parsing, 2016.

T. Dozat, P. Qi, and C. Manning, Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task, Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp.20-30, 2017.

M. Fares, S. Oepen, L. Øvrelid, J. Björne, and R. Johansson, On the downstream utility of English Universal Dependency parsers, Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018.

M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. H. Hovy et al., Retrofitting word vectors to semantic lexicons, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.1606-1615, 2015.

E. Kiperwasser and Y. Goldberg, Simple and accurate dependency parsing using bidirectional LSTM feature representations, Transactions of the Association for Computational Linguistics (TACL), vol.4, pp.313-327, 2016.

X. Ma, Z. Hu, J. Liu, N. Peng, G. Neubig et al., Stack-pointer networks for dependency parsing, 2018.

A. Matthews, G. Neubig, and C. Dyer, Using Morphological Knowledge in Open-Vocabulary Neural Language Models, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, pp.1435-1445, 2018.

T. Mikolov, Q. V. Le, and I. Sutskever, Exploiting similarities among languages for machine translation, 2013.

Results on each treebank in the shared task along with the macro average over all of them (excerpt):

Treebank      LAS             MLAS            BLEX
cu_proiel     70.54 (-5.19)   57.01 (-6.30)   63.56 (-7.75)
da_ddt        81.22 (-5.06)   70.47 (-6.84)   71.74 (-6.33)
de_gsd        77.64 (-2.72)   36.95 (-21.09)  67.94 (-3.46)
la_ittb       84.64 (-2.44)   74.66 (-5.18)   81.33 (-3.04)
la_perseus    55.02 (-17.61)  32.08 (-17.69)  37.15 (-15.60)
la_proiel     67.90 (-5.71)   53.02 (-6.34)   61.85 (-5.75)