Improving Part-of-Speech Tagging of Historical Text by First Translating to Modern Text

Abstract: We explore the task of automatically assigning syntactic tags (known as part-of-speech tags), such as Noun and Verb, to words in seventeenth-century Dutch text. Tools exist for performing this task on modern texts, but they perform poorly on historical texts because of language change. We test several methods for translating the words in the historical text to modern equivalents before applying the tagging tools. We show that this additional translation step improves the quality of the automatic syntactic analysis. Further improvements are possible when the lexicons and text collections used for developing the translation process are extended in size.
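The translate-then-tag idea in the abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the paper's method: the lexicon entries, the tag names, and the helper names (`translate`, `toy_modern_tagger`, `tag_historical`) are all hypothetical, and the toy lookup tagger merely stands in for a real tagger built for modern Dutch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical historical-to-modern word lexicon (illustrative entries only).
HISTORICAL_TO_MODERN: Dict[str, str] = {
    "coninck": "koning",
    "ende": "en",
    "vrouwe": "vrouw",
}

# Hypothetical tag dictionary standing in for a tagger that only knows modern words.
MODERN_TAGS: Dict[str, str] = {
    "de": "Det",
    "koning": "Noun",
    "en": "Conj",
    "vrouw": "Noun",
}


def translate(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Replace each historical token by its modern equivalent when the lexicon has one."""
    return [lexicon.get(token.lower(), token) for token in tokens]


def toy_modern_tagger(tokens: List[str]) -> List[str]:
    """Toy stand-in for a modern-language tagger: unknown words get the tag 'Unknown'."""
    return [MODERN_TAGS.get(token.lower(), "Unknown") for token in tokens]


def tag_historical(
    tokens: List[str],
    lexicon: Dict[str, str],
    tagger: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str]]:
    """Translate word by word, tag the modern text, and map the tags back to the original tokens."""
    modern_tokens = translate(tokens, lexicon)
    tags = tagger(modern_tokens)
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = ["de", "coninck", "ende", "de", "vrouwe"]
    print(toy_modern_tagger(sentence))  # tagging the historical text directly
    print(tag_historical(sentence, HISTORICAL_TO_MODERN, toy_modern_tagger))
```

Because the translation is word by word, the tags assigned to the modern tokens line up one-to-one with the original historical tokens. A translation method that merges or splits words would need an explicit alignment step instead.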
Document type:
Conference paper
2nd International Workshop on Computational History and Data-Driven Humanities (CHDDH), May 2016, Dublin, Ireland. IFIP Advances in Information and Communication Technology, AICT-482, pp.54-64, 2016, Computational History and Data-Driven Humanities. 〈10.1007/978-3-319-46224-0_6〉

Cited literature: 26 references

https://hal.inria.fr/hal-01616302
Contributor: Hal Ifip
Submitted on: Friday, October 13, 2017 - 14:52:45
Last modified on: Friday, October 13, 2017 - 14:54:54

File

Restricted access
File visible on: 2019-01-01


Licence


Distributed under a Creative Commons Attribution 4.0 International License


Citation

Erik Sang. Improving Part-of-Speech Tagging of Historical Text by First Translating to Modern Text. 2nd International Workshop on Computational History and Data-Driven Humanities (CHDDH), May 2016, Dublin, Ireland. IFIP Advances in Information and Communication Technology, AICT-482, pp.54-64, 2016, Computational History and Data-Driven Humanities. 〈10.1007/978-3-319-46224-0_6〉. 〈hal-01616302〉
