External Lexical Information for Multilingual Part-of-Speech Tagging

Benoît Sagot 1
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers. Here we compare the performance of four systems on datasets covering 16 languages: two feature-based systems (MEMMs and CRFs) and two neural-based systems (bi-LSTMs). We show that, on average, all four approaches perform similarly and reach state-of-the-art results. However, our feature-based models perform better on lexically richer datasets (e.g. for morphologically rich languages), whereas the neural-based models score higher on datasets with less lexical variability (e.g. for English). These conclusions hold in particular for the MEMM models relying on our system MElt, which benefited from newly designed features. This shows that, under certain conditions, feature-based approaches enriched with morphosyntactic lexicons are competitive with neural methods.
Document type:
Report
[Research Report] RR-8924, Inria Paris. 2016

https://hal.inria.fr/hal-01330301
Contributor: Benoît Sagot
Submitted on: Saturday, 6 August 2016 - 15:54:53
Last modified on: Saturday, 9 June 2018 - 10:30:06

Files

RR-8924.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01330301, version 3
  • arXiv: 1606.03676

Citation

Benoît Sagot. External Lexical Information for Multilingual Part-of-Speech Tagging. [Research Report] RR-8924, Inria Paris. 2016. 〈hal-01330301v3〉

Metrics

Record views: 209

File downloads: 147