External Lexical Information for Multilingual Part-of-Speech Tagging

Benoît Sagot

Report (Research Report), Year: 2016


Benoît Sagot (1)
Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

(1) Analyse Linguistique Profonde à Grande Echelle (Large-scale deep linguistic processing)

Abstract

Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers. Here we compare the performance of four systems on datasets covering 16 languages: two feature-based systems (MEMMs and CRFs) and two neural systems (bi-LSTMs). We show that, on average, all four approaches perform similarly and reach state-of-the-art results. However, our feature-based models perform better on lexically richer datasets (e.g., for morphologically rich languages), whereas the neural models score higher on datasets with less lexical variability (e.g., for English). These conclusions hold in particular for the MEMM models built with our system MElt, which benefited from newly designed features. This shows that, under certain conditions, feature-based approaches enriched with morphosyntactic lexicons are competitive with neural methods.
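The abstract does not detail MElt's features. As a minimal sketch only, assuming a toy lexicon that maps word forms to their possible tags (`TOY_LEXICON` and `lexicon_features` are hypothetical names, not MElt's API), the following shows one common way a feature-based tagger such as a MEMM or CRF can exploit a morphosyntactic lexicon: for the current word and its immediate neighbours, it emits the lexicon's set of candidate tags (the word's "ambiguity class") as binary features, with `UNK` for words absent from the lexicon.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon (illustration only): word form -> set of possible POS tags.
TOY_LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "can": {"AUX", "NOUN", "VERB"},
    "rust": {"NOUN", "VERB"},
}


def lexicon_features(tokens: List[str], i: int,
                     lexicon: Dict[str, Set[str]]) -> Dict[str, int]:
    """Build binary features for token i from lexicon lookups (a sketch, not MElt)."""
    feats: Dict[str, int] = {}
    # Look at the previous, current and next token (window of +/-1, an arbitrary choice).
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            tags = lexicon.get(tokens[j].lower())
            # Ambiguity class: the sorted tag set joined by '|', or UNK if not in the lexicon.
            amb = "|".join(sorted(tags)) if tags else "UNK"
            feats[f"lex[{offset}]={amb}"] = 1
            # One feature per individual candidate tag, so partial overlaps generalize.
            for tag in sorted(tags or ()):
                feats[f"lextag[{offset}]={tag}"] = 1
    # Standard word-form feature for the current token.
    feats[f"w[0]={tokens[i].lower()}"] = 1
    return feats


if __name__ == "__main__":
    print(sorted(lexicon_features(["The", "can", "rusts"], 1, TOY_LEXICON)))
```

These feature dictionaries would then be passed, alongside the usual word-form and affix features, to whichever MEMM or CRF trainer is used; the lexicon supplies tag information even for words rarely or never seen in the training corpus.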

Keywords

Feature-based models; Part-of-Speech Tagging; Neural models; MEMM; CRF; bi-LSTM; Multilingual Analysis

Domains

Computation and Language [cs.CL]

Main file

RR-8924.pdf (992.99 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01330301

Submitted on: Saturday, 6 August 2016, 15:54:53

Last modified on: Wednesday, 26 October 2022, 17:38:14

Dates and versions

hal-01330301, version 1 (10-06-2016)

hal-01330301, version 2 (10-06-2016)

hal-01330301, version 3 (06-08-2016)

Identifiers

HAL Id: hal-01330301, version 3
arXiv: 1606.03676

Cite

Benoît Sagot. External Lexical Information for Multilingual Part-of-Speech Tagging. [Research Report] RR-8924, Inria Paris. 2016. ⟨hal-01330301v3⟩

Collections

UNIV-PARIS7 INRIA INRIA-RRRT INRIA2 CAMPUS-AAR AAI LARA USPC
