A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2014

Abstract

In this paper, we describe our generic approach for transferring part-of-speech (POS) annotations from a resourced language to an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. First, we induce a translation lexicon from monolingual corpora, based on cognate detection followed by cross-lingual contextual similarity. Second, POS information is transferred from the resourced language along the translation pairs to the non-resourced language and used to tag the non-resourced language's corpus. We evaluate our methods on three language families, comprising five Romance, three Germanic and five Slavic languages. We obtain tagging accuracies of up to 91.6%.
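The two-step pipeline summarized in the abstract (cognate-based lexicon induction, then POS transfer along translation pairs) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes normalized Levenshtein similarity as the cognate measure, a hypothetical similarity threshold of 0.5, a hypothetical `UNK` fallback tag, and invented Spanish (resourced) and Italian (non-resourced) toy data; the cross-lingual contextual-similarity step is not modeled at all.

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cognate_score(a: str, b: str) -> float:
    """Assumed cognate measure: 1 - normalized edit distance, in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def induce_lexicon(src_vocab, tgt_vocab, threshold=0.5):
    """Pair each target word with its most similar source word,
    keeping only pairs whose score reaches the hypothetical threshold."""
    lexicon = {}
    for t in tgt_vocab:
        best = max(src_vocab, key=lambda s: cognate_score(t, s))
        if cognate_score(t, best) >= threshold:
            lexicon[t] = best
    return lexicon


def transfer_tags(lexicon, src_tagged):
    """Project the most frequent source tag onto each paired target word."""
    counts = defaultdict(Counter)
    for word, pos in src_tagged:
        counts[word][pos] += 1
    return {t: counts[s].most_common(1)[0][0]
            for t, s in lexicon.items() if counts[s]}


def tag(tokens, tag_lexicon, default="UNK"):
    """Tag target tokens; unpaired words get the hypothetical fallback tag."""
    return [(tok, tag_lexicon.get(tok, default)) for tok in tokens]


if __name__ == "__main__":
    # Toy tagged corpus for the resourced language (Spanish).
    src_tagged = [("el", "DET"), ("gato", "NOUN"),
                  ("negro", "ADJ"), ("duerme", "VERB")]
    # Toy untagged sentence in the non-resourced language (Italian).
    tgt_tokens = ["il", "gatto", "nero", "dorme", "cane"]

    lexicon = induce_lexicon([w for w, _ in src_tagged], tgt_tokens)
    print(tag(tgt_tokens, transfer_tags(lexicon, src_tagged)))
    # → [('il', 'DET'), ('gatto', 'NOUN'), ('nero', 'ADJ'), ('dorme', 'VERB'), ('cane', 'UNK')]
```

In this sketch, a target word without a sufficiently similar source word ("cane" has no cognate in the toy Spanish data) simply receives the fallback tag, and each paired word inherits only the majority tag of its source counterpart.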
Main file
lrec14cll.pdf (88.5 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01022298, version 1 (10 July 2014)

Identifiers

  • HAL Id: hal-01022298, version 1

Cite

Yves Scherrer, Benoît Sagot. A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages. Language Resources and Evaluation Conference, European Language Resources Association, May 2014, Reykjavik, Iceland. ⟨hal-01022298⟩