Conference paper, Year: 2013

Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources

Abstract

We introduce a generic approach for transferring part-of-speech annotations from a resourced language to a non-resourced but etymologically close language. We first infer a bilingual lexicon between the two languages with methods based on character similarity, frequency similarity and context similarity. We then assign part-of-speech tags to these bilingual lexicon entries and annotate the remaining words on the basis of suffix analogy. We evaluate our approach on five language pairs of the Iberian Peninsula, reaching up to 95% precision on the lexicon induction task and up to 85% tagging accuracy.
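
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the character-similarity step (frequency and context similarity are omitted), it substitutes the similarity ratio of Python's standard difflib.SequenceMatcher for whatever measure the paper uses, and the function names, the 0.8 threshold, the tag labels and the toy Spanish/Catalan word lists are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def induce_lexicon(source_words, target_words, threshold=0.8):
    """Pair each target word with its most similar source word.

    Pairs scoring below the (hypothetical) threshold are discarded.
    """
    lexicon = {}
    for t in target_words:
        best = max(source_words, key=lambda s: char_similarity(s, t))
        if char_similarity(best, t) >= threshold:
            lexicon[t] = best
    return lexicon


def transfer_tags(lexicon, source_tags):
    """Copy the tag of each source word onto its paired target word."""
    return {t: source_tags[s] for t, s in lexicon.items() if s in source_tags}


def tag_by_suffix(word, tagged, max_len=4):
    """Tag an unmatched word by analogy with tagged words sharing its suffix.

    Tries the longest suffix first and returns the most frequent tag among
    tagged words with that suffix, or None if no suffix matches.
    """
    for n in range(min(max_len, len(word) - 1), 0, -1):
        suffix = word[-n:]
        counts = {}
        for w, tag in tagged.items():
            if w.endswith(suffix):
                counts[tag] = counts.get(tag, 0) + 1
        if counts:
            return max(counts, key=counts.get)
    return None


# Toy data, invented for illustration (not taken from the paper).
source_tags = {"nación": "NOUN", "cantar": "VERB"}   # resourced language
target_words = ["nació", "cantar", "acció", "parlar"]  # non-resourced language

lexicon = induce_lexicon(source_tags, target_words)
tagged = transfer_tags(lexicon, source_tags)

# Words left out of the induced lexicon are tagged by suffix analogy.
for word in target_words:
    if word not in tagged:
        print(word, tag_by_suffix(word, tagged))
```

In this toy run, the first two target words enter the lexicon through character similarity, and the remaining two receive their tags from the suffixes they share with already tagged words. A realistic system would need much larger word lists and the additional similarity signals named in the abstract.
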
Main file
langvar13.pdf (161.95 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00862693, version 1 (17-09-2013)

Identifiers

  • HAL Id: hal-00862693, version 1

Cite

Yves Scherrer, Benoît Sagot. Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources. RANLP Workshop on Adaptation of language resources and tools for closely related languages and language variants, Sep 2013, Hissar, Bulgaria. ⟨hal-00862693⟩
