Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources

Yves Scherrer (1), Benoît Sagot (1)
(1) ALPAGE - Analyse Linguistique Profonde à Grande Echelle (Large-scale deep linguistic processing)
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: We introduce a generic approach for transferring part-of-speech annotations from a resourced language to a non-resourced but etymologically close language. We first infer a bilingual lexicon between the two languages using methods based on character similarity, frequency similarity and context similarity. We then assign part-of-speech tags to the entries of this bilingual lexicon and annotate the remaining words on the basis of suffix analogy. We evaluate our approach on five language pairs of the Iberian Peninsula, reaching up to 95% precision on the lexicon induction task and up to 85% tagging accuracy.
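The two steps named in the abstract (inducing a lexicon from character similarity, then tagging leftover words by suffix analogy) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the measures, so the sketch assumes normalized Levenshtein similarity as the character-similarity score, a hypothetical acceptance threshold of 0.6, and "tag of the known word sharing the longest suffix" as the suffix-analogy rule. The frequency and context similarity components are omitted, and the function names and toy word lists are illustrative only.

```python
# Hypothetical sketch, not the paper's method: character-similarity lexicon
# induction plus longest-suffix tagging for words left out of the lexicon.


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insert/delete/substitute cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    # Assumed score: 1.0 for identical strings, lower as edits accumulate.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def induce_lexicon(source_words, target_words, threshold=0.6):
    # Pair each target word with its most similar source word, if the
    # similarity reaches the (hypothetical) threshold.
    lexicon = {}
    for t in target_words:
        best = max(source_words, key=lambda s: char_similarity(s, t))
        if char_similarity(best, t) >= threshold:
            lexicon[t] = best
    return lexicon


def tag_by_suffix(word, tagged_words):
    # Return the tag of the known word sharing the longest suffix with `word`.
    best_tag, best_len = None, -1
    for known, tag in tagged_words.items():
        n = 0
        while n < min(len(word), len(known)) and word[-1 - n] == known[-1 - n]:
            n += 1
        if n > best_len:
            best_tag, best_len = tag, n
    return best_tag


# Toy data: Spanish-like tagged source words, Galician-like target words.
source_tags = {"casa": "NOUN", "cantar": "VERB", "blanco": "ADJ"}
target_words = ["casa", "cantar", "branco", "xogar"]

lexicon = induce_lexicon(list(source_tags), target_words)
# Project source tags onto the induced target entries.
projected = {t: source_tags[s] for t, s in lexicon.items()}
print(lexicon)    # {'casa': 'casa', 'cantar': 'cantar', 'branco': 'blanco'}
print(tag_by_suffix("xogar", projected))  # VERB (shares the suffix "-ar")
```

Here "xogar" is too dissimilar from every source word to enter the lexicon, so it falls back to suffix analogy and inherits the tag of "cantar". A real system would combine several similarity signals rather than rely on a single threshold.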
Document type:
Conference paper
RANLP Workshop on Adaptation of language resources and tools for closely related languages and language variants, Sep 2013, Hissar, Bulgaria. 2013

Cited literature [21 references]

https://hal.inria.fr/hal-00862693
Contributor: Yves Scherrer
Submitted on: Tuesday, 17 September 2013, 13:57:48
Last modified on: Tuesday, 11 October 2016, 13:51:01
Archived on: Thursday, 6 April 2017, 21:18:15

File

langvar13.pdf
Files produced by the author(s)

Identifiers

• HAL Id: hal-00862693, version 1

Citation

Yves Scherrer, Benoît Sagot. Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources. RANLP Workshop on Adaptation of language resources and tools for closely related languages and language variants, Sep 2013, Hissar, Bulgaria. 2013. 〈hal-00862693〉
