Conference Papers, Year: 2013

Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources

Yves Scherrer, Benoît Sagot

Abstract

We introduce a generic approach for transferring part-of-speech annotations from a resourced language to a non-resourced but etymologically close language. We first infer a bilingual lexicon between the two languages using methods based on character similarity, frequency similarity and context similarity. We then assign part-of-speech tags to the entries of this bilingual lexicon and annotate the remaining words on the basis of suffix analogy. We evaluate our approach on five language pairs of the Iberian Peninsula, reaching up to 95% precision on the lexicon induction task and up to 85% tagging accuracy.
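The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring formula, the 0.6 threshold, the function names and the toy words are all assumptions made for illustration, and context similarity is left out entirely. Character similarity is approximated here with the standard library's `difflib.SequenceMatcher`, and suffix analogy with a simple majority vote over the longest shared suffix.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level similarity in [0, 1] (illustrative stand-in)."""
    return SequenceMatcher(None, a, b).ratio()


def freq_similarity(fa, fb):
    """1.0 for equal relative frequencies, lower as they diverge (assumed formula)."""
    return 1.0 / (1.0 + abs(math.log(fa / fb)))


def induce_lexicon(unres_freqs, res_freqs, threshold=0.6):
    """Map each non-resourced word to its best-scoring resourced word, if any.

    Both arguments are dicts of word -> relative frequency.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    lexicon = {}
    for u, fu in unres_freqs.items():
        best, best_score = None, threshold
        for r, fr in res_freqs.items():
            score = char_similarity(u, r) * freq_similarity(fu, fr)
            if score > best_score:
                best, best_score = r, score
        if best is not None:
            lexicon[u] = best
    return lexicon


def tag_by_suffix(word, tagged, max_len=4):
    """Tag a word by majority vote among tagged words sharing its longest suffix."""
    for n in range(min(max_len, len(word)), 0, -1):
        suffix = word[-n:]
        votes = Counter(t for w, t in tagged.items() if w.endswith(suffix))
        if votes:
            return votes.most_common(1)[0][0]
    return None


# Toy data (hypothetical words, frequencies and tags).
unres_freqs = {"cantar": 0.01, "ciutat": 0.02, "parlar": 0.01}
res_freqs = {"cantar": 0.01, "ciudad": 0.02, "casa": 0.05}
res_tags = {"cantar": "VERB", "ciudad": "NOUN", "casa": "NOUN"}

lexicon = induce_lexicon(unres_freqs, res_freqs)
# Transfer tags through the induced lexicon entries.
tagged = {u: res_tags[r] for u, r in lexicon.items()}
# Annotate the remaining words by suffix analogy.
for word in unres_freqs:
    if word not in tagged:
        tagged[word] = tag_by_suffix(word, tagged)
print(lexicon)
print(tagged)
```

Multiplying the two similarity scores is only one possible way to combine them; a weighted sum or a cascade of filters would fit the same description in the abstract.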
Main file
langvar13.pdf (161.95 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00862693, version 1 (17-09-2013)

Identifiers

  • HAL Id: hal-00862693, version 1

Cite

Yves Scherrer, Benoît Sagot. Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources. RANLP Workshop on Adaptation of language resources and tools for closely related languages and language variants, Sep 2013, Hissar, Bulgaria. ⟨hal-00862693⟩