Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources

Yves Scherrer 1, Benoît Sagot 1
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract : We introduce a generic approach for transferring part-of-speech annotations from a resourced language to a non-resourced but etymologically close language. We first infer a bilingual lexicon between the two languages using methods based on character similarity, frequency similarity and context similarity. We then assign part-of-speech tags to the entries of this bilingual lexicon and annotate the remaining words on the basis of suffix analogy. We evaluate our approach on five language pairs of the Iberian Peninsula, reaching up to 95% precision on the lexicon induction task and up to 85% tagging accuracy.
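The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only character similarity (via Python's difflib ratio, an assumed stand-in for whatever measure the paper uses), leaves out the frequency and context similarity steps, and approximates suffix analogy by a longest-shared-suffix vote. The function names, the 0.7 threshold, the tag labels and the toy Spanish/Catalan word lists are all hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio, an assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def induce_lexicon(source_words, target_words, threshold=0.7):
    """Pair each target word with its most similar source word, if similar enough."""
    lexicon = {}
    for tgt in target_words:
        best = max(source_words, key=lambda src: char_similarity(src, tgt))
        if char_similarity(best, tgt) >= threshold:
            lexicon[tgt] = best
    return lexicon


def transfer_tags(lexicon, source_tags):
    """Copy the tag of each source word onto its induced target counterpart."""
    return {tgt: source_tags[src] for tgt, src in lexicon.items() if src in source_tags}


def tag_by_suffix(word, tagged, max_len=4):
    """Guess a tag from the longest suffix shared with already-tagged words."""
    by_suffix = defaultdict(Counter)
    for w, tag in tagged.items():
        for n in range(1, min(max_len, len(w)) + 1):
            by_suffix[w[-n:]][tag] += 1
    # Try the longest suffix first, then back off to shorter ones.
    for n in range(min(max_len, len(word)), 0, -1):
        suffix = word[-n:]
        if suffix in by_suffix:
            return by_suffix[suffix].most_common(1)[0][0]
    return None


# Hypothetical toy data: tagged source words and untagged target words.
source_tags = {"cantar": "VERB", "nación": "NOUN", "rápido": "ADJ"}
target_words = ["cantar", "nació", "ràpid", "parlar"]

lexicon = induce_lexicon(list(source_tags), target_words)
tagged = transfer_tags(lexicon, source_tags)

# Words with no lexicon entry fall back to the suffix-based guess.
for word in target_words:
    if word not in tagged:
        print(word, tag_by_suffix(word, tagged))
```

In this toy run, "parlar" has no sufficiently similar source word, so it receives its tag from the suffix it shares with the already-tagged "cantar". A realistic system would also need the frequency and context evidence mentioned in the abstract to separate true cognates from chance lookalikes.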

https://hal.inria.fr/hal-00862693
Contributor : Yves Scherrer
Submitted on : Tuesday, September 17, 2013 - 1:57:48 PM
Last modification on : Friday, January 4, 2019 - 5:33:24 PM
Long-term archiving on : Thursday, April 6, 2017 - 9:18:15 PM

File

langvar13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00862693, version 1

Citation

Yves Scherrer, Benoît Sagot. Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources. RANLP Workshop on Adaptation of language resources and tools for closely related languages and language variants, Sep 2013, Hissar, Bulgaria. ⟨hal-00862693⟩
