Part-of-Speech Tagging of Non-Resourced Languages Using Resources for an Etymologically Close Language

Yves Scherrer¹, Benoît Sagot¹
¹ ALPAGE (Analyse Linguistique Profonde à Grande Échelle; Large-scale deep linguistic processing)
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: We introduce a generic approach for transferring part-of-speech annotations from a resourced language to a non-resourced but etymologically close language. We do not rely on any parallel corpora or on any linguistic knowledge about the non-resourced language (no lexicons, no annotated corpora). Our approach uses only two resources: cognate pairs induced automatically and without supervision by means of character-based statistical machine translation, and a morphosyntactic lexicon for the resourced language. Short, frequent words are handled separately: we tag them directly by assessing the cross-language similarity of their immediate morphosyntactic contexts. Using German as the resourced language, we evaluate our approach on Dutch (which is in fact itself a resourced language) and on Palatine German. We reach tagging accuracies of 67.2% on Dutch and 60.7% on Palatine German.
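The abstract combines two mechanisms: tag transfer through induced cognate pairs, and context-based tagging of short, frequent words. The toy Python sketch below illustrates that two-tier idea under loudly stated assumptions; it is not the authors' system. Python's `difflib.SequenceMatcher` similarity ratio replaces character-based statistical machine translation for cognate induction; the length cutoff `SHORT`, the 0.6 threshold, the miniature German lexicon `LEXICON` and the example sentences are all hypothetical; word frequency is ignored; and context similarity is measured, by assumption, as the cosine between counts of left/right neighbour tags.

```python
from collections import Counter
from difflib import SequenceMatcher
import math

# Hypothetical toy resourced-language (German) lexicon: word -> POS tag.
LEXICON = {"haus": "NOUN", "wasser": "NOUN", "trinken": "VERB",
           "das": "DET", "und": "CONJ"}
SHORT = 3  # hypothetical cutoff: words this short are tagged from context


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for character-based SMT."""
    return SequenceMatcher(None, a, b).ratio()


def transfer_by_cognates(words, lexicon, threshold=0.6):
    """Give each word the tag of its most similar lexicon entry, if similar enough."""
    tags = {}
    for word in words:
        best = max(lexicon, key=lambda src: similarity(word, src))
        if similarity(word, best) >= threshold:
            tags[word] = lexicon[best]
    return tags


def context_profile(sentences, word, tags):
    """Count the tags of the immediate left/right neighbours of `word` ("?" if untagged)."""
    profile = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            if i > 0:
                profile["L:" + tags.get(sent[i - 1], "?")] += 1
            if i + 1 < len(sent):
                profile["R:" + tags.get(sent[i + 1], "?")] += 1
    return profile


def cosine(p, q):
    """Cosine similarity of two sparse count vectors (missing Counter keys count as 0)."""
    dot = sum(count * q[key] for key, count in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def tag_by_context(word, target_sents, target_tags, source_sents, source_tags):
    """Tag `word` like the short source word whose neighbour-tag profile is closest."""
    profile = context_profile(target_sents, word, target_tags)
    candidates = [w for w in source_tags if len(w) <= SHORT]
    best = max(candidates, key=lambda c: cosine(
        profile, context_profile(source_sents, c, source_tags)))
    return source_tags[best]


source_sents = [["das", "haus", "und", "das", "wasser"]]  # German (hypothetical)
target_sents = [["het", "huis", "en", "het", "water"]]    # Dutch (hypothetical)

vocab = sorted({w for sent in target_sents for w in sent})
# Tier 1: longer words inherit tags from their closest German cognate.
cognate_tags = transfer_by_cognates([w for w in vocab if len(w) > SHORT], LEXICON)
# Tier 2: short words are tagged from contexts, using only the tier-1 tags.
tags = dict(cognate_tags)
for w in vocab:
    if len(w) <= SHORT:
        tags[w] = tag_by_context(w, target_sents, cognate_tags, source_sents, LEXICON)
print(tags)  # {'huis': 'NOUN', 'water': 'NOUN', 'en': 'CONJ', 'het': 'DET'}
```

With these toy inputs, Dutch *huis* and *water* inherit NOUN from their German cognates *haus* and *wasser*, while *het* and *en*, too short for reliable string matching, receive DET and CONJ because their neighbour-tag profiles most resemble those of *das* and *und*. Freezing the tier-1 tags before tier 2 keeps the result independent of the order in which short words are processed.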

https://hal.inria.fr/hal-00838569
Contributor: Yves Scherrer
Submitted on: Wednesday, June 26, 2013, 9:15:16 AM
Last modified on: Friday, January 4, 2019, 5:33:24 PM
Long-term archiving on: Wednesday, April 5, 2017, 4:31:21 AM

File

talare13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00838569, version 1

Citation

Yves Scherrer, Benoît Sagot. Étiquetage morphosyntaxique de langues non dotées à partir de ressources pour une langue étymologiquement proche [Part-of-speech tagging of non-resourced languages using resources for an etymologically close language]. TALARE workshop, TALN 2013, ATALA, Jun 2013, Les Sables d'Olonne, France. ⟨hal-00838569⟩
