Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2017

Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

Abstract

In this paper, we present ongoing work on developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are designed to improve the speed and reliability of corpus annotation for noisy data involving large amounts of code-switching, child speech and orthographic noise. Increasing the efficiency of language resource development for language documentation and acquisition research is also a step towards solving the data sparsity issues with which researchers in these fields have been struggling.
Main file
speeding-corpus-development-10.pdf (108.32 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01570614, version 1 (31-07-2017)


Cite

Géraldine Walther, Benoît Sagot. Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin. Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Aug 2017, Vancouver, Canada. pp. 89-94, ⟨10.18653/v1/W17-2212⟩. ⟨hal-01570614⟩