Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

Géraldine Walther; Benoît Sagot

doi:10.18653/v1/W17-2212

Conference paper, Year: 2017


Géraldine Walther

Role: Author
PersonId: 1013951

Institut für Vergleichende Sprachwissenschaft

Benoît Sagot

Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

Automatic Language Modelling and ANAlysis & Computational Humanities

Abstract

In this paper, we present ongoing work for developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are designed to improve the speed and reliability of corpus annotations for noisy data involving large amounts of code-switching, occurrences of child speech and orthographic noise. Being able to increase the efficiency of language resource development for language documentation and acquisition research also constitutes a step towards solving the data sparsity issues with which researchers have been struggling.

Keywords

Language documentation methodology; Corpus annotation and tagging; Romansh Tuatschin; Natural Language Processing

Domains

Linguistics; Computation and Language [cs.CL]

Main file

speeding-corpus-development-10.pdf (108.32 KB)

Origin: Files produced by the author(s)

Contributor: Benoît Sagot

https://inria.hal.science/hal-01570614

Submitted on: Monday, 31 July 2017, 19:08:29

Last modified on: Tuesday, 3 October 2023, 17:18:04

Dates and versions

hal-01570614, version 1 (31-07-2017)

Identifiers

HAL Id: hal-01570614, version 1
DOI: 10.18653/v1/W17-2212

Cite

Géraldine Walther, Benoît Sagot. Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin. Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Aug 2017, Vancouver, Canada. pp.89 - 94, ⟨10.18653/v1/W17-2212⟩. ⟨hal-01570614⟩


Collections

INRIA INRIA2 PARTHENOS

