Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process

Abstract: We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge, no suitable corpus, in terms of size and coverage, is currently available for the target language pair. To select the target L1-L2 interference phenomena, we prepare a small preliminary corpus (corpus1), which is analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena on the phonetic and phonological level are selected according to the expected degree of deviation from native performance and the frequency of occurrence. Fourteen speakers produced both L2 material (either French or German) and L1 material (either German or French). This allowed us to test the recording duration, the recording material, and the performance of our automatic alignment software. We then built corpus2, taking into account what we learned from corpus1. The aims are the same, but we adapted the speech material to avoid overly long recording sessions. A total of 100 speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database, available to the scientific community after completion of the project.
Document type:
Conference paper
LREC - 9th Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. 2014
Cited literature: 16 references

https://hal.inria.fr/hal-00979026
Contributor: Camille Fauth
Submitted on: Tuesday, 15 April 2014 - 10:59:32
Last modified on: Thursday, 11 January 2018 - 06:25:24
Document(s) archived on: Tuesday, 15 July 2014 - 10:56:11

File

LREC_IFCASL_long.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00979026, version 1

Citation

Camille Fauth, Anne Bonneau, Frank Zimmerer, Jürgen Trouvain, Bistra Andreeva, et al. Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process. LREC - 9th Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. 2014. 〈hal-00979026〉

Metrics

Record views: 672

File downloads: 413