Conference paper. Year: 2012

A Phonemic Corpus of Polish Child-Directed Speech

Abstract

Recent advances in modeling early language acquisition are due not only to the development of machine-learning techniques, but also to the increasing availability of data on child language and child-adult interaction. In the absence of recordings of child-directed speech, or when models explicitly require such a representation for training data, phonemic transcriptions are commonly used as input data. We present a novel (and to our knowledge, the first) phonemic corpus of Polish child-directed speech. It is derived from the Weist corpus of Polish, freely available from the seminal CHILDES database. For the sake of reproducibility, and to exemplify the typical trade-off between ecological validity and sample size, we report all preprocessing operations and transcription guidelines. Contributed linguistic resources include updated CHAT-formatted transcripts with phonemic transcriptions in a novel phonology tier, as well as by-product data, such as a phonemic lexicon of Polish. All resources are distributed under the LGPL-LR license.
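As a rough illustration of how transcripts carrying a phonology tier might be consumed, the following minimal sketch pairs each CHAT main line (`*SPK:`) with a dependent tier (`%name:`). It rests on assumptions not stated in the abstract: the tier is taken to be named `%pho` (the conventional CHAT phonology tier; the corpus may use a different name), continuation lines are ignored, and the sample utterances and their transcription are invented for the example rather than taken from the corpus or its transcription guidelines. The function name `iter_utterances` is hypothetical.

```python
def iter_utterances(lines, tier="pho"):
    """Yield (speaker, utterance, phonemic) triples from CHAT-style lines.

    `tier` is the assumed name of the phonology tier; `phonemic` is None
    when an utterance has no such tier.
    """
    speaker = utterance = phonemic = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*"):
            # A new main line closes the previous utterance, if any.
            if speaker is not None:
                yield speaker, utterance, phonemic
            head, _, body = line[1:].partition(":")
            speaker, utterance, phonemic = head, body.strip(), None
        elif line.startswith("%") and speaker is not None:
            # Dependent tier attached to the current utterance.
            head, _, body = line[1:].partition(":")
            if head == tier:
                phonemic = body.strip()
        # Header lines (starting with "@") and anything else are skipped.
    if speaker is not None:
        yield speaker, utterance, phonemic


# Invented sample, for illustration only (not from the corpus).
SAMPLE = [
    "@Begin",
    "*MOT:\tto jest kot .",
    "%pho:\ttɔ jɛst kɔt",
    "*CHI:\tkot .",
    "@End",
]

for spk, utt, pho in iter_utterances(SAMPLE):
    print(spk, utt, pho, sep=" | ")
```

Keeping the orthographic line and the phonemic tier together in one triple makes it straightforward to derive by-products such as a word-to-phonemes lexicon, provided the two tiers are aligned word by word, which this sketch does not check.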

Domains

Linguistics
Main file
final.pdf (127.17 KB)
LREC_12.pdf (120.81 KB)
Origin: Files produced by the author(s)
Format: Other

Dates and versions

hal-00702437, version 1 (30-05-2012)

Identifiers

  • HAL Id: hal-00702437, version 1

Cite

Luc Boruta, Justyna Jastrzebska. A Phonemic Corpus of Polish Child-Directed Speech. LREC 2012 - Eighth International Conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. ⟨hal-00702437⟩
