Testing the Robustness of Online Word Segmentation: Effects of Linguistic Diversity and Phonetic Variation

Abstract: Models of the acquisition of word segmentation are typically evaluated using phonemically transcribed corpora. Accordingly, they implicitly assume that children know how to undo phonetic variation when they learn to extract words from speech. Moreover, whereas models of language acquisition should perform similarly across languages, evaluation is often limited to English samples. Using child-directed corpora of English, French and Japanese, we evaluate the performance of state-of-the-art statistical models given inputs where phonetic variation has not been reduced. To do so, we measure segmentation robustness across different levels of segmental variation, simulating systematic allophonic variation or errors in phoneme recognition. We show that these models are not robust to an increase in such variation and do not generalize to typologically different languages. From the perspective of early language acquisition, the results strengthen the hypothesis that phonological knowledge is acquired in large part before the construction of a lexicon.
Document type:
Conference paper
Keller, Frank and Reitter, David. CMCL 2011 - Cognitive Modeling and Computational Linguistics Workshop at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Jun 2011, Portland, United States. pp.1-9, 2011, Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 〈http://aclweb.org/anthology-new/W/W11/W11-06.pdf〉

https://hal.inria.fr/inria-00605806
Contributor: Luc Boruta
Submitted on: Wednesday, 29 August 2012 - 12:11:09
Last modified on: Friday, 25 May 2018 - 12:02:05
Document(s) archived on: Friday, 30 November 2012 - 02:25:08

Identifiers

  • HAL Id: inria-00605806, version 1

Citation

Luc Boruta, Sharon Peperkamp, Benoît Crabbé, Emmanuel Dupoux. Testing the Robustness of Online Word Segmentation: Effects of Linguistic Diversity and Phonetic Variation. Keller, Frank and Reitter, David. CMCL 2011 - Cognitive Modeling and Computational Linguistics Workshop at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Jun 2011, Portland, United States. pp.1-9, 2011, Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 〈http://aclweb.org/anthology-new/W/W11/W11-06.pdf〉. 〈inria-00605806〉
