Internet Documents: A Rich Source for Spoken Language Modeling

Abstract: Spoken-language speech recognition systems need a better understanding of natural spoken language phenomena than their dictation counterparts. Current language models are mostly based on written text and/or on very tedious Wizard of Oz or real dialog experiments. In this paper we propose using Internet documents as a very rich source of information for spoken language modeling. Through detailed experiments we show how the Internet can be used to automatically prepare language models adapted to a given task. For a given recognition system, this approach yields a word accuracy up to 15% better than that of a system using language models trained on written text.
Document type:
Conference paper
IEEE Workshop ASRU'99 (Automatic Speech Recognition and Understanding), Dec 1999, Keystone - Colorado, United States. pp. 277-281, 1999

https://hal.inria.fr/inria-00326147
Contributor: Dominique Vaufreydaz
Submitted on: Wednesday, October 1, 2008 - 22:18:27
Last modified on: Tuesday, February 20, 2018 - 15:10:03
Document(s) archived on: Friday, June 4, 2010 - 12:05:05

File

Vaufreydaz99c.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00326147, version 1

Citation

Dominique Vaufreydaz, Mohamad Akbar, José Rouillard. Internet Documents: A Rich Source for Spoken Language Modeling. IEEE Workshop ASRU'99 (Automatic Speech Recognition and Understanding), Dec 1999, Keystone - Colorado, United States. pp. 277-281, 1999. 〈inria-00326147〉
