Conference Paper. Year: 1999

Internet Documents: A Rich Source for Spoken Language Modeling

Abstract

Speech recognition systems for spoken language need a better understanding of natural spoken language phenomena than their dictation counterparts. Current language models are mostly based on written text and/or on very tedious Wizard of Oz or real dialog experiments. In this paper we propose using Internet documents as a very rich source of information for spoken language modeling. Through detailed experiments we show how language models adapted to a given task can be prepared automatically from the Internet. For a given recognition system, this approach yields word accuracy up to 15% better than a system using language models trained on written text.
Main file
Vaufreydaz99c.pdf (80.69 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00326147, version 1 (01-10-2008)

Identifiers

  • HAL Id: inria-00326147, version 1

Cite

Dominique Vaufreydaz, Mohamad Akbar, José Rouillard. Internet Documents: A Rich Source for Spoken Language Modeling. IEEE Workshop ASRU'99 (Automatic Speech Recognition and Understanding), IEEE, Dec 1999, Keystone - Colorado, United States. pp. 277-281. ⟨inria-00326147⟩