
Internet Documents: A Rich Source for Spoken Language Modeling

Abstract : Speech recognition systems for spoken language need a better understanding of natural spoken language phenomena than their dictation counterparts. Current language models are mostly based on written text and/or on very tedious Wizard of Oz or real dialog experiments. In this paper we propose using Internet documents as a very rich source of information for spoken language modeling. Through detailed experiments we show how the Internet can be used to automatically prepare language models adapted to a given task. For a given recognition system, this approach yields a word accuracy up to 15% better than that of a system using language models trained on written text.
Document type : Conference papers

Contributor : Dominique Vaufreydaz
Submitted on : Wednesday, October 1, 2008 - 10:18:27 PM
Last modification on : Wednesday, July 6, 2022 - 4:21:00 AM
Long-term archiving on : Friday, June 4, 2010 - 12:05:05 PM




  • HAL Id : inria-00326147, version 1



Dominique Vaufreydaz, Mohamad Akbar, José Rouillard. Internet Documents: A Rich Source for Spoken Language Modeling. IEEE Workshop ASRU'99 (Automatic Speech Recognition and Understanding), IEEE, Dec 1999, Keystone - Colorado, United States. pp. 277-281. ⟨inria-00326147⟩


