Improving language models by using distant information

Armelle Brun 1, David Langlois 1, Kamel Smaïli 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This study examines how to take advantage of distant information in statistical language models. We show that it is possible to use n-gram models with histories different from those used during training; we call these crossing context models. Our study covers classical and distant n-gram models. A mixture of four models is proposed and evaluated. A bigram linear mixture achieves a 14% improvement in terms of perplexity, and the trigram mixture outperforms the standard trigram by 5.6%. These improvements are obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system, where it achieves a slight improvement in word error rate on the data used for the francophone evaluation campaign ESTER. Finally, the impact of the proposed crossing context language models on performance is analyzed across individual speakers.
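The linear mixture described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's implementation: it trains a classical bigram P(w_i | w_{i-1}) and a distant bigram P(w_i | w_{i-2}) from raw counts, then interpolates them with fixed weights (the paper would estimate the weights, e.g. on held-out data; function names, the toy corpus, and the 0.6/0.4 weights are all hypothetical).

```python
from collections import defaultdict

def train_bigram(corpus, gap=1):
    """Count-based conditional estimates. gap=1 gives the classical
    bigram P(w_i | w_{i-1}); gap=2 gives a distant bigram
    P(w_i | w_{i-2}) that skips the immediately preceding word."""
    pair_counts = defaultdict(int)
    hist_counts = defaultdict(int)
    for sent in corpus:
        for i in range(gap, len(sent)):
            pair_counts[(sent[i - gap], sent[i])] += 1
            hist_counts[sent[i - gap]] += 1

    def prob(word, hist):
        if hist_counts[hist] == 0:
            return 0.0
        return pair_counts[(hist, word)] / hist_counts[hist]

    return prob

def mixture_prob(word, history, models, weights):
    """Linear mixture: P(w | h) = lam1 * P_classic(w | w_{i-1})
    + lam2 * P_distant(w | w_{i-2}). Each component model sees only
    the slice of the history it was trained on; the weights are
    assumed to sum to 1."""
    p_classic, p_distant = models
    lam1, lam2 = weights
    return (lam1 * p_classic(word, history[-1])
            + lam2 * p_distant(word, history[-2]))

# Hypothetical toy corpus for illustration only.
corpus = [["<s>", "the", "cat", "sat"],
          ["<s>", "the", "dog", "sat"]]
p_classic = train_bigram(corpus, gap=1)   # P(w_i | w_{i-1})
p_distant = train_bigram(corpus, gap=2)   # P(w_i | w_{i-2})
p = mixture_prob("sat", ["<s>", "the", "cat"],
                 (p_classic, p_distant), (0.6, 0.4))
```

In this toy corpus both components assign probability 1 to "sat" in that context, so the mixture does too; with real data the two components disagree, and that is where the interpolation of near and distant histories can reduce perplexity.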
Document type :
Conference papers
Contributor : Armelle Brun
Submitted on : Tuesday, November 13, 2007 - 3:34:25 PM
Last modification on : Friday, February 26, 2021 - 3:28:06 PM
Long-term archiving on : Monday, April 12, 2010 - 2:04:36 AM




  • HAL Id : inria-00187084, version 1



Armelle Brun, David Langlois, Kamel Smaïli. Improving language models by using distant information. International Symposium on Signal Processing and its Applications - ISSPA 2007, Feb 2007, Sharjah, United Arab Emirates. ⟨inria-00187084⟩


