
A New Method Based on Context for Combining Statistical Language Models

David Langlois 1 Kamel Smaïli 1 Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this paper we propose a new method for extracting from a corpus the histories for which one language model performs better than another. The decision is based on a measure derived from perplexity. For a given history, this measure makes it possible to compare two language models and to choose the better one for that history. Using this principle with a 20K-word vocabulary, we combined two language models: a bigram and a distant bigram. The contribution of the distant bigram is significant: the combination outperforms the bigram model by 7.5%. Performance in the Shannon game is also improved. We show that the proposed framework is a cheaper way of combining language models than the maximum entropy principle. In addition, the histories for which one model is better than the other have been collected and studied. Almost all of them are beginnings of very frequently used French phrases. Finally, by using this principle, we obtain a better trigram model in terms of parameters and perplexity. This model is a combination of a bigram and a trigram based on a selected history.
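The abstract does not give the exact measure, so the following is only a minimal sketch of the general idea under stated assumptions: each model is a callable returning P(word | history), the per-history score is the ordinary perplexity restricted to the held-out events that share that history, and the combined model simply switches to the second model on the selected histories. The names `per_history_perplexity`, `select_histories`, and `combined_prob` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def per_history_perplexity(events, prob):
    """Perplexity of `prob` computed separately for each history.

    events: iterable of (history, word) pairs from a held-out corpus.
    prob:   callable (word, history) -> probability in (0, 1].
    """
    log_sums = defaultdict(float)
    counts = defaultdict(int)
    for history, word in events:
        log_sums[history] += math.log2(prob(word, history))
        counts[history] += 1
    return {h: 2 ** (-log_sums[h] / counts[h]) for h in counts}


def select_histories(events, prob_a, prob_b):
    """Histories for which model B has strictly lower perplexity than model A."""
    events = list(events)  # allow two passes over a generator
    ppl_a = per_history_perplexity(events, prob_a)
    ppl_b = per_history_perplexity(events, prob_b)
    return {h for h in ppl_a if ppl_b[h] < ppl_a[h]}


def combined_prob(word, history, prob_a, prob_b, selected):
    """Use model B on the selected histories and model A everywhere else."""
    model = prob_b if history in selected else prob_a
    return model(word, history)


# Toy usage with two made-up models (assumed values, not data from the paper).
toy_events = [("the", "cat"), ("the", "cat"), ("a", "dog")]
toy_model_a = lambda word, history: 0.25
toy_model_b = lambda word, history: 0.5 if history == "the" else 0.125
toy_selected = select_histories(toy_events, toy_model_a, toy_model_b)
```

Because each history is served by exactly one model in this sketch, the combined distribution stays normalized whenever the two underlying models are. In practice the selection would be made on data separate from the test set, and a minimum count per history would likely be needed before trusting the comparison.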
Document type : Conference papers

https://hal.inria.fr/inria-00100420
Contributor : Publications Loria
Submitted on : Tuesday, September 26, 2006 - 2:43:22 PM
Last modification on : Thursday, March 5, 2020 - 4:56:41 PM

Identifiers

  • HAL Id : inria-00100420, version 1

Citation

David Langlois, Kamel Smaïli, Jean-Paul Haton. A New Method Based on Context for Combining Statistical Language Models. Third International Conference on Modeling and Using Context - CONTEXT 01, 2001, Dundee, Scotland, pp.235-247. ⟨inria-00100420⟩
