A New Method Based on Context for Combining Statistical Language Models

David Langlois (1), Kamel Smaïli (1), Jean-Paul Haton (1)
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper we propose a new method for extracting from a corpus the histories for which a given language model is better than another one. The decision is based on a measure derived from perplexity. For a given history, this measure makes it possible to compare two language models and then to choose the better one for that history. Using this principle, and with a 20K-word vocabulary, we combined two language models: a bigram and a distant bigram. The contribution of the distant bigram is significant: the combination outperforms the bigram model by 7.5%. Moreover, performance on the Shannon game is improved. We show that this framework for combining language models is cheaper than the maximum entropy principle. In addition, the selected histories for which one model is better than the other have been collected and studied. Almost all of them are the beginnings of very frequently used French phrases. Finally, using this principle, we obtain a trigram model that is better in terms of both number of parameters and perplexity. This model is a combination of a bigram and a trigram based on the selected histories.
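The selection principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact measure, so the sketch assumes it is the perplexity of each model restricted to the events that share a given history. The function names, the toy probability tables, and the three-event corpus are all hypothetical.

```python
import math
from collections import defaultdict


def per_history_perplexity(events, prob):
    """Perplexity of one model, computed separately for each history.

    events: iterable of (history, word) pairs from a held-out corpus.
    prob:   function (word, history) -> probability, assumed to be > 0.
    """
    log_sum = defaultdict(float)
    count = defaultdict(int)
    for history, word in events:
        log_sum[history] += math.log2(prob(word, history))
        count[history] += 1
    return {h: 2.0 ** (-log_sum[h] / count[h]) for h in count}


def select_histories(events, prob_a, prob_b):
    """Histories for which model B has strictly lower perplexity than model A."""
    events = list(events)
    pp_a = per_history_perplexity(events, prob_a)
    pp_b = per_history_perplexity(events, prob_b)
    return {h for h in pp_a if pp_b[h] < pp_a[h]}


def combined_prob(word, history, prob_a, prob_b, selected):
    """Use model B on the selected histories and model A everywhere else."""
    if history in selected:
        return prob_b(word, history)
    return prob_a(word, history)


# Toy models (assumption): a history is the pair (w[i-2], w[i-1]).
# The bigram looks only at w[i-1]; the distant bigram looks only at w[i-2].
BIGRAM = {("y", "a"): 0.4, ("chat", "dort"): 0.6}
DISTANT = {("il", "a"): 0.7, ("le", "dort"): 0.2}


def bigram(word, history):
    return BIGRAM.get((history[1], word), 0.1)


def distant_bigram(word, history):
    return DISTANT.get((history[0], word), 0.1)


# Invented held-out events, for illustration only.
events = [(("il", "y"), "a"), (("il", "y"), "a"), (("le", "chat"), "dort")]
selected = select_histories(events, bigram, distant_bigram)
print(selected)  # the distant bigram wins only on the history ("il", "y")
```

Because each history is assigned to exactly one model, the combined model uses a complete conditional distribution for every history, so it stays normalized whenever both component models are.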
Document type:
Conference paper
Varol Akman, Paolo Bouquet, Richmond Thomason, Roger A. Young. Third International Conference on Modeling and Using Context - CONTEXT 01, 2001, Dundee, Scotland, Springer, 2116, pp.235-247, 2001, Lecture Notes in Artificial Intelligence

https://hal.inria.fr/inria-00100420
Contributor: Publications Loria
Submitted on: Tuesday, 26 September 2006 - 14:43:22
Last modified on: Thursday, 11 January 2018 - 06:19:55

Identifiers

  • HAL Id : inria-00100420, version 1

Citation

David Langlois, Kamel Smaïli, Jean-Paul Haton. A New Method Based on Context for Combining Statistical Language Models. Varol Akman, Paolo Bouquet, Richmond Thomason, Roger A. Young. Third International Conference on Modeling and Using Context - CONTEXT 01, 2001, Dundee, Scotland, Springer, 2116, pp.235-247, 2001, Lecture Notes in Artificial Intelligence. 〈inria-00100420〉
