Conference paper, year: 2001

A New Method Based on Context for Combining Statistical Language Models

Abstract

In this paper we propose a new method for extracting from a corpus the histories for which a given language model is better than another one. The decision is based on a measure derived from perplexity. For a given history, this measure makes it possible to compare two language models and then to choose the better one for that history. Using this principle, and with a 20K-word vocabulary, we combined two language models: a bigram and a distant bigram. The contribution of the distant bigram is significant: the combination outperforms the bigram model by 7.5%. Moreover, performance in the Shannon game is improved. We show that the proposed framework is cheaper than the maximum entropy principle for combining language models. In addition, the selected histories, those for which one model is better than the other, have been collected and studied. Almost all of them are beginnings of very frequently used French phrases. Finally, by using this principle, we obtain a better trigram model in terms of parameters and perplexity. This model is a combination of a bigram and a trigram based on a selected history.
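The selection principle described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact measure, so this sketch assumes plain per-history perplexity computed on a held-out set of (history, word) events. The function names, the representation of a model as a callable returning a smoothed (non-zero) probability, and the toy probability tables are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def history_perplexities(model, events):
    """Return {history: perplexity of `model` over the events sharing that history}.

    Assumption: model(history, word) returns a smoothed probability > 0.
    """
    log_sum = defaultdict(float)
    count = defaultdict(int)
    for history, word in events:
        log_sum[history] += math.log(model(history, word))
        count[history] += 1
    return {h: math.exp(-log_sum[h] / count[h]) for h in count}


def select_histories(model_a, model_b, events):
    """Histories on which model_b has strictly lower perplexity than model_a."""
    ppl_a = history_perplexities(model_a, events)
    ppl_b = history_perplexities(model_b, events)
    return {h for h in ppl_a if ppl_b[h] < ppl_a[h]}


def combined_model(model_a, model_b, selected):
    """Use model_b on the selected histories and model_a everywhere else."""
    def model(history, word):
        chosen = model_b if history in selected else model_a
        return chosen(history, word)
    return model


# Toy usage: a history is the pair (w[i-2], w[i-1]).
# The bigram looks at w[i-1], the distant bigram at w[i-2].
BIGRAM = {("y", "a"): 0.5, ("a", "b"): 0.1}    # hypothetical probabilities
DISTANT = {("il", "a"): 0.2, ("x", "b"): 0.4}  # hypothetical probabilities


def bigram(history, word):
    return BIGRAM.get((history[1], word), 0.01)


def distant_bigram(history, word):
    return DISTANT.get((history[0], word), 0.01)


held_out = [(("il", "y"), "a"), (("x", "a"), "b")]
selected = select_histories(bigram, distant_bigram, held_out)
model = combined_model(bigram, distant_bigram, selected)
print(selected)
```

Because exactly one model is used for each history, the combined model remains a proper conditional distribution whenever both component models are. In practice the selection would be made on data separate from the test corpus.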
No file deposited

Dates and versions

inria-00100420, version 1 (26-09-2006)

Identifiers

  • HAL Id: inria-00100420, version 1

Cite

David Langlois, Kamel Smaïli, Jean-Paul Haton. A New Method Based on Context for Combining Statistical Language Models. Third International Conference on Modeling and Using Context - CONTEXT 01, 2001, Dundee, Scotland, pp.235-247. ⟨inria-00100420⟩
