Improving Statistical Language Models by Removing Impossible Events

Armelle Brun 1, David Langlois 1, Kamel Smaïli 1, Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper presents a new method that detects impossible bigrams in the space of $V^2$ bigrams built from a vocabulary V and discards them from a statistical language model. We claim that discarding ungrammatical events, which cannot occur in well-written text, will improve language models and also reduce the complexity of search algorithms in speech recognition. The Purged Language Model (PLM) requires a set of impossible bigrams, which are detected automatically by rules based on a class model, phonological rules, etc. Methods have been developed for redistributing the probability mass of the impossible bigrams among the possible events. This idea allows us to take advantage of natural language constraints and to include linguistic criteria in statistical language models. The PLM has been evaluated on a test corpus of 2M words and achieves a perplexity improvement of 51% under certain conditions.
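The abstract does not say which redistribution schemes the authors use. As an illustration only, the sketch below assumes the simplest one: for each history word, the impossible bigrams are dropped and the remaining conditional probabilities are renormalised proportionally. The function name `purge_bigrams`, the data layout, and the toy probabilities are all hypothetical and do not come from the paper.

```python
def purge_bigrams(probs, impossible):
    """Remove impossible bigrams and renormalise each conditional distribution.

    probs: dict mapping a history word h to a dict {w: P(w | h)}.
    impossible: set of (h, w) pairs judged impossible.
    Assumption: the freed mass is shared proportionally among the remaining
    bigrams; the paper may use other redistribution methods.
    """
    purged = {}
    for history, dist in probs.items():
        # Keep only the successors that are not flagged as impossible.
        kept = {w: p for w, p in dist.items() if (history, w) not in impossible}
        mass = sum(kept.values())
        if mass == 0:
            raise ValueError(f"no possible successor left for {history!r}")
        # Dividing by the remaining mass makes the distribution sum to 1 again.
        purged[history] = {w: p / mass for w, p in kept.items()}
    return purged


# Toy example with made-up probabilities (not data from the paper).
toy_probs = {"the": {"cat": 0.5, "of": 0.25, "dog": 0.25}}
toy_impossible = {("the", "of")}
toy_purged = purge_bigrams(toy_probs, toy_impossible)
```

Under this assumed scheme, the relative ordering of the surviving bigrams is unchanged; only their scale changes so that each conditional distribution still sums to one.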
Document type :
Conference papers

https://hal.inria.fr/inria-00100651
Contributor : Publications Loria
Submitted on : Tuesday, September 26, 2006 - 2:48:39 PM
Last modification on : Thursday, January 11, 2018 - 6:19:55 AM

Identifiers

  • HAL Id : inria-00100651, version 1

Citation

Armelle Brun, David Langlois, Kamel Smaïli, Jean-Paul Haton. Improving Statistical Language Models by Removing Impossible Events. Proceedings of the International Workshop "Speech and Computer" - SPECOM 2001, 2001, Moscow, Russia, 4 p. ⟨inria-00100651⟩
