# Improving Statistical Language Models by Removing Impossible Events

PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This paper presents a new method that detects impossible bigrams in the space of $V^2$ bigrams built from a vocabulary $V$ and discards them from a statistical language model. We claim that discarding ungrammatical events, which cannot occur in well-written text, improves language models and also reduces the complexity of search algorithms in speech recognition. The Purged Language Model (PLM) requires a set of impossible bigrams, which are detected using automatic rules based on a class model, phonological rules, etc. Methods have been developed for redistributing the probability mass freed by the impossible bigrams among the possible events. This idea allows us to take advantage of natural language constraints and to include linguistic criteria in statistical language models. The PLM has been evaluated on a test corpus of 2M words and achieves a perplexity improvement of 51% under certain conditions.
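The purging idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which redistribution methods the authors use, so the proportional renormalization below is only an assumption, and the function name `purge_bigram_model`, the toy probabilities, and the example set of impossible bigrams are all hypothetical.

```python
def purge_bigram_model(probs, impossible):
    """Remove impossible bigrams and renormalize each conditional distribution.

    probs: dict mapping a history word h to a dict {w: P(w | h)}.
    impossible: set of (h, w) pairs judged impossible (hypothetical input;
        the paper derives such pairs from class-based and phonological rules).

    Assumption: the freed probability mass is shared among the remaining
    successors in proportion to their original probabilities. This is one
    simple scheme, not necessarily the one used in the paper.
    """
    purged = {}
    for history, dist in probs.items():
        # Keep only the successors that are not flagged as impossible.
        kept = {w: p for w, p in dist.items() if (history, w) not in impossible}
        total = sum(kept.values())
        if total == 0:
            # Every successor was purged: keep the original distribution
            # rather than produce an undefined one.
            purged[history] = dict(dist)
            continue
        purged[history] = {w: p / total for w, p in kept.items()}
    return purged


# Toy example: "the the" is treated as an impossible bigram.
toy_model = {"the": {"cat": 0.5, "of": 0.25, "the": 0.25}}
toy_impossible = {("the", "the")}
toy_purged = purge_bigram_model(toy_model, toy_impossible)
```

In this sketch a purged bigram is simply absent from the result, so a lookup such as `toy_purged["the"].get("the", 0.0)` gives it zero probability, while the remaining successors of each history still sum to one.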
Document type:
Conference paper
Proceedings of the International Workshop "Speech and Computer" - SPECOM 2001, 2001, Moscow, Russia, 4 p, 2001

https://hal.inria.fr/inria-00100651
Contributor: Publications Loria
Submitted on: Tuesday, 26 September 2006 - 14:48:39
Last modified on: Thursday, 11 January 2018 - 06:19:55

### Identifiers

• HAL Id: inria-00100651, version 1

### Citation

Armelle Brun, David Langlois, Kamel Smaïli, Jean-Paul Haton. Improving Statistical Language Models by Removing Impossible Events. Proceedings of the International Workshop "Speech and Computer" - SPECOM 2001, 2001, Moscow, Russia, 4 p, 2001. 〈inria-00100651〉
