
Dealing with distant relationships in natural language modelling for automatic speech recognition

David Langlois 1 Kamel Smaïli 1 Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Classical statistical language models, called n-gram models, describe natural language through the probabilistic relationship between a word to predict and the n-1 contiguous words preceding it. The linguistic relationships present in a sentence are clearly more complex; in particular, some of them hold between distant words. We present recent work on an alternative to n-gram models, based on splitting the history and interpolating between distant bigram models. More precisely, our model is a cheaper alternative to high-order n-grams. In conventional n-grams, when n is greater than 3, events become less frequent and the statistics are not reliable. To deal with this problem and to estimate parameters accurately, we combine a smoothed bigram with a distant 3-bigram, a distant 4-bigram and a cache of 100 words. We report new progress obtained by using a simulated annealing algorithm to compute the best parameters of this linear combination. With a 20K-word vocabulary and 40 million words of training data, our algorithm improved the perplexity by 5.4% compared with the Baum-Welch algorithm. Moreover, the new model outperforms a smoothed bigram by 6.1% in terms of perplexity.
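The linear combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the component models' definitions, the annealing schedule or the move proposal, so the function names (`perplexity`, `propose`, `anneal_weights`), the geometric cooling schedule, the step size and the toy probabilities below are all assumptions made for illustration. Each row of `toy_probs` stands for one held-out word and lists the probability that each of four hypothetical components (smoothed bigram, distant 3-bigram, distant 4-bigram, cache) assigns to it.

```python
import math
import random


def perplexity(weights, component_probs):
    """Perplexity of a linear interpolation of component models.

    component_probs[t][k] is the probability component k gives to the
    t-th held-out word; weights[k] is that component's mixture weight.
    """
    log_sum = 0.0
    for probs in component_probs:
        mixed = sum(w * p for w, p in zip(weights, probs))
        log_sum += math.log(mixed)
    return math.exp(-log_sum / len(component_probs))


def propose(weights, step, rng):
    """Perturb one weight, then renormalise so the weights still sum to 1."""
    k = rng.randrange(len(weights))
    new = list(weights)
    new[k] = max(1e-6, new[k] + rng.uniform(-step, step))
    total = sum(new)
    return [w / total for w in new]


def anneal_weights(component_probs, n_components, steps=2000,
                   t0=1.0, cooling=0.995, step=0.1, seed=0):
    """Search for low-perplexity weights with simulated annealing.

    The schedule (geometric cooling from t0) is an assumed, generic choice.
    """
    rng = random.Random(seed)
    current = [1.0 / n_components] * n_components  # start from uniform weights
    current_pp = perplexity(current, component_probs)
    best, best_pp = current, current_pp
    temperature = t0
    for _ in range(steps):
        candidate = propose(current, step, rng)
        candidate_pp = perplexity(candidate, component_probs)
        delta = candidate_pp - current_pp
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_pp = candidate, candidate_pp
            if current_pp < best_pp:
                best, best_pp = current, current_pp
        temperature *= cooling
    return best, best_pp


# Invented toy data: columns are (bigram, distant 3-bigram,
# distant 4-bigram, cache); these numbers are not from the paper.
toy_probs = [
    (0.20, 0.05, 0.01, 0.01),
    (0.01, 0.30, 0.02, 0.01),
    (0.10, 0.02, 0.01, 0.20),
    (0.05, 0.01, 0.25, 0.02),
]

best_weights, best_pp = anneal_weights(toy_probs, n_components=4)
uniform_pp = perplexity([0.25] * 4, toy_probs)
```

Because the search starts from uniform weights and keeps the best point visited, `best_pp` is never worse than `uniform_pp`. In a real setting the rows would come from a held-out corpus, and the result would be compared with weights estimated by Baum-Welch (EM), as the abstract does.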
Document type :
Conference papers

Cited literature : 6 references
Contributor : Publications Loria
Submitted on : Tuesday, September 26, 2006 - 8:46:29 AM
Last modification on : Wednesday, February 2, 2022 - 3:51:43 PM
Long-term archiving on : Wednesday, March 29, 2017 - 12:42:16 PM


  • HAL Id : inria-00099031, version 1



David Langlois, Kamel Smaïli, Jean-Paul Haton. Dealing with distant relationships in natural language modelling for automatic speech recognition. 4th World Multiconference on Systemics, Cybernetics & Informatics - SCI'2000, International Institute of Informatics & Systemics, 2000, Orlando, USA, pp.400-405. ⟨inria-00099031⟩


