
Similar N-Gram Language Model

Christian Gillot (1), Christophe Cerisara (1), David Langlois (1), Jean-Paul Haton (1)
(1) PAROLE - Analysis, perception and recognition of speech, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper describes an extension of the n-gram language model: the similar n-gram language model. The estimation of the probability P(s) of a string s by the classical model of order n is computed using statistics of occurrences of the last n words of the string in the corpus, whereas the proposed model further uses all the strings s' for which the Levenshtein distance to s is smaller than a given threshold. The similarity between s and each string s' is estimated using co-occurrence statistics. The new P(s) is approximated by smoothing all the similar n-gram probabilities with a regression technique. A slight but statistically significant decrease in the word error rate is obtained on a state-of-the-art automatic speech recognition system when the similar n-gram language model is interpolated linearly with the n-gram model.
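The selection and interpolation steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the Levenshtein distance is computed over words rather than characters, and the names `levenshtein`, `similar_ngrams`, `interpolate` and the weight `lam` are hypothetical. The co-occurrence-based similarity estimate and the regression smoothing described in the paper are not reproduced here.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two word sequences (assumption: word-level edits)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_ngrams(s: Tuple[str, ...],
                   candidates: List[Tuple[str, ...]],
                   threshold: int) -> List[Tuple[str, ...]]:
    """Keep the candidates whose distance to s is strictly below the threshold."""
    return [c for c in candidates if levenshtein(s, c) < threshold]


def interpolate(p_ngram: float, p_similar: float, lam: float) -> float:
    """Linear interpolation of two probability estimates, with 0 <= lam <= 1."""
    return lam * p_ngram + (1.0 - lam) * p_similar
```

For example, `similar_ngrams(("the", "cat", "sat"), candidates, 2)` would keep only those candidate strings that differ from the query by at most one word edit.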
Document type : Conference papers
Contributor : Christophe Cerisara
Submitted on : Friday, November 26, 2010 - 4:37:45 PM
Last modification on : Friday, February 26, 2021 - 3:28:06 PM


  • HAL Id : inria-00540428, version 1



Christian Gillot, Christophe Cerisara, David Langlois, Jean-Paul Haton. Similar N-Gram Language Model. INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1824-1827. ⟨inria-00540428⟩


