Similar N-Gram Language Model

Christian Gillot [1], Christophe Cerisara [1], David Langlois [1], Jean-Paul Haton [1]
[1] PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This paper describes an extension of the n-gram language model: the similar n-gram language model. The estimation of the probability P(s) of a string s by the classical model of order n is computed using statistics of occurrences of the last n words of the string in the corpus, whereas the proposed model further uses all the strings s' for which the Levenshtein distance to s is smaller than a given threshold. The similarity between s and each string s' is estimated using co-occurrence statistics. The new P(s) is approximated by smoothing all the similar n-gram probabilities with a regression technique. A slight but statistically significant decrease in the word error rate is obtained on a state-of-the-art automatic speech recognition system when the similar n-gram language model is interpolated linearly with the n-gram model.
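To make two of the ingredients named in the abstract concrete, here is a minimal Python sketch: a word-level Levenshtein distance used to select the strings s' within a threshold of s, and a linear interpolation of two model probabilities. The function names, the unit edit costs, and the weight `lam` are assumptions made for illustration; the record does not specify them. The co-occurrence similarity and the regression smoothing are not described in enough detail here to sketch and are left out.

```python
from typing import Iterable, List, Sequence, Tuple


def word_levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two word sequences.

    Assumption: insertion, deletion and substitution each cost 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i words of a
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_ngrams(
    target: Sequence[str],
    candidates: Iterable[Sequence[str]],
    threshold: int,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Keep candidates whose distance to target is strictly below threshold."""
    selected = []
    for cand in candidates:
        dist = word_levenshtein(target, cand)
        if dist < threshold:
            selected.append((tuple(cand), dist))
    return selected


def interpolate(p_ngram: float, p_similar: float, lam: float) -> float:
    """Linear interpolation of two probabilities; lam is a hypothetical weight."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_similar + (1.0 - lam) * p_ngram
```

In practice the candidate strings would come from the training corpus and the interpolation weight would be tuned on held-out data; both are outside the scope of this sketch.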
Document type:
Conference paper
INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1824-1827, 2010
https://hal.inria.fr/inria-00540428
Contributor: Christophe Cerisara
Submitted on: Friday, November 26, 2010 - 16:37:45
Last modified on: Friday, February 9, 2018 - 13:20:01

Identifiers

  • HAL Id : inria-00540428, version 1

Citation

Christian Gillot, Christophe Cerisara, David Langlois, Jean-Paul Haton. Similar N-Gram Language Model. INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1824-1827, 2010. 〈inria-00540428〉
