
inria-00100021, version 1

Statistical Feature Language Model

Kamel Smaïli a1, Salma Jamoussi b1, David Langlois c1, Jean-Paul Haton b1

8th International Conference on Spoken Language Processing - ICSLP' 2004 (2004) 4 p

Abstract: Statistical language models are widely used in automatic speech recognition to constrain the decoding of a sentence. Most of these models derive from the classical n-gram paradigm. However, the production of a word depends on a large set of linguistic features: lexical, syntactic, semantic, etc. Moreover, in some natural languages the gender and number of the left context affect the production of the next word. It therefore seems attractive to design a language model based on a variety of word features. In this paper we present a new statistical language model based on this idea, called the Statistical Feature Language Model (SFLM). In SFLM a word is considered as an array of linguistic features, and the model is defined in a way similar to the n-gram model. Experiments carried out for French show an improvement in terms of perplexity and predicted words.
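The idea of treating a word as an array of features and estimating an n-gram-style model over those arrays can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four-feature layout (surface form, part of speech, gender, number), the bigram order, the `BOS` start marker, the toy corpus, and the unsmoothed maximum-likelihood estimate. The function names `train_bigram` and `prob` are hypothetical.

```python
from collections import Counter

# Assumed feature layout: (surface form, part of speech, gender, number).
# This inventory is illustrative only; the paper's feature set may differ.
BOS = ("<s>", "BOS", "-", "-")  # assumed sentence-start marker


def train_bigram(corpus):
    """Count bigrams and their histories over feature vectors."""
    bigrams, histories = Counter(), Counter()
    for sentence in corpus:
        prev = BOS
        for vec in sentence:
            bigrams[(prev, vec)] += 1
            histories[prev] += 1
            prev = vec
    return bigrams, histories


def prob(vec, prev, bigrams, histories):
    """Unsmoothed maximum-likelihood estimate of P(vec | prev)."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, vec)] / histories[prev]


# Toy French corpus, invented for this example.
corpus = [
    [("la", "DET", "f", "sg"), ("maison", "NOUN", "f", "sg")],
    [("la", "DET", "f", "sg"), ("porte", "NOUN", "f", "sg")],
    [("le", "DET", "m", "sg"), ("livre", "NOUN", "m", "sg")],
]
bigrams, histories = train_bigram(corpus)

la = ("la", "DET", "f", "sg")
le = ("le", "DET", "m", "sg")
maison = ("maison", "NOUN", "f", "sg")
livre = ("livre", "NOUN", "m", "sg")

p_maison_after_la = prob(maison, la, bigrams, histories)
p_livre_after_le = prob(livre, le, bigrams, histories)
p_livre_after_la = prob(livre, la, bigrams, histories)
```

In this toy setting the gender and number of the left context are part of the history, so a masculine noun following a feminine determiner is simply unseen. A practical model would need smoothing or back-off, which this sketch leaves out.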

  • a –  UNIVERSITE NANCY 2
  • b –  UNIVERSITE HENRI POINCARE
  • c –  IUFM DE LORRAINE
  • 1 :  PAROLE (INRIA Lorraine - LORIA)
  • INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
  • Domain: Computer Science/Other
  • Keywords: statistical language modeling – automatic speech recognition
  • Internal reference: A04-R-253 || smaili04a
  • Comment: International conference with proceedings and peer review.
 
  • oai:hal.inria.fr:inria-00100021
  • Contributor:
  • Submitted on: Tuesday 26 September 2006, 10:13:24
  • Last modified on: Thursday 28 September 2006, 15:22:46