How to handle gender and number agreement in statistical language models?

Caroline Lavecchia (1), Kamel Smaïli (1), Jean-Paul Haton (1)
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Gender and number agreement is a critical problem in statistical language modeling. One of the main difficulties in French speech recognition is the presence of misrecognized words caused by incorrect agreement (in gender and number) between words. Statistical language models do not handle this phenomenon directly. This paper focuses on how to handle agreement. We introduce an original model called Features-Cache (FC) to estimate the gender and number of the word to predict. It is a dynamic, variable-length cache whose size is determined automatically according to syntagm delimiters. The main advantage of this model is that it requires no syntactic parsing: it is used like any other statistical language model. Several models have been developed, and the best one achieves an improvement of approximately 9 points in terms of perplexity. This model has been integrated into a speech recognition system based on the JULIUS engine. Tests have been carried out on 280 sentences provided by AUPELF for the French automatic speech recognition evaluation campaign. The new model outperforms the baseline, in terms of word error, by 3%.
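The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a variable-length features cache, not the authors' model. The delimiter set, the toy lexicon, the class name `FeaturesCache`, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical delimiter set: the abstract does not list the actual
# syntagm delimiters, so punctuation stands in for them here.
DELIMITERS = {",", ".", ";", ":"}

# Hypothetical toy lexicon mapping a word to (gender, number).
# None means the feature is not marked on that word.
LEXICON = {
    "la": ("f", "sg"),
    "petite": ("f", "sg"),
    "maison": ("f", "sg"),
    "les": (None, "pl"),
    "grands": ("m", "pl"),
    "arbres": ("m", "pl"),
}


class FeaturesCache:
    """Cache of gender/number features that is emptied at each delimiter,
    so its length varies with the current syntagm (assumed behaviour)."""

    def __init__(self, delimiters=DELIMITERS, lexicon=LEXICON):
        self.delimiters = delimiters
        self.lexicon = lexicon
        self.features = []

    def update(self, word):
        """Record the features of `word`, or reset the cache on a delimiter."""
        if word in self.delimiters:
            self.features.clear()
        elif word in self.lexicon:
            self.features.append(self.lexicon[word])

    def feature_probs(self, smoothing=1.0):
        """Return smoothed distributions over gender and number
        for the next word, estimated from the cached features."""
        genders = Counter(g for g, _ in self.features if g is not None)
        numbers = Counter(n for _, n in self.features if n is not None)

        def normalise(counts, values):
            total = sum(counts[v] for v in values) + smoothing * len(values)
            return {v: (counts[v] + smoothing) / total for v in values}

        return normalise(genders, ("m", "f")), normalise(numbers, ("sg", "pl"))


cache = FeaturesCache()
for word in ["la", "petite"]:
    cache.update(word)
gender_probs, number_probs = cache.feature_probs()
print(gender_probs, number_probs)
```

In this sketch, two feminine singular words in the cache shift the estimate toward feminine singular for the next word, and a delimiter returns the cache to a uniform estimate. How such an estimate would be combined with an n-gram probability is not specified in the abstract and is left out here.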
Document type:
Conference paper
Ninth International Conference on Spoken Language Processing - INTERSPEECH 2006, Sep 2006, Pittsburgh, Pennsylvania/USA, 2006

https://hal.inria.fr/inria-00103497
Contributor: Caroline Lavecchia
Submitted on: Wednesday, 4 October 2006 - 15:23:42
Last modified on: Thursday, 11 January 2018 - 06:19:55
Document(s) archived on: Tuesday, 6 April 2010 - 18:09:59

Identifiers

  • HAL Id : inria-00103497, version 1

Citation

Caroline Lavecchia, Kamel Smaïli, Jean-Paul Haton. How to handle gender and number agreement in statistical language models? Ninth International Conference on Spoken Language Processing - INTERSPEECH 2006, Sep 2006, Pittsburgh, Pennsylvania/USA, 2006. 〈inria-00103497〉
