Conference papers

How to handle gender and number agreement in statistical language models?

Caroline Lavecchia 1 Kamel Smaïli 1 Jean-Paul Haton 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Agreement in gender and number is a critical problem in statistical language modeling. One of the main difficulties in French speech recognition is the presence of misrecognized words caused by incorrect agreement (in gender and number) between words. Statistical language models do not handle this phenomenon directly. This paper focuses on how to handle such agreement. We introduce an original model called Features-Cache (FC) to estimate the gender and number of the word to predict. It is a dynamic, variable-length cache whose size is determined automatically according to syntagm delimiters. The main advantage of this model is that it requires no syntactic parsing: it is used like any other statistical language model. Several models have been developed, and the best one achieves an improvement of approximately 9 points in terms of perplexity. This model has been integrated into a speech recognition system based on the JULIUS engine. Tests have been carried out on 280 sentences provided by AUPELF for the French automatic speech recognition evaluation campaign. The new model outperforms the baseline by 3% in terms of word error.
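The abstract does not give the model's equations, so the following Python is only an illustrative sketch of the general idea of a variable-length cache that is emptied at syntagm delimiters. The toy lexicon, the delimiter set, the majority-vote rule, and all names (`LEXICON`, `DELIMITERS`, `FeaturesCache`, `predict_after`) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (gender, number). Not from the paper.
LEXICON = {
    "la": ("f", "sg"),
    "petite": ("f", "sg"),
    "maison": ("f", "sg"),
    "les": (None, "pl"),      # gender left unspecified for this determiner
    "petits": ("m", "pl"),
    "chats": ("m", "pl"),
}

# Hypothetical syntagm delimiters; seeing one empties the cache.
DELIMITERS = {",", ".", "et", "qui", "que"}


class FeaturesCache:
    """Variable-length cache of (gender, number) features.

    The cache grows with each known word and is cleared at a delimiter,
    so its length follows the current syntagm rather than a fixed window.
    """

    def __init__(self):
        self.features = []

    def update(self, word):
        if word in DELIMITERS:
            self.features.clear()
        elif word in LEXICON:
            self.features.append(LEXICON[word])

    def predict(self):
        # Assumed rule: majority vote over the cached features.
        genders = Counter(g for g, _ in self.features if g is not None)
        numbers = Counter(n for _, n in self.features if n is not None)
        gender = genders.most_common(1)[0][0] if genders else None
        number = numbers.most_common(1)[0][0] if numbers else None
        return gender, number


def predict_after(words):
    """Feed a word history to a fresh cache and return its prediction."""
    cache = FeaturesCache()
    for word in words:
        cache.update(word)
    return cache.predict()
```

In this sketch, `predict_after(["la", "maison", ",", "les", "petits"])` returns `("m", "pl")` because the comma clears the feminine singular features cached before it. A real system would combine such an estimate with an n-gram probability; how the paper does so is not stated in the abstract.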
Document type :
Conference papers
Contributor : Caroline Lavecchia
Submitted on : Wednesday, October 4, 2006 - 3:23:42 PM
Last modification on : Friday, February 26, 2021 - 3:28:05 PM
Long-term archiving on : Tuesday, April 6, 2010 - 6:09:59 PM


  • HAL Id : inria-00103497, version 1



Caroline Lavecchia, Kamel Smaïli, Jean-Paul Haton. How to handle gender and number agreement in statistical language models?. Ninth International Conference on Spoken Language Processing - INTERSPEECH 2006, Sep 2006, Pittsburgh, Pennsylvania/USA. ⟨inria-00103497⟩


