Linguistic features modeling based on Partial New Cache

Kamel Smaïli (1), Caroline Lavecchia (1), Jean-Paul Haton (1)
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Agreement in gender and number is a critical problem in statistical language modeling. One of the main problems in French speech recognition is the presence of misrecognized words caused by incorrect agreement (in gender and number) between words. Statistical language models do not treat this phenomenon directly. This paper focuses on how to handle agreement. We introduce an original model called Features-Cache (FC) to estimate the gender and number of the word to predict. It is a dynamic, variable-length cache whose size is determined by syntagm delimiters. The model does not require any syntactic parsing and is used like any other statistical language model. Several models were tested, and the best one achieves an improvement of more than 8 points in terms of perplexity.
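
The abstract gives no implementation details, so the following is only a minimal sketch of what a variable-length feature cache could look like, not the authors' model. It assumes four feature classes (masculine/feminine crossed with singular/plural), a hypothetical delimiter set, and add-alpha smoothing; the names FeatureCache, DELIMITERS, update, and prob are all illustrative.

```python
from collections import Counter

# Hypothetical delimiter set: the abstract does not list the actual
# syntagm delimiters used by the authors.
DELIMITERS = {",", ".", ";", ":"}


class FeatureCache:
    """Cache of (gender, number) features seen since the last delimiter."""

    def __init__(self, delimiters=DELIMITERS):
        self.delimiters = set(delimiters)
        self.counts = Counter()

    def update(self, word, features=None):
        """Record the features of a word, or empty the cache at a delimiter."""
        if word in self.delimiters:
            # The cache length is variable: it restarts at each delimiter.
            self.counts.clear()
        elif features is not None:
            self.counts[features] += 1

    def prob(self, features, n_classes=4, alpha=1.0):
        """Add-alpha estimate of the next word's features (an assumption)."""
        total = sum(self.counts.values())
        return (self.counts[features] + alpha) / (total + alpha * n_classes)


# Usage: two masculine plural words raise the estimate for ("m", "pl").
cache = FeatureCache()
cache.update("les", ("m", "pl"))
cache.update("petits", ("m", "pl"))
p_agree = cache.prob(("m", "pl"))
p_other = cache.prob(("f", "sg"))
```

In a full system, such an estimate would presumably be combined with a conventional n-gram model; how the paper does this is not stated in the abstract.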
Document type: Conference paper

https://hal.inria.fr/inria-00077321
Contributor: Caroline Lavecchia
Submitted on: Tuesday, May 30, 2006 - 4:04:57 PM
Last modification on: Thursday, January 11, 2018 - 6:19:56 AM
Long-term archiving on: Monday, April 5, 2010 - 9:48:47 PM

Identifiers

  • HAL Id : inria-00077321, version 1

Citation

Kamel Smaïli, Caroline Lavecchia, Jean-Paul Haton. Linguistic features modeling based on Partial New Cache. International Conference on Language Resources and Evaluation - LREC 2006, May 2006, Magazzini del Cotone Conference Center, Genoa, Italy. ⟨inria-00077321⟩
