Linguistic features modeling based on Partial New Cache

Kamel Smaïli; Caroline Lavecchia; Jean-Paul Haton

Conference paper, Year: 2006

Kamel Smaïli

Role: Author
PersonId : 2521
IdHAL : kamel-smaili
IdRef : 034429700

Analysis, perception and recognition of speech

Caroline Lavecchia

Role: Author

Analysis, perception and recognition of speech

Jean-Paul Haton

Role: Author
PersonId : 830987

Analysis, perception and recognition of speech

Abstract

Agreement in gender and number is a critical problem in statistical language modeling. In French speech recognition, one of the main problems is the presence of misrecognized words caused by incorrect agreement (in gender and number) between words. Statistical language models do not treat this phenomenon directly. This paper focuses on how to handle agreement. We introduce an original model called Features-Cache (FC) to estimate the gender and number of the word to be predicted. It is a dynamic, variable-length cache whose size is determined by syntagm delimiters. The model does not need any syntactic parsing and is used like any other statistical language model. Several models were evaluated, and the best one achieves an improvement of more than 8 points in perplexity.
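The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea: a cache of gender/number features that is emptied at each syntagm delimiter and used to score the features of the next word. The class name FeatureCache, the delimiter set, the toy lexicon, and the add-alpha smoothing are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# Hypothetical inventory of (gender, number) feature pairs.
FEATURE_VALUES = [(g, n) for g in ("m", "f") for n in ("sg", "pl")]

# Hypothetical syntagm delimiters; the paper's actual set is not given here.
DELIMITERS = {",", ".", ";", ":"}

# Hypothetical toy lexicon mapping French words to (gender, number).
LEXICON = {
    "la": ("f", "sg"),
    "petite": ("f", "sg"),
    "maison": ("f", "sg"),
    "le": ("m", "sg"),
    "chiens": ("m", "pl"),
}


class FeatureCache:
    """Variable-length cache of features seen since the last delimiter."""

    def __init__(self, delimiters, alpha=1.0):
        self.delimiters = set(delimiters)
        self.alpha = alpha  # add-alpha smoothing constant (an assumption)
        self.cache = []

    def update(self, word, features=None):
        # A delimiter closes the current syntagm, so the cache is emptied.
        if word in self.delimiters:
            self.cache.clear()
        elif features is not None:
            self.cache.append(features)

    def prob(self, features):
        # Smoothed relative frequency of a feature pair in the cache.
        counts = Counter(self.cache)
        denom = len(self.cache) + self.alpha * len(FEATURE_VALUES)
        return (counts[features] + self.alpha) / denom


if __name__ == "__main__":
    fc = FeatureCache(DELIMITERS)
    for w in ["la", "petite", "maison"]:
        fc.update(w, LEXICON.get(w))
    # Feminine singular now outweighs the other feature pairs.
    print(fc.prob(("f", "sg")), fc.prob(("m", "pl")))
```

In a sketch like this, the cache score would typically be combined with a conventional n-gram probability, for instance by interpolation; how the paper actually combines them is not stated in the abstract.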

Domains

Computation and Language [cs.CL]

Main file

LREC2006.pdf (68.11 KB)

https://inria.hal.science/inria-00077321

Submitted on: Tuesday, May 30, 2006, 16:04:57

Last modified on: Friday, March 24, 2023, 14:52:47

Long-term archiving on: Monday, April 5, 2010, 21:48:47

Dates and versions

inria-00077321 , version 1 (30-05-2006)

Identifiers

HAL Id : inria-00077321 , version 1

Cite

Kamel Smaïli, Caroline Lavecchia, Jean-Paul Haton. Linguistic features modeling based on Partial New Cache. International Conference on Language Resources and Evaluation - LREC 2006, May 2006, Magazzini del Cotone Conference Center, Genoa, Italy. ⟨inria-00077321⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA

219 views

107 downloads
