Improving language models by using distant information

Armelle Brun; David Langlois; Kamel Smaïli

Conference paper. Year: 2007

Armelle Brun

Role: Author
PersonId : 831057
IdHAL : armelle-brun

Analysis, perception and recognition of speech

David Langlois

Role: Author
PersonId : 298
IdHAL : david-langlois
IdRef : 070239509

Analysis, perception and recognition of speech

Kamel Smaïli

Role: Author
PersonId : 2521
IdHAL : kamel-smaili
IdRef : 034429700

Analysis, perception and recognition of speech

Abstract

This study examines a novel way to exploit distant information in statistical language models. We show that n-gram models can be applied to histories different from those used during training. These models are called crossing context models. Our study deals with classical and distant n-gram models. A mixture of four models is proposed and evaluated. A linear mixture of bigram models improves perplexity by 14%. Moreover, the trigram mixture outperforms the standard trigram by 5.6%. These improvements have been obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system. In evaluation, it yields a slight improvement in word error rate on the data used for the French-language evaluation campaign ESTER. Finally, the impact of the proposed crossing context language models on performance is reported speaker by speaker.
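
As an illustration of the general idea of a linear mixture of classical and distant n-gram models, here is a minimal Python sketch. It is not the authors' implementation and it does not reproduce their crossing context models or their four-model mixture: it combines only two components, a classical bigram and a distance-2 bigram that skips one word. The function names, the add-one smoothing, the toy corpus, and the mixture weights are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(tokens, distance, vocab):
    """Estimate P(w | h), where h is the word `distance` positions before w.

    distance=1 gives a classical bigram; distance=2 gives a distant bigram
    that skips one word. Add-one smoothing is an assumption made for brevity.
    """
    pair_counts = Counter(zip(tokens[:-distance], tokens[distance:]))
    hist_counts = Counter(tokens[:-distance])
    v = len(vocab)

    def prob(word, history):
        return (pair_counts[(history, word)] + 1) / (hist_counts[history] + v)

    return prob


def mixture_perplexity(tokens, models, weights):
    """Perplexity of the linear mixture P(w) = sum_i weights[i] * P_i(w | h_i).

    `models` is a list of (distance, prob) pairs; weights must sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    max_d = max(d for d, _ in models)
    log_sum = 0.0
    n = 0
    # Start at max_d so every component model has a history available.
    for i in range(max_d, len(tokens)):
        p = sum(
            lam * prob(tokens[i], tokens[i - d])
            for lam, (d, prob) in zip(weights, models)
        )
        log_sum += math.log2(p)
        n += 1
    return 2 ** (-log_sum / n)


# Toy data (hypothetical, for illustration only).
train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the rug".split()
vocab = set(train) | set(test)

classical = train_bigram(train, 1, vocab)
distant = train_bigram(train, 2, vocab)
models = [(1, classical), (2, distant)]

ppl_bigram = mixture_perplexity(test, models, [1.0, 0.0])  # classical bigram only
ppl_mix = mixture_perplexity(test, models, [0.7, 0.3])     # linear mixture
print(ppl_bigram, ppl_mix)
```

In practice the mixture weights would be tuned on held-out data rather than fixed by hand; the values 0.7 and 0.3 above are arbitrary placeholders, and results on such a tiny corpus say nothing about the gains reported in the abstract.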

Domains

Artificial Intelligence [cs.AI]

Main file

ISSAP2007.pdf (100.78 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00187084

Submitted on: Tuesday, 13 November 2007, 15:34:25

Last modified on: Friday, 24 March 2023, 14:52:49

Long-term archiving on: Monday, 12 April 2010, 02:04:36

Dates and versions

inria-00187084 , version 1 (13-11-2007)

Identifiers

HAL Id: inria-00187084, version 1

Cite

Armelle Brun, David Langlois, Kamel Smaïli. Improving language models by using distant information. International Symposium on Signal Processing and its Applications - ISSPA 2007, Feb 2007, Sharjah, United Arab Emirates. ⟨inria-00187084⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
