A Corpus Balancing Method for Language Model Construction

Abstract: The language model is an important component of any speech recognition system. In this paper, we present a lexical enrichment methodology for corpora, aimed at the construction of statistical language models. The methodology comprises two steps: identifying the set of poorly represented words in a given training corpus, and enriching that corpus by repeatedly including selected text fragments that contain these words. The first part of the paper describes the formal details of the methodology; the second part presents experiments and results that validate the method.
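The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formal criteria, so the frequency threshold `min_count`, the use of whole sentences as "text fragments", whitespace tokenization, and the `max_rounds` cap are all assumptions made here for illustration, as are the function names.

```python
from collections import Counter


def find_rare_words(sentences, min_count):
    """Return the words occurring fewer than min_count times.

    Assumption: "poorly represented" means a raw count below a threshold.
    """
    counts = Counter(word for s in sentences for word in s.split())
    return {w for w, c in counts.items() if c < min_count}


def enrich_corpus(sentences, min_count, max_rounds=10):
    """Repeatedly append sentences that contain rare words.

    Assumption: a "text fragment" is a whole sentence from the original
    corpus. Stops when no rare word remains or after max_rounds passes.
    """
    enriched = list(sentences)
    for _ in range(max_rounds):
        rare = find_rare_words(enriched, min_count)
        if not rare:
            break
        # Select original sentences containing at least one rare word.
        selected = [s for s in sentences if rare & set(s.split())]
        enriched.extend(selected)
    return enriched


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the dog ran", "the cat ran", "a zebra sat"]
balanced = enrich_corpus(corpus, min_count=2)
```

In this toy corpus, "dog", "a", and "zebra" each occur once, so the two sentences containing them are appended once and every word then reaches the threshold. Repeating whole fragments rather than isolated words keeps the rare words in a natural context, which matters for n-gram statistics.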
Document type:
Conference paper
Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), Feb 2003, Mexico City, Mexico. 9 p., 2003

https://hal.inria.fr/inria-00326515
Contributor: Dominique Vaufreydaz
Submitted on: Friday, October 3, 2008 - 11:59:58
Last modified on: Friday, October 3, 2008 - 16:55:50
Document(s) archived on: Friday, June 4, 2010 - 12:10:06

File

Villasenor03a.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00326515, version 1

Citation

Luis Villaseñor-Pineda, Manuel Montes-Y-Gómez, Manuel Pérez-Coutiño, Dominique Vaufreydaz. A Corpus Balancing Method for Language Model Construction. Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), Feb 2003, Mexico City, Mexico. 9 p., 2003. 〈inria-00326515〉
