
A Corpus Balancing Method for Language Model Construction

Abstract : The language model is an important component of any speech recognition system. In this paper, we present a methodology for the lexical enrichment of corpora, aimed at the construction of statistical language models. The methodology involves, on the one hand, identifying the set of poorly represented words in a given training corpus and, on the other hand, enriching that corpus through the repeated inclusion of selected text fragments containing these words. The first part of the paper describes the formal details of the methodology; the second part presents experiments and results that validate the method.
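
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives neither the criterion for a "poorly represented" word nor the fragment selection rule, so the frequency threshold `min_count`, the repetition factor `repeats`, and the function names `find_rare_words` and `balance_corpus` are all assumptions made here for illustration.

```python
from collections import Counter


def find_rare_words(sentences, min_count):
    """Return corpus words seen fewer than min_count times.

    Assumption: a raw frequency threshold stands in for the paper's
    (unspecified here) definition of a poorly represented word.
    """
    counts = Counter(word for s in sentences for word in s.split())
    return {w for w, c in counts.items() if c < min_count}


def balance_corpus(sentences, fragments, min_count=2, repeats=3):
    """Append every fragment containing a rare word `repeats` times.

    Assumption: each matching fragment is repeated a fixed number of
    times; the paper may select and weight fragments differently.
    """
    rare = find_rare_words(sentences, min_count)
    enriched = list(sentences)
    for fragment in fragments:
        if rare & set(fragment.split()):
            enriched.extend([fragment] * repeats)
    return enriched


# Toy data, invented for this example.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
extra = ["a dog ran home", "the cat sat again"]
balanced = balance_corpus(corpus, extra, min_count=2, repeats=3)
```

In this toy corpus, "ran" and "dog" each occur once, so only the first extra fragment is added. A language model would then be trained on `balanced` instead of `corpus`.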
Document type :
Conference papers

Cited literature : 6 references
Contributor : Dominique Vaufreydaz
Submitted on : Friday, October 3, 2008 - 11:59:58 AM
Last modification on : Wednesday, July 6, 2022 - 4:21:39 AM
Long-term archiving on : Friday, June 4, 2010 - 12:10:06 PM


Files produced by the author(s)


  • HAL Id : inria-00326515, version 1



Luis Villaseñor-Pineda, Manuel Montes-Y-Gómez, Manuel Pérez-Coutiño, Dominique Vaufreydaz. A Corpus Balancing Method for Language Model Construction. Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), Feb 2003, Mexico City, Mexico. 9 p. ⟨inria-00326515⟩


