A Corpus Balancing Method for Language Model Construction

Luis Villaseñor-Pineda; Manuel Montes-Y-Gómez; Manuel Pérez-Coutiño; Dominique Vaufreydaz

Conference paper, Year: 2003

Luis Villaseñor-Pineda

Role: Author

Laboratorio de Tecnologías de Lenguaje

Manuel Montes-Y-Gómez

Role: Author

Laboratorio de Tecnologías de Lenguaje

Manuel Pérez-Coutiño

Role: Author

Laboratorio de Tecnologías de Lenguaje

Dominique Vaufreydaz

Role: Author
PersonId : 8656
IdHAL : vaufreydaz
ORCID : 0000-0002-8825-0973
IdRef : 064812596

GEOD team (Groupe d'étude sur l'oral et le dialogue)

Abstract

The language model is an important component of any speech recognition system. In this paper, we present a methodology for the lexical enrichment of corpora, aimed at the construction of statistical language models. The methodology involves, first, identifying the set of poorly represented words in a given training corpus and, second, enriching that corpus by repeatedly including selected text fragments that contain these words. The first part of the paper describes the formal details of the methodology; the second part presents experiments and results that validate our method.
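The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the formal method of the paper, whose details are not given on this page. The frequency threshold `min_count`, the whitespace tokenization, the `max_rounds` cap, and the function names `count_words` and `enrich_corpus` are all assumptions made for illustration.

```python
from collections import Counter


def count_words(sentences):
    """Count whitespace-separated tokens over a list of sentences."""
    return Counter(word for s in sentences for word in s.split())


def enrich_corpus(corpus, fragments, min_count, max_rounds=10):
    """Append fragments that contain under-represented corpus words.

    Assumption: a word is "poorly represented" when it occurs fewer
    than min_count times. The paper's actual criterion may differ.
    """
    vocabulary = set(count_words(corpus))  # only words of the original corpus
    enriched = list(corpus)
    for _ in range(max_rounds):
        counts = count_words(enriched)
        rare = {w for w in vocabulary if counts[w] < min_count}
        # Select every candidate fragment that mentions a rare word.
        selected = [f for f in fragments if rare & set(f.split())]
        if not selected:
            break  # nothing left that the fragments can improve
        enriched.extend(selected)  # repeated inclusion across rounds
    return enriched


corpus = ["the cat sat", "the cat ran", "the dog sat"]
fragments = ["a dog barked", "the cat slept", "dog and dog"]
enriched = enrich_corpus(corpus, fragments, min_count=2)
```

In this toy run, the fragments mentioning "dog" are added, while "ran" stays under-represented because no candidate fragment contains it.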

Keywords

language model, lexical analysis, corpora and lexical enrichment

Domains

Computation and Language [cs.CL]

Main file

Villasenor03a.pdf (41.54 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00326515

Submitted on: Friday, October 3, 2008, 11:59:58

Last modified on: Thursday, April 4, 2024, 21:40:59

Long-term archiving on: Friday, June 4, 2010, 12:10:06

Dates and versions

inria-00326515, version 1 (03-10-2008)

Identifiers

HAL Id: inria-00326515, version 1

Cite

Luis Villaseñor-Pineda, Manuel Montes-Y-Gómez, Manuel Pérez-Coutiño, Dominique Vaufreydaz. A Corpus Balancing Method for Language Model Construction. Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), Feb 2003, Mexico City, Mexico. 9 p. ⟨inria-00326515⟩

Collections

UGA CNRS LIG LIG_SIDCH
