
Experiments on the Construction of a Phonetically Balanced Corpus from the Web

Luis Villaseñor-Pineda 1 Manuel Montes-Y-Gómez 1 Dominique Vaufreydaz 2, 3 Jean-François Serignat 3
2 PRIMA - Perception, recognition and integration for observation of activity
GRAVIR - IMAG - Laboratoire d'informatique GRAphique, VIsion et Robotique de Grenoble, Inria Grenoble - Rhône-Alpes
3 CLIPS-IMAG - Equipe GEOD, Groupe d'étude sur l'oral et le dialogue
LIG - Laboratoire d'Informatique de Grenoble
Abstract : The construction of a speech recognition system requires a recorded set of phrases from which to compute the acoustic models. This set of phrases must be phonetically rich and balanced in order to obtain a robust recognizer. Traditionally, this set is defined manually, which demands considerable human effort. In this paper we propose an automated method for assembling a phonetically balanced corpus (set of phrases) from the Web. The proposed method was used to construct a phonetically balanced corpus for Mexican Spanish.
Document type : Conference papers
Contributor : Dominique Vaufreydaz
Submitted on : Friday, October 3, 2008 - 12:08:13 PM
Last modification on : Tuesday, October 19, 2021 - 11:16:38 PM
Long-term archiving on : Friday, June 4, 2010 - 12:10:15 PM


Files produced by the author(s)


  • HAL Id : inria-00326519, version 1



Luis Villaseñor-Pineda, Manuel Montes-Y-Gómez, Dominique Vaufreydaz, Jean-François Serignat. Experiments on the Construction of a Phonetically Balanced Corpus from the Web. Conference on Intelligent Text Processing and Computational Linguistics CICLing-2004, Feb 2004, Seoul, South Korea. 4 p. ⟨inria-00326519⟩


