Experiments on the Construction of a Phonetically Balanced Corpus from the Web

Abstract : The construction of a speech recognition system requires a recorded set of phrases from which to compute the acoustic models. This set of phrases must be phonetically rich and balanced in order to obtain a robust recognizer. Traditionally, this set is defined manually, which requires considerable human effort. In this paper we propose an automated method for assembling a phonetically balanced corpus (set of phrases) from the Web. The proposed method was used to construct a phonetically balanced corpus for Mexican Spanish.
Document type :
Conference papers

https://hal.inria.fr/inria-00326519
Contributor : Dominique Vaufreydaz
Submitted on : Friday, October 3, 2008 - 12:08:13 PM
Last modification on : Thursday, February 7, 2019 - 5:02:47 PM
Long-term archiving on : Friday, June 4, 2010 - 12:10:15 PM

File

Villasenor04.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00326519, version 1

Collections

IMAG | INRIA | UGA | LIG

Citation

Luis Villaseñor-Pineda, Manuel Montes-Y-Gómez, Dominique Vaufreydaz, Jean-François Serignat. Experiments on the Construction of a Phonetically Balanced Corpus from the Web. Conference on Intelligent Text Processing and Computational Linguistics CICLing-2004, Feb 2004, Seoul, South Korea. 4 p. ⟨inria-00326519⟩
