Conference paper. Year: 2011

Building a Pronunciation Lexicon for a Speech Transcription System from Wiktionary Pronunciations only

Denis Jouvet
Dominique Fohr
Irina Illina

Abstract

This paper shows that web pronunciations, such as those available in Wiktionary, can be used efficiently to build a pronunciation lexicon for a speech transcription system. The pronunciations of words and expressions extracted from Wiktionary were used either directly or as training data for data-driven grapheme-to-phoneme conversion systems, which made it possible to develop pronunciation lexicons. As an example, French Wiktionary pronunciations were extracted. The derived pronunciation lexicons were then used to train the acoustic models and to evaluate speech recognition performance on the French broadcast news data from the ESTER2 campaign. Moreover, the results show that combining the pronunciations delivered by two different grapheme-to-phoneme conversion systems yields a significant performance improvement.
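
The abstract does not say how the pronunciations from the two grapheme-to-phoneme systems are combined. As a minimal sketch, assuming the combination is a per-word union of pronunciation variants (an assumption, not necessarily the authors' method), the idea could look like the following. The function name `merge_lexicons`, the toy lexicons, and the phoneme symbols are all hypothetical.

```python
def merge_lexicons(*lexicons):
    """Union the pronunciation variants of each word across lexicons.

    Each lexicon maps a word to a list of pronunciations, where a
    pronunciation is a sequence of phoneme symbols. Duplicates are
    dropped and first-seen order is kept.
    """
    merged = {}
    for lexicon in lexicons:
        for word, variants in lexicon.items():
            slot = merged.setdefault(word, [])
            for phones in variants:
                phones = tuple(phones)
                if phones not in slot:
                    slot.append(phones)
    return merged


# Toy outputs of two hypothetical G2P systems (illustrative symbols only).
g2p_a = {"petit": [("p", "ə", "t", "i")], "les": [("l", "e")]}
g2p_b = {"petit": [("p", "ə", "t", "i"), ("p", "t", "i")], "les": [("l", "ɛ")]}

combined = merge_lexicons(g2p_a, g2p_b)
```

Under this assumption, a word keeps every distinct variant proposed by either system, so the recognizer can match whichever variant the speaker actually produced.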
File not deposited

Dates and versions

inria-00616330, version 1 (22-08-2011)

Identifiers

  • HAL Id: inria-00616330, version 1

Cite

Denis Jouvet, Dominique Fohr, Irina Illina. Building a Pronunciation Lexicon for a Speech Transcription System from Wiktionary Pronunciations only. XIV International Conference "Speech and Computer" (SPECOM'2011), Sep 2011, Kazan, Russia. ⟨inria-00616330⟩