Improving TTS with corpus-specific pronunciation adaptation

Marie Tahon 1, Raheel Qader 1, Gwénolé Lecorvé 1, Damien Lolive 1
1 EXPRESSION - Expressiveness in Human Centered Data/Media
UBS - Université de Bretagne Sud, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract: Text-to-speech (TTS) systems are built on speech corpora labeled with carefully checked and segmented phonemes. However, the phoneme sequences generated by automatic grapheme-to-phoneme converters during synthesis are usually inconsistent with those from the corpus, which leads to poor-quality synthetic speech signals. To solve this problem, the present work aims at adapting automatically generated pronunciations to the corpus. The main idea is to train corpus-specific phoneme-to-phoneme conditional random fields with a large set of linguistic, phonological, articulatory and acoustic-prosodic features. Features are first selected under cross-validation, then combined to produce the final best feature set. Pronunciation models are evaluated in terms of phoneme error rate and through perceptual tests. Experiments carried out on a French speech corpus show an improvement in the quality of speech synthesis when pronunciation models are included in the phonetization process. Apart from improving TTS quality, the presented pronunciation adaptation method also opens interesting perspectives for expressive speech synthesis.
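The abstract names phoneme error rate as the objective evaluation measure but this record does not say how it is computed. The sketch below assumes the common definition: the Levenshtein (edit) distance between the reference and predicted phoneme sequences, divided by the reference length. The function names and the example phoneme sequences are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the corpus pronunciation keeps a schwa that the
# automatic phonetizer dropped, giving one deletion over four phonemes.
reference = ["p", "ə", "t", "i"]
predicted = ["p", "t", "i"]
per = phoneme_error_rate(reference, predicted)
```

Under this definition, a lower value means the adapted pronunciations are closer to those labeled in the corpus; the rate can exceed 1.0 when the prediction is much longer than the reference.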
Document type:
Conference paper
Interspeech, Sep 2016, San Francisco, United States

https://hal.inria.fr/hal-01338111
Contributor: Gwénolé Lecorvé
Submitted on: Friday, 23 September 2016 - 16:10:41
Last modified on: Tuesday, 16 January 2018 - 15:54:23

File

interspeech_2016_prononciation...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01338111, version 1

Citation

Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive. Improving TTS with corpus-specific pronunciation adaptation. Interspeech, Sep 2016, San Francisco, United States. 〈hal-01338111〉
