Statistical Pronunciation Adaptation for Spontaneous Speech Synthesis

Raheel Qader; Gwénolé Lecorvé; Damien Lolive; Marie Tahon; Pascale Sébillot

Conference paper, Year: 2017


Raheel Qader

Role: Author
PersonId : 958276

Expressiveness in Human Centered Data/Media

Gwénolé Lecorvé

Role: Author
PersonId : 20677
IdHAL : gwenole-lecorve
ORCID : 0000-0002-4271-2087
IdRef : 150245254

Expressiveness in Human Centered Data/Media

Damien Lolive

Role: Author
PersonId : 5088
IdHAL : damien-lolive
ORCID : 0000-0002-1110-5444
IdRef : 13017498X

Expressiveness in Human Centered Data/Media

Marie Tahon

Role: Author
PersonId : 9821
IdHAL : marie-tahon
ORCID : 0000-0002-6782-0332
IdRef : 165065532

Expressiveness in Human Centered Data/Media

Pascale Sébillot

Role: Author
PersonId : 21840
IdHAL : pascale-sebillot
ORCID : 0000-0002-5429-4302
IdRef : 075988453

Creating and exploiting explicit links between multimedia fragments

Abstract

To bring more expressiveness into text-to-speech systems, this paper presents a new pronunciation variant generation method which works by adapting standard, i.e., dictionary-based, pronunciations to a spontaneous style. Its strength and originality lie in exploiting a wide range of linguistic, articulatory and prosodic features, and in using a probabilistic machine learning framework, namely conditional random fields and phoneme-based n-gram models. Extensive experiments on the Buckeye corpus of English conversational speech demonstrate the effectiveness of the approach through objective and perceptual evaluations.

Keywords

speech synthesis; spontaneous speech; pronunciation modeling; statistical adaptation; conditional random fields

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Document and Text Processing; Sound [cs.SD]

Main file

pronunciation_adaptation_raheel.pdf (288.42 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01532035

Submitted on: Friday, 2 June 2017, 12:15:00

Last modified on: Tuesday, 3 October 2023, 09:49:25

Long-term archiving on: Wednesday, 13 December 2017, 07:23:39

Dates and versions

hal-01532035, version 1 (02-06-2017)

Identifiers

HAL Id: hal-01532035, version 1

Cite

Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Marie Tahon, Pascale Sébillot. Statistical Pronunciation Adaptation for Spontaneous Speech Synthesis. Text, Speech and Dialogue (TSD), Aug 2017, Prague, Czech Republic. ⟨hal-01532035⟩


Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES ENSSAT IRISA IRISA-INSA-R IRISA-D6 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM

559 views

292 downloads
