Perception of expressivity in TTS: linguistics, phonetics or prosody?

Marie Tahon¹, Gwénolé Lecorvé¹, Damien Lolive¹, Raheel Qader¹
¹ EXPRESSION - Expressiveness in Human Centered Data/Media
UBS - Université de Bretagne Sud, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract: Currently, much work on expressive speech focuses on acoustic models and prosody variations. However, in expressive Text-to-Speech (TTS) systems, prosody generation relies strongly on the sequence of phonemes to be expressed and also on the words underlying these phonemes. Consequently, linguistic and phonetic cues play a significant role in the perception of expressivity. In previous work, we proposed a statistical corpus-specific framework that adapts the phonemes produced by an automatic phonetizer to the phonemes as labelled in the TTS speech corpus. This framework makes it possible to synthesize good-quality but neutral speech samples. The present study goes further in the generation of expressive speech by predicting not only corpus-specific but also expressive pronunciation. It also investigates the joint impact of linguistics, phonetics and prosody, evaluated on several French neutral and expressive speech corpora recorded with different speaking styles and linguistic content, and expressed under diverse emotional states. Perception tests show that expressivity is more easily perceived when linguistics, phonetics and prosody are consistent. Linguistics seems to be the strongest cue in the perception of expressivity, but phonetics greatly improves expressiveness when combined with an adequate prosody.
Document type: Conference paper

Cited literature: 21 references

https://hal-univ-lemans.archives-ouvertes.fr/hal-01623916
Contributor: Marie Tahon
Submitted on: Wednesday, October 25, 2017 - 6:26:00 PM
Last modification on: Thursday, November 15, 2018 - 11:58:49 AM
Long-term archiving on: Friday, January 26, 2018 - 4:14:33 PM

File: SLSP2017_Tahon_final.pdf (produced by the author(s))

Citation

Marie Tahon, Gwénolé Lecorvé, Damien Lolive, Raheel Qader. Perception of expressivity in TTS: linguistics, phonetics or prosody? Statistical Language and Speech Processing, Oct 2017, Le Mans, France. pp. 262-274, ⟨10.1007/978-3-319-68456-7_22⟩. ⟨hal-01623916v1⟩
