Conference papers

Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis

Sara Dahmani 1 Vincent Colotte 1 Valérian Girard 1 Slim Ouni 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In recent years, the performance of speech synthesis systems has improved thanks to deep learning-based models, but generating expressive audiovisual speech is still an open issue. Variational auto-encoders (VAEs) were recently proposed to learn latent representations of data. In this paper, we present a system for expressive text-to-audiovisual speech synthesis that learns a latent embedding space of emotions using a conditional generative model based on the variational auto-encoder framework. When conditioned on textual input, the VAE is able to learn an embedded representation that captures emotion characteristics from the signal while being invariant to the phonetic content of the utterances. We applied this method in an unsupervised manner to generate the duration, acoustic and visual features of speech. This conditional variational auto-encoder (CVAE) was used to blend emotions together: the model was able to generate nuances of a given emotion and to generate new emotions that do not exist in our database. We conducted three perceptual experiments to evaluate our findings.
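The abstract does not say how emotions are blended in the latent space. As a minimal sketch only, assuming each emotion is summarized by a mean latent vector and that blending is a simple linear interpolation between two such vectors (both are assumptions, not details taken from the paper), the idea could look like this. The function name `blend_latents` and the numeric values are hypothetical.

```python
from typing import List, Sequence


def blend_latents(z_a: Sequence[float], z_b: Sequence[float], alpha: float) -> List[float]:
    """Linearly interpolate between two latent vectors.

    alpha = 0.0 returns z_a, alpha = 1.0 returns z_b.
    This is an illustrative assumption, not the paper's stated method.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical 2-D latent means for two emotion classes (made-up values).
z_joy = [1.0, 0.0]
z_sadness = [-1.0, 2.0]

# An equal mix of the two emotions; a decoder conditioned on text would
# then generate duration, acoustic and visual features from this point.
z_mix = blend_latents(z_joy, z_sadness, 0.5)
```

Under these assumptions, intermediate values of `alpha` would correspond to nuances between the two emotions, while points away from any single emotion's mean could correspond to emotions absent from the training data.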

Cited literature [38 references]
Contributor : Slim Ouni
Submitted on : Saturday, July 6, 2019 - 11:33:31 AM
Last modification on : Wednesday, November 3, 2021 - 7:57:28 AM




  • HAL Id : hal-02175776, version 1


Sara Dahmani, Vincent Colotte, Valérian Girard, Slim Ouni. Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis. INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Sep 2019, Graz, Austria. ⟨hal-02175776⟩


