Improving transfer of expressivity for end-to-end multispeaker text-to-speech synthesis - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2021

Improving transfer of expressivity for end-to-end multispeaker text-to-speech synthesis

Abstract

The main goal of this work is to generate expressive speech in the voices of different speakers for whom no expressive speech data is available. The presented approach conditions Tacotron 2 speech synthesis on latent representations extracted from text, speaker identity, and a reference expressive Mel spectrogram. We propose to use a multiclass N-pair loss in end-to-end (E2E) multispeaker expressive text-to-speech (TTS) to improve the transfer of expressivity to the target speaker’s voice. We jointly trained the E2E TTS with the multiclass N-pair loss to discriminate between various emotions. This augmentation of the loss function during training is intended to improve the latent space representation of emotions. We experimented with two different neural network architectures for the expressivity encoder, namely global style token (GST) and variational autoencoder (VAE). We transferred expressivity using the mean of the latent representations extracted from the expressivity encoder for each emotion. The obtained results show that adding multiclass N-pair loss based deep metric learning to the training process improves expressivity in the desired speaker’s voice.
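The two ideas named in the abstract, a multiclass N-pair loss over emotion embeddings and a per-emotion mean of the latent representations, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how pairs are sampled, how the loss is weighted against the Tacotron 2 objective, or what the embedding dimensions are. The function names `n_pair_loss` and `mean_embedding_per_emotion`, the dot-product similarity, and the integer emotion labels are all assumptions made for illustration, following the commonly used form of the multiclass N-pair loss.

```python
import numpy as np


def n_pair_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """Multiclass N-pair loss in its commonly used form (an assumption here).

    anchors, positives: arrays of shape (N, D). Row i of each array is an
    embedding from emotion class i, and the N classes are all different.
    For anchor i, positives[i] is the positive example and every other
    row positives[j] (j != i) acts as a negative.
    """
    # Pairwise dot-product similarities, shape (N, N).
    logits = anchors @ positives.T
    # diff[i, j] = f_i . f_j+  -  f_i . f_i+ ; the diagonal is exactly 0.
    diff = logits - np.diag(logits)[:, None]
    # log(1 + sum_{j != i} exp(diff[i, j])) equals a log-sum-exp over the
    # whole row, because the diagonal term contributes exp(0) = 1.
    row_max = diff.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(diff - row_max).sum(axis=1))
    return float(lse.mean())


def mean_embedding_per_emotion(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Average the expressivity-encoder outputs for each emotion label.

    embeddings: shape (M, D); labels: shape (M,) of integer emotion ids
    (hypothetical encoding). Returns {emotion_id: mean vector of shape (D,)}.
    """
    return {
        int(label): embeddings[labels == label].mean(axis=0)
        for label in np.unique(labels)
    }
```

In a setup like the one described, the mean vector for an emotion would stand in for the reference embedding at synthesis time, so that the target speaker's voice is conditioned on an averaged emotion representation instead of a single reference utterance.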
Main file
EUSIPCO_2021_camera_ready_version.pdf (199.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02978485, version 1 (26-10-2020)
hal-02978485, version 2 (10-03-2021)
hal-02978485, version 3 (01-06-2021)

Cite

Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet. Improving transfer of expressivity for end-to-end multispeaker text-to-speech synthesis. EUSIPCO 2021 - 29th European Signal Processing Conference, European Association for Signal Processing (EURASIP), Aug 2021, Dublin / Virtual, Ireland. ⟨10.23919/EUSIPCO54536.2021.9616249⟩. ⟨hal-02978485v3⟩
562 views
844 downloads
