Deep variational metric learning for transfer of expressivity in multispeaker text to Speech

Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet

Abstract

In this paper, we propose using the deep metric learning based multi-class N-pair loss for text-to-speech (TTS) synthesis. We use this loss function in a recurrent conditional variational autoencoder (RCVAE) to transfer expressivity in a French multispeaker TTS system. To represent speaker identity, we extract speaker embeddings from an x-vector based speaker recognition model trained on speech data from many speakers. We use the mean of the latent variables for each emotion to transfer expressivity and generate expressive speech in the desired speaker's voice. In contrast to commonly used loss functions such as triplet loss or contrastive loss, the multi-class N-pair loss considers all the negative examples, which makes each emotion class distinguishable from the others. Furthermore, the presented approach helps create a robust representation of expressivity irrespective of speaker identity. Our proposed approach demonstrates improved performance for transfer of expressivity to the target speaker's voice in synthesized speech. To our knowledge, this is the first time that multi-class N-pair loss and x-vector based speaker embeddings have been used in a TTS system.
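As a rough illustration of the multi-class N-pair loss mentioned in the abstract, the following is a minimal NumPy sketch of its standard formulation: each anchor is compared against its own positive and against the positives of all other classes at once. The function name `n_pair_loss`, the array shapes, and the random "emotion" vectors are assumptions made for illustration; the abstract does not specify how the loss is applied to the RCVAE latent variables, and this is not the authors' implementation.

```python
import numpy as np


def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss for N (anchor, positive) pairs.

    anchors, positives: arrays of shape (N, D). Row i of each array is
    assumed to come from class i, and the N classes are assumed distinct
    (for example, one class per emotion).
    """
    # logits[i, j] = similarity between anchor i and positive j.
    logits = anchors @ positives.T
    # Per-row loss: log(1 + sum_{j != i} exp(s_ij - s_ii)),
    # which equals logsumexp_j(s_ij) - s_ii. Subtracting the row maximum
    # keeps the exponentials numerically stable.
    row_max = logits.max(axis=1, keepdims=True)
    log_sum_exp = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    return float(np.mean(log_sum_exp - np.diag(logits)))


# Hypothetical usage: 4 emotion classes, 16-dimensional vectors.
rng = np.random.default_rng(0)
emotion_anchors = rng.normal(size=(4, 16))
emotion_positives = emotion_anchors + 0.1 * rng.normal(size=(4, 16))
print(n_pair_loss(emotion_anchors, emotion_positives))
```

Because every row uses all other classes as negatives, the loss is equivalent to a softmax cross-entropy over the similarity matrix with the diagonal as the target, which is what separates it from triplet or contrastive losses that use one negative at a time.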
Main file
SLSP_2020_published_version.pdf (683.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02573885, version 1 (14-05-2020)
hal-02573885, version 2 (22-10-2020)

Identifiers

  • HAL Id: hal-02573885, version 2

Cite

Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet. Deep variational metric learning for transfer of expressivity in multispeaker text to Speech. SLSP 2020 - 8th International Conference on Statistical Language and Speech Processing, Oct 2020, Cardiff / Virtual, United Kingdom. ⟨hal-02573885v2⟩