Tacotron: A fully end-to-end text-to-speech synthesis model, ArXiv, 2017.
Deep voice 3: 2000-speaker neural text-to-speech, ArXiv, 2017.
Char2wav: End-to-end speech synthesis, 2017.
Voiceloop: Voice fitting and synthesis via a phonological loop, ICLR, 2017.
Semi-supervised learning with deep generative models, NIPS, 2014.
Deep generative models for image generation: A practical comparison between variational autoencoders and generative adversarial networks, Mobile, Secure, and Programmable Networking, pp.1-8, 2019.
Neural discrete representation learning, NIPS, 2017.
Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.1798-1828, 2013.
Variational inference with normalizing flows, pp.1530-1538, 2015.
Improved variational inference with inverse autoregressive flow, Advances in Neural Information Processing Systems, pp.4743-4751, 2016.
Parallel wavenet: Fast high-fidelity speech synthesis, pp.3915-3923, 2018.
Universal audio synthesizer control with normalizing flows, ArXiv, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02471340
Hierarchical generative modeling for controllable speech synthesis, 2019.
Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis, 2018.
Towards end-to-end prosody transfer for expressive speech synthesis with tacotron, ArXiv, pp.4700-4709, 2018.
X-vectors: Robust dnn embeddings for speaker recognition, ICASSP, pp.5329-5333, 2018.
Learning latent representations for style control and transfer in end-to-end speech synthesis, ICASSP, pp.6945-6949, 2019.
Expressive speech synthesis via modeling expressions with variational autoencoder, INTERSPEECH, ISCA, pp.3067-3071, 2018.
Robust and fine-grained prosody control of end-to-end speech synthesis, ICASSP, pp.5911-5915, 2019.
Conditional variational auto-encoder for text-driven expressive audiovisual speech synthesis, INTERSPEECH, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02175776
Deep metric learning: A survey, Symmetry, vol.11, p.1066, 2019.
Improved deep metric learning with multi-class n-pair loss objective, NIPS, 2016.
Deep variational metric learning, ECCV, 2018.
Merlin: An open source neural network speech synthesis system, 2016.
Deep variational metric learning for transfer of expressivity in multispeaker text to speech, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02573885
Sylvester normalizing flows for variational inference, UAI, pp.393-402, 2018.
Voxceleb2: Deep speaker recognition, INTERSPEECH, 2018.
The kaldi speech recognition toolkit, 2011.
The siwis french speech synthesis database, 2017.
Tundra: a multilingual corpus of found data for tts research created with light supervision, INTERSPEECH, 2013.
World: A vocoder-based high-quality speech synthesis system for real-time applications, IEICE, pp.1877-1884, 2016.
Mean opinion score (mos) revisited: methods and applications, limitations and alternatives, Multimedia Systems, pp.213-227, 2014.