Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model. CoRR, arxiv.org, 2017.
Deep Voice 3: 2000-Speaker Neural Text-to-Speech. CoRR, 2017.
Char2Wav: End-to-End Speech Synthesis. ICLR, 2017.
VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. 2017.
Learning latent representations for style control and transfer in end-to-end speech synthesis. ICASSP, 2018.
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder. 2018.
Hierarchical Generative Modeling for Controllable Speech Synthesis. ICLR, 2019.
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. ICML, 2018.
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. ICML, 2018.
Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis. ICASSP, 2019.
Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System. ICASSP, 2018.
Layer adaptation for transfer of expressivity in speech synthesis. Language & Technology Conference (LTC), 2019. URL: https://hal.archives-ouvertes.fr/hal-02177945
Deep Variational Metric Learning. The European Conference on Computer Vision, 2018.
Deep Metric Learning: A Survey. Symmetry, vol. 11, ISSN 2073-8994, 2019.
Improved Deep Metric Learning with Multi-class N-pair Loss Objective. 2016.
X-Vectors: Robust DNN Embeddings for Speaker Recognition. ICASSP, 2018.
Auto-Encoding Variational Bayes. CoRR, arxiv.org, abs/1312.6114, 2013.
Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.
Generating Sentences from a Continuous Space. SIGNLL Conference on Computational Natural Language Learning, 2016.
VoxCeleb2: Deep Speaker Recognition. 2018.
, Silovský, Georg Stemmer and Karel Veselý. The Kaldi Speech Recognition Toolkit. ASRU conference, 2011.
WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications. IEICE Transactions, 2016.
The SIWIS French Speech Synthesis Database. 2017.
TUNDRA: a multilingual corpus of found data for TTS research created with light supervision. 2013.
Mean Opinion Score (MOS) Revisited: Methods and Applications, Limitations and Alternatives. Multimedia Systems, vol. 22, 2016.
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis. 2019. URL: https://hal.archives-ouvertes.fr/hal-02175776
Merlin: An Open Source Neural Network Speech Synthesis System. ISCA Speech Synthesis Workshop (SSW9), 2016.