F0 modeling using DNN for Arabic parametric speech synthesis

Imene Zangar (1), Zied Mnasri (1), Vincent Colotte (2), Denis Jouvet (2)
(2) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Deep neural networks (DNNs) are attracting increasing interest in speech processing applications, especially in text-to-speech synthesis. State-of-the-art speech generation tools, such as MERLIN and WAVENET, are entirely DNN-based. However, each language has to be modeled separately with DNNs. One of the key components of a speech synthesis system is the module that generates prosodic parameters from contextual input features, and more particularly the fundamental frequency (F0) generation module. Since F0 is responsible for intonation, it must be modeled accurately to produce intelligible and natural speech. However, F0 modeling is highly language-dependent, so language-specific characteristics have to be taken into account. In this paper, we aim to model F0 for Arabic speech synthesis with feedforward and recurrent DNNs, using features specific to Arabic, such as vowel quantity and gemination, in order to improve the quality of Arabic parametric speech synthesis.
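The abstract describes predicting F0 from contextual input features with a feedforward DNN, where Arabic-specific cues such as vowel quantity and gemination are part of the input. As a minimal sketch under stated assumptions (not the authors' implementation), the Python code below encodes a hypothetical per-phone context (`is_vowel`, `is_long`, `is_geminated`, `rel_position`) as a numeric vector and fits a one-hidden-layer tanh network to a scalar target with plain stochastic gradient descent. The feature set, the target (assumed here to be a normalized log-F0 value per phone), the layer sizes and the toy data are all illustrative choices; the paper's actual features, F0 representation and architecture are not described on this page.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Phone:
    """Hypothetical per-phone context; the field names are illustrative only."""
    is_vowel: bool
    is_long: bool        # vowel quantity: long vs short vowel
    is_geminated: bool   # consonant gemination
    rel_position: float  # relative position in the utterance, in [0, 1]


def encode(p: Phone) -> list[float]:
    """Turn the contextual description into a numeric DNN input vector."""
    return [float(p.is_vowel), float(p.is_long), float(p.is_geminated), p.rel_position]


class FeedForwardF0:
    """One tanh hidden layer and a linear output: one scalar (e.g. normalized log-F0) per phone."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: list[float]) -> tuple[float, list[float]]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return y, h

    def predict(self, x: list[float]) -> float:
        return self.forward(x)[0]

    def sgd_step(self, x: list[float], target: float, lr: float) -> None:
        """One stochastic gradient step on the squared error 0.5 * (y - target)**2."""
        y, h = self.forward(x)
        err = y - target
        for j, hj in enumerate(h):
            # Gradient w.r.t. the hidden pre-activation, using w2[j] before it is
            # updated; tanh'(a) = 1 - tanh(a)**2.
            grad_a = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_a * xi
            self.b1[j] -= lr * grad_a
        self.b2 -= lr * err


def mse(model: FeedForwardF0, data: list[tuple[list[float], float]]) -> float:
    return sum((model.predict(x) - t) ** 2 for x, t in data) / len(data)


def toy_target(p: Phone) -> float:
    # Arbitrary synthetic target used only to exercise the code;
    # it is NOT a claim about real Arabic intonation.
    return 0.5 * p.is_long + 0.3 * p.is_geminated - 0.8 * p.rel_position


def demo(epochs: int = 200, lr: float = 0.05) -> tuple[float, float]:
    """Train on toy data; return the mean squared error before and after training."""
    rng = random.Random(1)
    phones = []
    for k in range(40):
        is_vowel = rng.random() < 0.5
        phones.append(Phone(
            is_vowel=is_vowel,
            is_long=is_vowel and rng.random() < 0.5,
            is_geminated=(not is_vowel) and rng.random() < 0.3,
            rel_position=k / 39,
        ))
    data = [(encode(p), toy_target(p)) for p in phones]
    model = FeedForwardF0(n_in=4, n_hidden=8)
    before = mse(model, data)
    for _ in range(epochs):
        for x, t in data:
            model.sgd_step(x, t, lr)
    return before, mse(model, data)
```

Calling `demo()` returns the mean squared error before and after training on the synthetic data. A recurrent variant, which the abstract also mentions, would additionally carry a hidden state from one phone to the next so that each prediction can depend on the preceding context; it is not shown here.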

Cited literature: 24 references

https://hal.inria.fr/hal-02177496
Contributor : Vincent Colotte
Submitted on : Tuesday, July 9, 2019 - 9:41:52 AM
Last modification on : Tuesday, July 23, 2019 - 2:09:49 PM

File

conference_INNSBDDL2019.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02177496, version 1


Citation

Imene Zangar, Zied Mnasri, Vincent Colotte, Denis Jouvet. F0 modeling using DNN for Arabic parametric speech synthesis. INNSBDDL 2019 - INNS Big Data and Deep Learning, Apr 2019, Sestri Levante, Italy. ⟨hal-02177496⟩

