DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation

Amal Houidhek 1, Vincent Colotte 1, Zied Mnasri 2, Denis Jouvet 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper investigates the use of deep neural networks (DNN) for Arabic speech synthesis. In parametric speech synthesis, whether HMM-based or DNN-based, each speech segment is described with a set of contextual features. These contextual features correspond to linguistic, phonetic and prosodic information that may affect the pronunciation of the segments. Gemination and vowel quantity (short vowel vs. long vowel) are two distinctive and important phenomena in the Arabic language. Hence, it is worth investigating whether these phenomena must be handled with specific speech units, or whether specifying them in the contextual features is enough. Consequently, four modelling approaches are evaluated by considering geminated consonants (respectively long vowels) either as fully-fledged phoneme units or as the same phoneme as their simple (respectively short) counterparts. Although no significant difference was observed in previous studies relying on HMM-based modelling, this paper examines these modelling variants in the framework of DNN-based speech synthesis. Listening tests are conducted to evaluate the four modelling approaches, and to assess the performance of DNN-based Arabic speech synthesis relative to the previous HMM-based approach.
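As an illustration only, the two-by-two design described in the abstract can be sketched in a few lines of Python. Everything here is an assumption made for the example and is not taken from the paper: the toy segment sequence, the doubled-symbol naming for distinct units, the boolean contextual flag, and the names `WORD`, `to_units` and `VARIANTS`. In particular, setting the flag to `None` when a distinct unit is used is a simplification; the actual contextual feature set is not specified in this record.

```python
from itertools import product

# Hypothetical toy sequence: (base phoneme, class, marked), where "marked"
# means geminated for a consonant ("C") and long for a vowel ("V").
WORD = [
    ("k", "C", False),
    ("a", "V", False),
    ("t", "C", True),   # geminated consonant
    ("a", "V", True),   # long vowel
    ("b", "C", False),
]


def to_units(segments, split_geminates, split_long_vowels):
    """Return (unit, context_flag) pairs for one modelling variant.

    If a class is "split", marked segments get their own unit label and no
    contextual flag is used (None). Otherwise the marked and unmarked
    segments share one unit, and the distinction is carried by the flag.
    """
    units = []
    for base, kind, marked in segments:
        split = split_geminates if kind == "C" else split_long_vowels
        if split:
            # Distinct unit; the doubled symbol is an illustrative convention.
            units.append((base + base if marked else base, None))
        else:
            # Shared unit; gemination or length is only a contextual feature.
            units.append((base, marked))
    return units


# The four variants: (geminates as own units?, long vowels as own units?)
VARIANTS = {
    (g, v): to_units(WORD, g, v) for g, v in product([True, False], repeat=2)
}

for (g, v), units in VARIANTS.items():
    print(f"split_geminates={g}, split_long_vowels={v}: {units}")
```

Each key of `VARIANTS` corresponds to one of the four modelling approaches: both phenomena as distinct units, only one of them, or neither.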
Document type: Conference paper

https://hal.inria.fr/hal-01904512
Contributor: Denis Jouvet
Submitted on: Thursday, October 25, 2018 - 9:54:53 AM
Last modification on: Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on: Saturday, January 26, 2019 - 1:57:22 PM

File

slsp-final-depose-30-juillet-2...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01904512, version 1

Citation

Amal Houidhek, Vincent Colotte, Zied Mnasri, Denis Jouvet. DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation. SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Oct 2018, Mons, Belgium. ⟨hal-01904512⟩
