
Duration modelling and evaluation for Arabic statistical parametric speech synthesis

Imene Zangar 1, Zied Mnasri 1, Vincent Colotte 2, Denis Jouvet 2
2 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Sound duration is responsible for rhythm and speech rate. Furthermore, in some languages phoneme length is an important phonetic and prosodic factor. In Arabic, for example, gemination and vowel quantity are two important characteristics of the language. Accurate duration modelling is therefore crucial for Arabic TTS systems. This paper aims to improve the modelling of phone duration for Arabic statistical parametric speech synthesis using DNN-based models. In recent years, DNNs have frequently been used for parametric speech synthesis in place of HMMs. Several variants of DNN-based duration models for Arabic are therefore investigated. The novelty consists in training a specific DNN model for each class of sounds, i.e. short vowels, long vowels, simple consonants and geminated consonants. This choice is motivated by the improvement we previously achieved in the quality of Arabic parametric speech synthesis by introducing two Arabic-specific features, gemination and vowel quantity, into the standard HTS feature set. Both objective and subjective evaluations show that using a specific model for each class of sounds leads to more accurate modelling of phone duration in Arabic parametric speech synthesis, outperforming state-of-the-art duration modelling systems.
Document type :
Journal articles

Cited literature: 59 references

https://hal.inria.fr/hal-03007287
Contributor : Denis Jouvet
Submitted on : Monday, November 16, 2020 - 12:03:44 PM
Last modification on : Saturday, November 28, 2020 - 10:24:01 AM

File

Duration_Modelling_article_Sep...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03007287, version 1

Citation

Imene Zangar, Zied Mnasri, Vincent Colotte, Denis Jouvet. Duration modelling and evaluation for Arabic statistical parametric speech synthesis. Multimedia Tools and Applications, Springer Verlag, 2020. ⟨hal-03007287⟩