DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation

Amal Houidhek; Vincent Colotte; Zied Mnasri; Denis Jouvet

Conference paper, Year: 2018



Amal Houidhek

Role: Author

Speech Modeling for Facilitating Oral-Based Communication

Vincent Colotte

Role: Author
PersonId: 16268
IdHAL: vincent-colotte
IdRef: 070401683

Speech Modeling for Facilitating Oral-Based Communication

Zied Mnasri

Role: Author

Ecole Nationale d'Ingénieurs de Tunis

Denis Jouvet

Role: Author
PersonId: 15904
IdHAL: denis-jouvet
IdRef: 029418666

Speech Modeling for Facilitating Oral-Based Communication

Abstract

This paper investigates the use of deep neural networks (DNN) for Arabic speech synthesis. In parametric speech synthesis, whether HMM-based or DNN-based, each speech segment is described by a set of contextual features. These contextual features correspond to linguistic, phonetic and prosodic information that may affect the pronunciation of the segments. Gemination and vowel quantity (short vowel vs. long vowel) are two particular and important phenomena in the Arabic language. Hence, it is worth investigating whether these phenomena must be handled with specific speech units, or whether specifying them in the contextual features is enough. Consequently, four modelling approaches are evaluated, in which geminated consonants (respectively long vowels) are considered either as fully-fledged phoneme units or as the same phoneme as their simple (respectively short) counterparts. Although previous studies relying on HMM-based modelling observed no significant difference between these modelling variants, this paper examines them in the framework of DNN-based speech synthesis. Listening tests are conducted to evaluate the four modelling approaches and to assess the performance of DNN-based Arabic speech synthesis with respect to the previous HMM-based approach.
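The four approaches follow from two independent binary choices: whether geminated consonants get their own phoneme label, and whether long vowels do. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a toy transcription of the word "sayyaara" (car), with /yy/ geminated and the second /a/ long, and uses hypothetical labels such as "yy" and "a:"; the names to_units and SAYYAARA are likewise hypothetical. It shows how each variant changes the phoneme label while the contextual features always keep the gemination and vowel-length information.

```python
from itertools import product

# Hypothetical toy transcription of the Arabic word "sayyaara" (car).
# Each segment is (base phoneme, geminated?, long vowel?).
# /y/ is geminated and the second /a/ is a long vowel.
SAYYAARA = [
    ("s", False, False),
    ("a", False, False),
    ("y", True, False),
    ("a", False, True),
    ("r", False, False),
    ("a", False, False),
]


def to_units(segments, geminate_as_unit, long_vowel_as_unit):
    """Return (unit label, contextual features) pairs for one modelling variant.

    geminate_as_unit: geminated consonants get their own label (e.g. "yy").
    long_vowel_as_unit: long vowels get their own label (e.g. "a:").
    When a flag is False, the segment shares the label of its simple/short
    counterpart, and only the contextual features record the distinction.
    """
    units = []
    for phone, geminated, long_vowel in segments:
        if geminated and geminate_as_unit:
            label = phone + phone  # hypothetical geminate label, e.g. "yy"
        elif long_vowel and long_vowel_as_unit:
            label = phone + ":"  # hypothetical long-vowel label, e.g. "a:"
        else:
            label = phone  # same unit as the simple/short counterpart
        # The distinction is always kept in the contextual features.
        features = {"geminated": geminated, "long": long_vowel}
        units.append((label, features))
    return units


if __name__ == "__main__":
    # The four modelling variants: 2 choices for gemination x 2 for vowel length.
    for gem_unit, long_unit in product([False, True], repeat=2):
        labels = [label for label, _ in to_units(SAYYAARA, gem_unit, long_unit)]
        print(f"geminate_as_unit={gem_unit}, long_vowel_as_unit={long_unit}: "
              f"labels={labels}, inventory={sorted(set(labels))}")
```

When a flag is False, the label inventory shrinks (for instance "y" and "yy" collapse into one unit), so the acoustic model has to rely on the contextual features alone to tell the two apart, which is exactly the question the abstract raises.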

Keywords

Parametric speech synthesis; Hidden Markov Models; Decision tree; Deep neural network; Arabic language

Domains

Signal and Image Processing [eess.SP]

Main file

slsp-final-depose-30-juillet-2018.pdf (336.71 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01904512

Submitted on: Thursday, 25 October 2018, 09:54:53

Last modified on: Monday, 11 September 2023, 17:41:19

Long-term archiving on: Saturday, 26 January 2019, 13:57:22

Dates and versions

hal-01904512, version 1 (25-10-2018)

Identifiers

HAL Id: hal-01904512, version 1

Cite

Amal Houidhek, Vincent Colotte, Zied Mnasri, Denis Jouvet. DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation. SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Oct 2018, Mons, Belgium. ⟨hal-01904512⟩


Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD

171 views

507 downloads
