D. H. Klatt, Linguistic uses of segmental duration in English: Acoustic and perceptual evidence, The Journal of the Acoustical Society of America, vol.59, issue.5, p.1208, 1976.

P. Rubin, T. Baer, and P. Mermelstein, An articulatory synthesizer for perceptual research, The Journal of the Acoustical Society of America, vol.70, issue.2, pp.321-328, 1981.

E. Moulines and F. Charpentier, Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication, vol.9, issue.5-6, pp.453-467, 1990.

T. Dutoit and H. Leich, MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database, Speech Communication, vol.13, issue.3-4, pp.435-440, 1993.

A. J. Hunt and A. W. Black, Unit selection in a concatenative speech synthesis system using a large speech database, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, pp.373-376, 1996.

T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, Proc. European Conference on Speech Communication and Technology, pp.2347-2350, 1999.

S. Imai, K. Sumita, and C. Furuichi, Mel Log Spectrum Approximation (MLSA) filter for speech synthesis, Electronics and Communications in Japan (Part I: Communications), vol.66, issue.2, pp.10-18, 1983.

H. Zen, K. Tokuda, and A. W. Black, Statistical parametric speech synthesis, Speech Communication, vol.51, issue.11, pp.1039-1064, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00746106

H. Zen, A. Senior, and M. Schuster, Statistical parametric speech synthesis using deep neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7962-7966, 2013.

Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss et al., Tacotron: Towards End-to-End Speech Synthesis, Interspeech 2017, pp.4006-4010, 2017.

J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly et al., Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4779-4783, 2018.

H. Mixdorff, MFGI, a Linguistically Motivated Quantitative Model of German Prosody, Improvements in Speech Synthesis, pp.134-143.

K. Shinoda and T. Watanabe, Speaker adaptation with autonomous model complexity control by MDL principle, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, pp.99-102, 1996.

O. Abdelhamid, S. M. Abdou, and M. Rashwan, Improving Arabic HMM-based speech synthesis quality, Proc. International Conference on Spoken Language Processing, pp.1332-1335, 2006.

H. Kawahara, Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1303-1306, 1997.

M. Morise, F. Yokomori, and K. Ozawa, WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, IEICE Transactions on Information and Systems, vol.E99.D, issue.7, pp.1877-1884, 2016.

W. N. Campbell, Predicting segmental durations for accommodation within a syllable-level timing framework, Proc. European Conference on Speech Communication and Technology, pp.1332-1335, 1993.

J. P. van Santen, Assignment of segmental duration in text-to-speech synthesis, Computer Speech & Language, vol.8, issue.2, pp.95-128, 1994.

D. Newman, The phonetics of Arabic, The Journal of the American Oriental Society, vol.44, pp.1-6, 1984.

R. Abdelmalek and Z. Mnasri, High quality Arabic text-to-speech synthesis using unit selection, 2016 13th International Multi-Conference on Systems, Signals & Devices (SSD), pp.1-5, 2016.

D. W. Griffin and J. S. Lim, Multiband excitation vocoder, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.36, issue.8, pp.1223-1235, 1988.

F. Boukadida and N. Ellouze, Modélisation Statistique de la Durée des Voyelles en Parole Arabe [Statistical Modeling of Vowel Duration in Arabic Speech], Proc. Science of Electronics, Telecommunications and Information Technology Conference, pp.1-4, 2005.

A. Zaki, A. Rajouani, and M. Najim, Un modèle prédictif de la durée segmentale pour la synthèse de la parole arabe à partir du texte [A predictive model of segmental duration for Arabic text-to-speech synthesis], Proc. Journées d'Etudes sur la Parole, pp.89-92, 2002.

Z. Mnasri, F. Boukadida, and N. Ellouze, F0 contour parametric modeling using multivariate adaptive regression splines for Arabic text-to-speech synthesis, Eighth International Multi-Conference on Systems, Signals & Devices, vol.4, pp.533-542, 2011.

A. Houidhek, V. Colotte, Z. Mnasri, D. Jouvet, and I. Zangar, Statistical modelling of speech units in HMM-based speech synthesis for Arabic, Proc. Language & Technology conference, pp.1-6, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01649034

I. Zangar, Z. Mnasri, V. Colotte, D. Jouvet, and A. Houidhek, Duration modeling using DNN for Arabic speech synthesis, 9th International Conference on Speech Prosody 2018, pp.597-601, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01889917

T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Duration modeling for HMM-based speech synthesis, Proc. International Conference on Spoken Language Processing, pp.29-32, 1998.

H. Silén, E. Helander, J. Nurminen, and M. Gabbouj, Analysis of duration prediction accuracy in HMM-based speech synthesis, Proc. International Conference on Speech Prosody, pp.1-4, 2010.

H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Hidden semi-Markov model based speech synthesis, Proc. International Conference on Spoken Language Processing, pp.1393-1396, 2004.

S. Pan, J. Tao, and Y. Wang, A state duration generation algorithm considering global variance for HMM-based speech synthesis, Proc. Annual Summit and Conference, 2011.

Y. Wu and R. Wang, HMM-based Trainable Speech Synthesis for Chinese, Journal of Chinese Information Processing, vol.20, pp.75-81, 2006.

B. Gao, Y. Qian, Z. Wu, and F. K. Soong, Duration refinement by jointly optimizing state and longer unit likelihood, Proc. Annual Conference of the International Speech Communication Association, pp.2266-2269, 2008.

H. Lu, Y. J. Wu, K. Tokuda, L. R. Dai, and R. H. Wang, Full covariance state duration modeling for HMM-based speech synthesis, Proc. International Conference on Acoustics, Speech and Signal Processing, pp.4033-4036, 2009.

Y. Ishimatsu, Investigation of state duration model based on gamma distribution for HMM-based speech synthesis, IEICE Technical Report, pp.2001-81, 2001.

M. D. Riley, Some applications of tree-based modelling to speech and language, Proceedings of the workshop on Speech and Natural Language - HLT '89, pp.229-232, 1989.

K. S. Rao and B. Yegnanarayana, Modeling durations of syllables using neural networks, Computer Speech & Language, vol.21, issue.2, pp.282-295, 2007.

M. Riedi, Modeling segmental duration with multivariate adaptive regression splines, Proc. European Conference on Speech Communication and Technology, pp.2627-2630, 1997.

A. Lazaridis, P. E. Honnet, and P. N. Garner, SVR vs MLP for Phone Duration Modelling in HMM-based Speech Synthesis, 7th International Conference on Speech Prosody 2014, 2014.

U. Ogbureke, J. Cabral, and J. Berndsen, Explicit duration modelling in HMM-based speech synthesis using continuous hidden Markov Model, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), 2012.

K. Yu, F. Mairesse, and S. Young, Word-level emphasis modelling in HMM-based speech synthesis, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4238-4241, 2010.

B. Chen, T. Bian, and K. Yu, Discrete Duration Model for Speech Synthesis, Interspeech 2017, pp.789-793, 2017.

B. Chen, J. Lai, and K. Yu, Comparison of Modeling Target in LSTM-RNN Duration Model, Interspeech 2017, pp.794-798, 2017.

Z. Wu, O. Watts, and S. King, Merlin: An Open Source Neural Network Speech Synthesis System, 9th ISCA Speech Synthesis Workshop, pp.202-207, 2016.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

G. E. Henter, S. Ronanki, O. Watts, M. Wester, Z. Wu et al., Robust TTS duration modelling using DNNs, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5130-5134, 2016.

R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, F0 contour prediction with a deep belief network-Gaussian process hybrid model, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2268-2272, 2013.

H. Zen and H. Sak, Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4470-4474, 2015.

D. Moungsri, T. Koriyama, and T. Kobayashi, Duration prediction using multiple Gaussian process experts for GPR-based speech synthesis, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5495-5499, 2017.

N. Halabi and M. Wald, Phonetic inventory for an Arabic speech corpus, Proc. International Conference on Language Resources and Evaluation, pp.734-738, 2016.

K. M. Rosen, Analysis of speech segment duration with the lognormal distribution: A basis for unification and comparison, Journal of Phonetics, vol.33, issue.4, pp.411-426, 2005.

J. Sola and J. Sevilla, Importance of input data normalization for the application of neural networks to complex industrial problems, IEEE Transactions on Nuclear Science, vol.44, issue.3, pp.1464-1468, 1997.

N. Halabi, Modern Standard Arabic Phonetics for Speech Synthesis, Dissertation, 2016.

M. Du Toit, The expression [rendered as an image in the source] as the key to 1 Peter 2:1-3, HTS Teologiese Studies / Theological Studies, vol.63, issue.1, 2007.

Chapter 2: Medieval Merlin: Advice, Merlin, pp.43-96, 2018.

Arabic Speech Corpus, 2020.

WaveNet, 2020.

L. A. Thorpe and B. R. Shelton, Subjective Test Methodology: MOS vs. DMOS in Evaluation of Speech Coding Algorithms, Proceedings IEEE Workshop on Speech Coding for Telecommunications, pp.73-74.

S. Dimolitsas, F. L. Corcoran, and C. Ravishankar, Dependence of opinion scores on listening sets used in degradation category rating assessments, IEEE Transactions on Speech and Audio Processing, vol.3, issue.5, pp.421-424, 1995.

D. H. Klatt and W. E. Cooper, Perception of Segment Duration in Sentence Contexts, Communication and Cybernetics, pp.69-89, 1975.