Doi et al., Statistical approach to enhancing esophageal speech based on Gaussian mixture models, in Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4250-4253, 2010.

Doi et al., Alaryngeal speech enhancement based on one-to-many eigenvoice conversion, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, pp.172-183, 2014.

La respiration pour le chant, 2004.

Erhan et al., Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, vol.11, pp.625-660, 2010.

Erro et al., INCA algorithm for training voice conversion systems from nonparallel corpora, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.5, pp.944-953, 2010.

Erro et al., Voice conversion based on weighted frequency warping, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.5, pp.922-931, 2010.

Eslava and Bilbao, Intra-lingual and cross-lingual voice conversion using harmonic plus stochastic models, 2008.

Espy-Wilson et al., Enhancement of electrolaryngeal speech by adaptive filtering, Journal of Speech, Language, and Hearing Research, vol.41, issue.6, pp.1253-1264, 1998.

J. L. Flanagan and R. M. Golden, Phase vocoder, Bell System Technical Journal, vol.45, issue.9, pp.1493-1509, 1966.

García et al., Time-spectral technique for esophageal speech regeneration, in Proceedings of the 11th EUSIPCO (European Signal Processing Conference), pp.113-116, 2002.

García et al., Esophageal voices: Glottal flow restoration, in Proceedings of Acoustics, Speech, and Signal Processing, vol.4, p.141, 2005.

E. George, An analysis-by-synthesis approach to sinusoidal modeling applied to speech and music signal processing, 1991.

E. B. George and M. J. T. Smith, Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model, IEEE Transactions on Speech and Audio Processing, vol.5, issue.5, pp.389-406, 1997.

V. Gnann, Signal Reconstruction from Multiresolution Magnitude Spectrograms for Audio, 2014.

Godoy et al., Alleviating the one-to-many mapping problem in voice conversion with context-dependent modelling, in Proceedings of InterSpeech 09: 10th Annual Conference of the International Speech Communication Association, 2009.

B. Gold and L. Rabiner, Parallel processing techniques for estimating pitch periods of speech in the time domain, The Journal of the Acoustical Society of America, vol.46, issue.2B, pp.442-448, 1969.

E. Gopi.

D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.2, pp.236-243, 1984.

Hamon et al., A diphone synthesis system based on time-domain prosodic modifications of speech, in Proceedings of Acoustics, Speech, and Signal Processing, pp.238-241, 1989.

Helander et al., LSF mapping for voice conversion with very small training sets, pp.4669-4672, 2008.

Helander et al., On the impact of alignment on voice conversion performance, in Proceedings of the Ninth Annual Conference of the International Speech Communication Association, 2008.

Helander et al., Voice conversion using dynamic kernel partial least squares regression, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.3, pp.806-817, 2012.

Helander et al., Voice conversion using partial least squares regression, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.5, pp.912-921, 2010.

Hinton et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.

Hinton et al., A fast learning algorithm for deep belief nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.

G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol.313, issue.5786, pp.504-507, 2006.

A. Hisada and H. Sawada, Real-time clarification of esophageal speech using a comb filter, in Proc. ICDVRAT, pp.39-46, 2002.

R. Ishaq and B. G. Zapirain, Esophageal speech enhancement using modified voicing source, in Proceedings of the IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp.210-214, 2013.

F. Itakura, Line spectrum representation of linear predictor coefficients of speech signals, The Journal of the Acoustical Society of America, vol.57, issue.S1, p.S35, 1975.

J. C. Kahane, A morphological study of the human prepubertal and pubertal larynx, American Journal of Anatomy, vol.151, issue.1, pp.11-19, 1978.

A. Kain and M. W. Macon, Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction, Proceedings of the 1998 IEEE International Conference on, vol.1, pp.813-816, 1998.

A. Kain, High resolution voice transformation, 2001.

Kawahara et al., Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, in Proceedings of the Second International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2001.

Kawahara et al., Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, vol.27, issue.3-4, pp.187-207, 1999.

Kawanami et al., GMM-based voice conversion applied to emotional speech synthesis, 2003.

J. Kominek and A. W. Black, The CMU Arctic speech databases, in Proceedings of the Fifth ISCA Workshop on Speech Synthesis, 2004.

P. Ladefoged, Elements of acoustic phonetics, 1996.

J. Laures and K. Bunton, Perceptual effects of a flattened fundamental frequency at the sentence level under different listening conditions, Journal of Communication Disorders, vol.36, issue.6, pp.449-464, 2003.

J. Laures and G. Weismer, The effects of a flattened fundamental frequency on intelligibility at the sentence level, Journal of Speech, Language, and Hearing Research, vol.42, issue.5, pp.1148-1156, 1999.

F. Le Huche and A. Allali, La voix : anatomie et physiologie des organes de la voix et de la parole, vol.1, 2001.

Ling et al., Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2129-2139, 2013.

H. Liu and M. L. Ng, Electrolarynx in voice rehabilitation, Auris Nasus Larynx, vol.34, issue.3, pp.327-332, 2007.

Liu et al., Application of spectral subtraction method on enhancement of electrolarynx speech, The Journal of the Acoustical Society of America, vol.120, issue.1, pp.398-406, 2006.

Liu et al., Enhancement of electrolarynx speech based on auditory masking, IEEE Transactions on Biomedical Engineering, vol.53, issue.5, pp.865-874, 2006.

Lu et al., Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis, 2013.

A. F. Machado and M. Queiroz, Voice conversion: A critical survey, Proc. Sound and Music Computing, pp.1-8, 2010.

Mantilla-Caeiros et al., A pattern recognition based esophageal speech enhancement system, Journal of Applied Research and Technology, vol.8, issue.1, pp.56-70, 2010.

K. Matsui, Enhancement of esophageal speech using formant synthesis method, Proc. Spring Meet. Acoust. Soc. Jpn, vol.311, issue.2, pp.69-76, 1997.

S. Mattice, Why alaryngeal speech has a reduced level of intelligibility and how it can be maximized, 2015.

R. McAulay and T. Quatieri, Speech analysis/synthesis based on a sinusoidal representation, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.34, issue.4, pp.744-754, 1986.

A. M. Noll, Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate, in Proceedings of the Symposium on Computer Processing in Communication, vol.19, pp.779-797, 1970.

A. Oppenheim and R. Schafer, Homomorphic analysis of speech, IEEE Transactions on Audio and Electroacoustics, vol.16, issue.2, pp.221-226, 1968.

E. Pépiot, Voix de femmes, voix d'hommes : différences acoustiques, identification du genre par la voix et implications psycholinguistiques chez les locuteurs anglophones et francophones, 2013.

E. Pernkopf, Topographische Anatomie des Menschen : Der Hals, 1952.

Pravena et al., Pathological voice recognition for vocal fold disease, International Journal of Computer Applications, vol.47, issue.13, 2012.

Y. Qi, Replacing tracheoesophageal voicing sources using LPC synthesis, The Journal of the Acoustical Society of America, vol.88, issue.3, pp.1228-1235, 1990.

Y. Qi and B. Weinberg, Characteristics of voicing source waveforms produced by esophageal and tracheoesophageal speakers, Journal of Speech, Language, and Hearing Research, vol.38, issue.3, pp.536-548, 1995.

Qi et al., Enhancement of female esophageal and tracheoesophageal speech, The Journal of the Acoustical Society of America, vol.98, issue.5, pp.2461-2465, 1995.

T. Quatieri and R. McAulay, Speech transformations based on a sinusoidal representation, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.34, issue.6, pp.1449-1464, 1986.

L. R. Rabiner and R. W. Schafer, Digital processing of speech signals, vol.100, 1978.

Rao et al., Voice transformation by mapping the features at syllable level, in Proceedings of the International Conference on Pattern Recognition and Machine Intelligence, pp.479-486, 2007.

Rumelhart et al., Learning representations by back-propagating errors, Nature, vol.323, issue.6088, p.533, 1986.

M. Schroeder, Period histogram and product spectrum: New methods for fundamental-frequency measurement, The Journal of the Acoustical Society of America, vol.43, issue.4, pp.829-834, 1968.

Sharifzadeh et al., Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec, IEEE Transactions on Biomedical Engineering, vol.57, issue.10, pp.2448-2458, 2010.

Srivastava et al., Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

Y. Stylianou, Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification, 1996.

Y. Stylianou, Applying the harmonic plus noise model in concatenative speech synthesis, IEEE Transactions on Speech and Audio Processing, vol.9, issue.1, pp.21-29, 2001.

Y. Stylianou, Voice transformation: a survey, in Proceedings of Acoustics, Speech and Signal Processing, pp.3585-3588, 2009.

Stylianou et al., Continuous probabilistic transform for voice conversion, IEEE Transactions on Speech and Audio Processing, vol.6, issue.2, pp.131-142, 1998.

Tanaka et al., A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation, IEICE Transactions on Information and Systems, vol.97, issue.6, pp.1429-1437, 2014.

Toda et al., Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.8, pp.2222-2235, 2007.

Toda et al., The voice conversion challenge, pp.1632-1636, 2016.

H. Valbret, Système de conversion de voix pour la synthèse de parole, 1993.

Valbret et al., Voice transformation using PSOLA technique, Speech Communication, vol.11, issue.2-3, pp.175-187, 1992.

Vincent et al., Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

A. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, issue.2, pp.260-269, 1967.

Watanabe et al., Transformation of spectral envelope for voice conversion based on radial basis function networks, Seventh International Conference on Spoken Language Processing, 2002.

Z. Wu, Spectral mapping for voice conversion, 2015.

Ze et al., Statistical parametric speech synthesis using deep neural networks, in Proceedings of Acoustics, Speech and Signal Processing, pp.7962-7966, 2013.

Zhu et al., IEEE, 2006.

Personal bibliography

International journals with selection committee

Enhancement of esophageal speech using statistical and neuromimetic voice conversion techniques, vol.1, p.10.

I. Ben Othmane, J. Di Martino, and K. Ouni, Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra, International Journal of Speech Technology, pp.1-12, 2018.

I. Ben Othmane, J. Di Martino, and K. Ouni, 2017.

I. Ben Othmane, J. Di Martino, and K. Ouni, Enhancement of esophageal speech using voice conversion techniques, International Conference on Natural Language, Signal and Speech Processing, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01660580

I. Ben Othmane, J. Di Martino, and K. Ouni, Improving the computational performance of standard GMM-based voice conversion systems used in real-time applications, 2018 International Conference on Electronics, Control, Optimization and Computer Science, pp.1-5, December 2018.

National conferences

I. Ben Othmane and K. Ouni, Transformation de la voix : approches et applications, JDEPT, Tunis, 2015.

I. Ben Othmane and K. Ouni, Study, implementation and application of a voice conversion system, 24-26 November 2016.