H. Doi, K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, Statistical approach to enhancing esophageal speech based on Gaussian mixture models, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4250-4253, 2010.
DOI : 10.1109/ICASSP.2010.5495676

H. Doi, K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, Enhancement of Esophageal Speech Using Statistical Voice Conversion, APSIPA 2009, pp.805-808, 2009.

K. Matsui, N. Hara, N. Kobayashi, and H. Hirose, Enhancement of esophageal speech using formant synthesis, Proc. ICASSP, pp.1831-1834, 1999.

A. Hisada and H. Sawada, Real-time clarification of esophageal speech using a comb filter, Proc. ICDVRAT, 2002.

T. Toda, A. Black, and K. Tokuda, Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.8, pp.2222-2235, 2007.
DOI : 10.1109/TASL.2007.907344

Y. Stylianou, O. Cappé, and E. Moulines, Continuous probabilistic transform for voice conversion, IEEE Transactions on Speech and Audio Processing, vol.6, issue.2, pp.131-142, 1998.
DOI : 10.1109/89.661472

N. Bi and Y. Qi, Application of speech conversion to alaryngeal speech enhancement, IEEE Transactions on Speech and Audio Processing, vol.5, issue.2, pp.97-105, 1997.

Y. Qi and B. Weinberg, Characteristics of Voicing Source Waveforms Produced by Esophageal and Tracheoesophageal Speakers, Journal of Speech, Language, and Hearing Research, vol.38, issue.3, p.536, 1995.
DOI : 10.1044/jshr.3803.536

Y. Qi, Replacing tracheoesophageal voicing sources using LPC synthesis, The Journal of the Acoustical Society of America, vol.88, issue.3, pp.1228-1235, 1990.
DOI : 10.1121/1.399700

Y. Qi, B. Weinberg, and N. Bi, Enhancement of female esophageal and tracheoesophageal speech, The Journal of the Acoustical Society of America, vol.98, issue.5, pp.2461-2465, 1995.
DOI : 10.1121/1.413279

A. Del Pozo and S. Young, Continuous Tracheoesophageal Speech Repair, Proc. EUSIPCO, 2006.

H. I. Türkmen and M. E., Reconstruction of Dysphonic Speech by MELP, Iberoamerican Congress on Pattern Recognition, 2008.

H. Sharifzadeh, I. Mcloughlin, and F. Ahmadi, Reconstruction of Normal Sounding Speech for Laryngectomy Patients Through a Modified CELP Codec, IEEE Transactions on Biomedical Engineering, vol.57, issue.10, pp.2448-2458, 2010.
DOI : 10.1109/TBME.2010.2053369

D. Cole, S. Sridharan, M. Moody, and S. Geva, Application of noise reduction techniques for alaryngeal speech enhancement, Proc. IEEE TENCON '97, Brisbane, Australia, pp.491-494, 1997.
DOI : 10.1109/TENCON.1997.648252

B. García, J. Vicente, and E. Aramendi, Time-spectral technique for esophageal speech regeneration, p.11, 2002.

A. Del Pozo and S. Young, Repairing Tracheoesophageal Speech Duration, Proc. Speech Prosody, 2008.

K. Tanaka, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation, IEICE Transactions on Information and Systems, vol.97, issue.6, pp.1429-1437, 2014.
DOI : 10.1587/transinf.E97.D.1429

H. Doi, Alaryngeal speech enhancement based on one-to-many eigenvoice conversion, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.1, pp.172-183, 2014.

F. Xie, Y. Qian, Y. Fan, F. K. Soong, and H. Li, Sequence error (SE) minimization training of neural network for voice conversion, Proc. Interspeech, 2014.

D. Srinivas, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, Voice conversion using artificial neural networks, Proc. IEEE Int. Conf. Acoust. Speech Signal Process, pp.3893-3896, 2009.

L. Chen, Z. Ling, L. Liu, and L. Dai, Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.1859-1872, 2014.
DOI : 10.1109/TASLP.2014.2353991

T. Nakashika, T. Takiguchi, and Y. Ariki, Voice Conversion Using RNN Pre-Trained by Recurrent Temporal Restricted Boltzmann Machines, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.3, pp.580-587, 2015.
DOI : 10.1109/TASLP.2014.2379589

T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, Voice conversion in high-order eigen space using deep belief nets, pp.369-372, 2013.

H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.26, issue.1, pp.43-49, 1978.
DOI : 10.1109/TASSP.1978.1163055

H. Valbret, Système de conversion de voix pour la synthèse de parole [Voice conversion system for speech synthesis], 1993.

Z. Ling, S. Kang, H. Zen, A. Senior, M. Schuster et al., Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends, IEEE Signal Processing Magazine, vol.32, issue.3, pp.35-52, 2015.
DOI : 10.1109/MSP.2014.2359987

S. Arya, Nearest neighbor searching and applications, 1995.

M. M. Deza and E. Deza, Encyclopedia of Distances, 2009.

S. Desai, A. Black, B. Yegnanarayana, and K. Prahallad, Spectral Mapping Using Artificial Neural Networks for Voice Conversion, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.5, pp.954-964, 2010.
DOI : 10.1109/TASL.2010.2047683

V. Nair and G. E. Hinton, Rectified linear units improve restricted boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.

A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, Proc. ICML, 2013.

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.123

M. Müller, Information retrieval for music and motion, 2007.
DOI : 10.1007/978-3-540-74048-3

K. Zhou, Q. Hou, R. Wang, and B. Guo, Real-time KD-tree construction on graphics hardware, ACM Transactions on Graphics, vol.27, issue.5, p.1, 2008.

L. Sun, S. Kang, K. Li, and H. Meng, Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178896

N. Srivastava, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.