
Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra

Imen Ben Othmane 1, 2, Joseph Di Martino 2, Kaïs Ouni 1
1 SMS - Unité de Recherche Systèmes Mécatroniques et Signaux
Université de Carthage - University of Carthage
2 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper presents a novel speaking-aid system for enhancing esophageal speech (ES). The method aims to improve the quality of esophageal speech by combining a voice conversion technique with a time dilation algorithm. In the proposed system, a Deep Neural Network (DNN) serves as a nonlinear mapping function for vocal tract vector transformation. The converted frames are then used to determine realistic excitation and phase vectors from the target training space by means of a frame selection algorithm. Next, to preserve the identity of the esophageal speakers, we keep the source vocal tract features and apply a time dilation algorithm to them to reduce the unpleasant esophageal noises. Finally, the converted speech is reconstructed from the dilated source vocal tract frames and the predicted excitation and phase. Voice conversion systems based on a Deep Neural Network (DNN) and on a Gaussian Mixture Model (GMM) were evaluated using objective and subjective measures, in order to assess the changes in speech quality and intelligibility of the transformed signals. Experimental results demonstrate that the proposed methods provide considerable improvement in the intelligibility and naturalness of the converted esophageal speech.
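The frame selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how frames are selected, so the nearest-neighbour rule with Euclidean distance used here is an assumption, and the function name `select_frames`, its parameters, and the toy data are all hypothetical. The idea shown is only this: each converted vocal tract frame is matched to the closest frame in the target training space, and that frame's excitation and phase are reused.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def select_frames(converted, target_cepstra, target_excitation, target_phase):
    """For each converted vocal tract frame, return the excitation and phase
    of the closest target training frame.

    Assumption: closeness is plain Euclidean distance in the cepstral space;
    the paper's actual selection criterion may differ.
    """
    excitation, phase = [], []
    for frame in converted:
        # Index of the target training frame nearest to this converted frame.
        best = min(range(len(target_cepstra)),
                   key=lambda i: dist(frame, target_cepstra[i]))
        excitation.append(target_excitation[best])
        phase.append(target_phase[best])
    return excitation, phase


# Toy data (hypothetical): three target training frames with 2-D cepstra,
# each paired with a placeholder excitation and phase label.
target_cepstra = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
target_excitation = ["e0", "e1", "e2"]
target_phase = ["p0", "p1", "p2"]

# Two converted frames, standing in for the output of the DNN mapping.
converted = [[0.9, 1.2], [3.5, 4.1]]

excitation, phase = select_frames(
    converted, target_cepstra, target_excitation, target_phase
)
```

In a real system the placeholder labels would be excitation and phase vectors, and an exhaustive search like this one would likely be replaced by an indexed lookup once the training space grows large.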
Document type: Journal article
Contributor: Joseph Di Martino
Submitted on: Thursday, December 13, 2018 - 2:35:25 PM
Last modification on: Wednesday, November 3, 2021 - 7:57:46 AM




Imen Ben Othmane, Joseph Di Martino, Kaïs Ouni. Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra. International Journal of Speech Technology, Springer Verlag, 2018, 22 (1), pp.99-110. ⟨10.1007/s10772-018-09579-1⟩. ⟨hal-01954096⟩


