Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra

Imen Ben Othmane; Joseph Di Martino; Kaïs Ouni

doi:10.1007/s10772-018-09579-1

Journal article, International Journal of Speech Technology, Year: 2018



Imen Ben Othmane

Role: Author

Unité de Recherche Systèmes Mécatroniques et Signaux

Statistical Machine Translation and Speech Modelization and Text

Joseph Di Martino

Role: Author
PersonId : 16557
IdHAL : joseph-di-martino
IdRef : 179331531

Statistical Machine Translation and Speech Modelization and Text

Kaïs Ouni

Role: Author

Unité de Recherche Systèmes Mécatroniques et Signaux

Abstract

This paper presents a novel speaking-aid system for enhancing esophageal speech (ES). The method improves the quality of esophageal speech by combining a voice conversion technique with a time dilation algorithm. In the proposed system, a Deep Neural Network (DNN) serves as a nonlinear mapping function for vocal tract vector transformation. The converted frames are then used to determine realistic excitation and phase vectors from the target training space by means of a frame selection algorithm. Next, to preserve the identity of the esophageal speaker, the source vocal tract features are retained, and a time dilation algorithm is applied to them to reduce unpleasant esophageal noises. Finally, the converted speech is reconstructed from the dilated source vocal tract frames and the predicted excitation and phase. Voice conversion systems based on a DNN and on a Gaussian Mixture Model (GMM) were evaluated using objective and subjective measures. The experimental study also assessed the changes in speech quality and intelligibility of the transformed signals. The results show that the proposed methods considerably improve the intelligibility and naturalness of the converted esophageal speech.
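The frame selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection criterion, so the code below assumes a plain nearest-neighbour search under squared Euclidean distance; the paper's actual algorithm may use a different cost or additional continuity constraints. All names (`select_frames`, `squared_distance`, the argument names) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames(
    converted: Sequence[Vector],
    target_vocal_tract: Sequence[Vector],
    target_excitation: Sequence[Vector],
    target_phase: Sequence[Vector],
) -> List[Tuple[Vector, Vector]]:
    """Pick, for each converted frame, the excitation and phase of the
    closest target training frame.

    Assumption: "closest" means smallest squared Euclidean distance between
    vocal tract vectors. This is an illustrative choice, not the criterion
    confirmed by the paper.
    """
    n = len(target_vocal_tract)
    if n == 0:
        raise ValueError("the target training space is empty")
    if not (n == len(target_excitation) == len(target_phase)):
        raise ValueError("target feature lists must be aligned frame by frame")

    selected = []
    for frame in converted:
        # Index of the training frame whose vocal tract vector is nearest.
        best = min(
            range(n),
            key=lambda i: squared_distance(frame, target_vocal_tract[i]),
        )
        selected.append((target_excitation[best], target_phase[best]))
    return selected


if __name__ == "__main__":
    # Toy 2-D vocal tract vectors with 1-D excitation and phase vectors.
    vocal_tract = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    excitation = [[0.1], [0.2], [0.3]]
    phase = [[10.0], [20.0], [30.0]]
    converted_frames = [[0.9, 1.2], [1.9, 0.1]]
    print(select_frames(converted_frames, vocal_tract, excitation, phase))
```

With the toy data, the first converted frame is nearest to the second training frame and the second converted frame to the third, so their excitation and phase vectors are returned. An exhaustive search like this costs one distance computation per pair of converted and training frames; a real system would likely need an indexed or pruned search.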

Keywords

Esophageal speech; voice conversion; deep neural networks; time dilation algorithm; noise reduction; excitation and phase; Gaussian Mixture Model

Subject areas

Signal and image processing [eess.SP]

Contributor: Joseph Di Martino

https://inria.hal.science/hal-01954096

Submitted on: Thursday, 13 December 2018, 14:35:25

Last modified on: Wednesday, 13 September 2023, 11:08:04

Dates and versions

hal-01954096 , version 1 (13-12-2018)

Identifiers

HAL Id: hal-01954096, version 1
DOI: 10.1007/s10772-018-09579-1

Cite

Imen Ben Othmane, Joseph Di Martino, Kaïs Ouni. Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra. International Journal of Speech Technology, 2018, 22 (1), pp.99-110. ⟨10.1007/s10772-018-09579-1⟩. ⟨hal-01954096⟩


Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

