Enhancement of esophageal speech using voice conversion techniques

Imen Ben Othmane; Joseph Di Martino; Kaïs Ouni

Conference paper, Year: 2017

Imen Ben Othmane

Role: Author

Unité de Recherche Systèmes Mécatroniques et Signaux

Statistical Machine Translation and Speech Modelization and Text

Joseph Di Martino

Role: Author
PersonId : 16557
IdHAL : joseph-di-martino
IdRef : 179331531

Statistical Machine Translation and Speech Modelization and Text

Kaïs Ouni

Role: Author

Unité de Recherche Systèmes Mécatroniques et Signaux

Abstract

This paper presents a novel approach for enhancing esophageal speech using voice conversion techniques. Esophageal speech (ES) is an alternative voice that allows a patient with no vocal cords to produce sounds after total laryngectomy; this voice has poor intelligibility and poor quality. To address this issue, we propose a speaking-aid system that enhances ES to make it clearer and more natural. Given the specificity of ES, in this study we propose a new voice conversion technique that takes into account the particularity of the pathological vocal apparatus. We trained deep neural networks (DNNs) and Gaussian mixture models (GMMs) to predict "laryngeal" vocal tract features from esophageal speech. The converted vectors are then used to estimate the excitation cepstral coefficients and phase by a search in the target training space, previously encoded as a binary tree. The resynthesized voice sounds like a laryngeal voice, i.e., it is more natural than the original ES, with an effective reconstruction of the prosodic information while retaining (and this is the highlight of our study) the vocal tract characteristics inherent to the source speaker. The voice conversion results, evaluated through objective and subjective experiments, validate the proposed approach.
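
The abstract states that the converted vectors are used to retrieve excitation cepstral coefficients and phase by searching the target training space encoded as a binary tree (a KD-tree, according to the keywords). As a minimal sketch, assuming a plain single-nearest-neighbour search with squared Euclidean distance, such a lookup could work as below. The names `build_kdtree`, `nearest`, `lookup_excitation` and the paired excitation/phase table are hypothetical; the feature dimension, distance measure and any neighbour averaging actually used in the paper are not given here.

```python
# Sketch (assumption): nearest-neighbour lookup of excitation/phase frames
# in a KD-tree built over target vocal-tract feature vectors.
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Node:
    point: Vector  # target ("laryngeal") vocal-tract feature vector
    index: int     # row in the hypothetical paired excitation/phase table
    axis: int      # coordinate used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(items: Sequence[Tuple[Vector, int]], depth: int = 0) -> Optional[Node]:
    """Recursively build a KD-tree, splitting on the median along a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    ordered = sorted(items, key=lambda it: it[0][axis])
    mid = len(ordered) // 2
    point, index = ordered[mid]
    return Node(
        point, index, axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Vector,
            best: Optional[Tuple[float, int]] = None) -> Optional[Tuple[float, int]]:
    """Return (squared distance, index) of the stored point closest to query."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def lookup_excitation(tree: Node, converted: Vector, table: Sequence):
    """Hypothetical helper: excitation/phase entry of the nearest target frame."""
    _, index = nearest(tree, converted)
    return table[index]


if __name__ == "__main__":
    targets = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((2.0, 0.5), 2), ((0.5, 2.0), 3)]
    table = [f"exc/phase frame {i}" for i in range(4)]
    tree = build_kdtree(targets)
    print(lookup_excitation(tree, (1.9, 0.4), table))  # prints "exc/phase frame 2"
```

Splitting on the median along a cycling axis keeps the tree balanced, so a typical query visits far fewer nodes than a linear scan over all training frames; the pruning test skips a subtree whenever its splitting plane lies farther away than the current best match.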

Keywords

Esophageal speech; KD-Tree; phase; excitation; Gaussian mixture model; deep neural network

Domains

Signal and Image Processing [eess.SP]

Main file

TemplateICNLSSP_Vfinale (1).pdf (388.53 Ko)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01660580

Submitted on: Monday, December 11, 2017, 10:14:20

Last modified on: Wednesday, September 13, 2023, 11:08:04

Dates and versions

hal-01660580, version 1 (2017-12-11)

Identifiers

HAL Id: hal-01660580, version 1

Cite

Imen Ben Othmane, Joseph Di Martino, Kaïs Ouni. Enhancement of esophageal speech using voice conversion techniques. International Conference on Natural Language, Signal and Speech Processing - ICNLSSP 2017, Dec 2017, Casablanca, Morocco. ⟨hal-01660580⟩

Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

574 views

366 downloads