Enhancement of esophageal speech using voice conversion techniques

Imen Ben Othmane 1, 2, Joseph Di Martino 2, Kaïs Ouni 1
2 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper presents a novel approach for enhancing esophageal speech using voice conversion techniques. Esophageal speech (ES) is an alternative voice that allows a patient with no vocal cords to produce sounds after total laryngectomy; this voice has poor intelligibility and poor quality. To address this issue, we propose a speaking-aid system that enhances ES to make it clearer and more natural. Given the specificity of ES, in this study we propose to apply a new voice conversion technique that takes into account the particularity of the pathological vocal apparatus. We trained deep neural networks (DNNs) and Gaussian mixture models (GMMs) to predict "laryngeal" vocal tract features from esophageal speech. The converted vectors are then used to estimate the excitation cepstral coefficients and phase by a search in the target training space, previously encoded as a binary tree. The resynthesized voice sounds like a laryngeal voice, i.e., it is more natural than the original ES, with an effective reconstruction of the prosodic information while retaining, and this is the highlight of our study, the vocal tract characteristics inherent to the source speaker. The voice conversion results, evaluated in objective and subjective experiments, validate the proposed approach.
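The lookup step mentioned in the abstract (estimating excitation features by searching a tree-encoded target training space) can be illustrated with a minimal sketch. The abstract does not say which kind of binary tree is used, so the k-d tree below is an assumption, as are the names `build_tree`, `nearest`, and `lookup_excitation` and the toy two-dimensional feature values. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (vocal tract vector, excitation vector)


@dataclass
class Node:
    point: Vector            # target vocal tract features (hypothetical layout)
    payload: Vector          # excitation features stored with the same frame
    axis: int                # dimension used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(pairs: List[Pair], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree (an assumed choice of binary tree) over target frames."""
    if not pairs:
        return None
    axis = depth % len(pairs[0][0])
    ordered = sorted(pairs, key=lambda p: p[0][axis])
    mid = len(ordered) // 2
    point, payload = ordered[mid]
    return Node(point, payload, axis,
                build_tree(ordered[:mid], depth + 1),
                build_tree(ordered[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Vector, best=None):
    """Return (squared distance, node) of the stored point closest to query."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other branch only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


def lookup_excitation(tree: Node, converted: Vector) -> Vector:
    """Map a converted vocal tract vector to the excitation of its nearest target frame."""
    return nearest(tree, converted)[1].payload


# Toy target training space: values are illustrative, not taken from the paper.
training = [((1.0, 2.0), (0.1,)), ((4.0, 0.5), (0.2,)),
            ((3.0, 3.0), (0.3,)), ((0.0, 0.0), (0.4,))]
tree = build_tree(training)
excitation = lookup_excitation(tree, (2.8, 2.9))  # nearest stored frame is (3.0, 3.0)
```

In this sketch, each converted vector retrieves the excitation features of the closest target frame, so the search cost grows roughly with the tree depth rather than with the number of training frames.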
Document type:
Conference paper
International Conference on Natural Language, Signal and Speech Processing - ICNLSSP 2017, Dec 2017, Casablanca, Morocco

Cited literature [36 references]

https://hal.inria.fr/hal-01660580
Contributor: Joseph Di Martino
Submitted on: Monday, 11 December 2017 - 10:14:20
Last modified on: Tuesday, 24 April 2018 - 13:30:14

File

TemplateICNLSSP_Vfinale (1).pd...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01660580, version 1

Citation

Imen Ben Othmane, Joseph Di Martino, Kaïs Ouni. Enhancement of esophageal speech using voice conversion techniques. International Conference on Natural Language, Signal and Speech Processing - ICNLSSP 2017, Dec 2017, Casablanca, Morocco. 〈hal-01660580〉

Metrics

Record views

410

File downloads

130