Conference papers

Improving the computational performance of standard GMM-based voice conversion systems used in real-time applications

Imen Ben Othmane (1, 2), Joseph Di Martino (2), Kaïs Ouni (1)
1 SMS - Unité de Recherche Systèmes Mécatroniques et Signaux
Université de Carthage - University of Carthage
2 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Voice conversion (VC) can be described as finding a mapping function that transforms the features extracted from a source speaker into those of a target speaker. Gaussian mixture model (GMM) based conversion is the most commonly used technique in VC, but it is often prone to overfitting and oversmoothing. To address these issues, we propose a secondary classification: a K-means classification is applied within each class obtained by a primary classification in order to obtain more precise local conversion functions. This proposal avoids the need for complex training algorithms because the local mapping functions are determined at the same time. The proposed approach consists of a Fourier cepstral analysis followed by a training phase that finds the local mapping functions which transform the vocal tract characteristics of the source speaker into those of the target speaker. The converted parameters, together with the excitation and phase extracted from the target training space using a frame index selection, are used in the synthesis step to generate converted speech with target speech characteristics. Objective and subjective experiments show that the proposed technique outperforms the baseline GMM approach while greatly reducing the training and transformation computation times.
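The two-level classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract specifies neither the primary classifier nor the form of the local conversion functions, so this sketch assumes K-means at both levels and a least-squares affine map per sub-class. All function names (`kmeans`, `nearest`, `fit_affine`, `train`, `convert`) and the parameters `k_primary` and `k_secondary` are hypothetical, and the inputs are assumed to be time-aligned source and target feature frames (for example cepstral vectors).

```python
import numpy as np

def nearest(x, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row of x."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # never ask for more clusters than frames
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, nearest(x, centroids)

def fit_affine(src, tgt):
    """Least-squares affine map: tgt is approximated by [src, 1] @ w (an assumed form)."""
    design = np.hstack([src, np.ones((len(src), 1))])
    w, *_ = np.linalg.lstsq(design, tgt, rcond=None)
    return w

def train(src, tgt, k_primary=2, k_secondary=2):
    """Primary classification, then K-means inside each class, one local map per sub-class."""
    primary_c, primary_l = kmeans(src, k_primary)
    global_w = fit_affine(src, tgt)  # fallback for an empty primary class
    classes = []
    for p in range(len(primary_c)):
        idx = np.flatnonzero(primary_l == p)
        if len(idx) == 0:
            classes.append((primary_c[p:p + 1], [global_w]))
            continue
        sub_c, sub_l = kmeans(src[idx], k_secondary, seed=p + 1)
        class_w = fit_affine(src[idx], tgt[idx])  # fallback for an empty sub-class
        maps = []
        for s in range(len(sub_c)):
            sel = idx[sub_l == s]
            maps.append(fit_affine(src[sel], tgt[sel]) if len(sel) else class_w)
        classes.append((sub_c, maps))
    return {"primary": primary_c, "classes": classes, "out_dim": tgt.shape[1]}

def convert(model, src):
    """Route each frame to its primary class and sub-class, then apply the local map."""
    out = np.empty((len(src), model["out_dim"]))
    design = np.hstack([src, np.ones((len(src), 1))])
    primary_l = nearest(src, model["primary"])
    for p, (sub_c, maps) in enumerate(model["classes"]):
        idx = np.flatnonzero(primary_l == p)
        if len(idx) == 0:
            continue
        sub_l = nearest(src[idx], sub_c)
        for s, w in enumerate(maps):
            sel = idx[sub_l == s]
            out[sel] = design[sel] @ w
    return out
```

In this sketch, training involves only K-means passes and small independent least-squares problems, and conversion involves only nearest-centroid lookups and one matrix product per sub-class. That is one plausible reading of why such a scheme could be cheap to train and apply; the actual cost and quality figures are those reported in the paper, not anything this sketch demonstrates.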
Contributor : Joseph Di Martino
Submitted on : Tuesday, October 2, 2018 - 3:35:54 PM
Last modification on : Wednesday, November 3, 2021 - 7:09:25 AM




Imen Ben Othmane, Joseph Di Martino, Kaïs Ouni. Improving the computational performance of standard GMM-based voice conversion systems used in real-time applications. ICECOCS’18 - 1st International Conference on Electronics, Control, Optimization and Computer Science, Dec 2018, Kenitra, Morocco. ⟨10.1109/ICECOCS.2018.8610514⟩. ⟨hal-01886099⟩


