Journal articles

A novel voice conversion approach using cascaded powerful cepstrum predictors with excitation and phase extracted from the target training space encoded as a KD-tree

Abstract : Voice conversion is an important problem in audio signal processing. Its goal is to transform the speech signal of a source speaker so that it sounds as if it had been uttered by a target speaker. Our contribution in this paper includes a new methodology for modeling the relationship between two sets of spectral envelopes. Our systems operate by: (1) cascading deep neural networks (DNN) and Gaussian mixture models (GMM) to construct DNN–GMM and GMM–DNN–GMM models that find a global mapping between the cepstral vectors of the two speakers; (2) using a new spectral synthesis process with cascaded cepstrum predictors and with excitation and phase extracted from the target training space encoded as a KD-tree. Experimental results show that the proposed methods markedly improve the intelligibility, quality and naturalness of the converted speech signals compared with stimuli obtained by baseline conversion methods. Extracting excitation and phase from the target training space preserves the target speaker’s identity.
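
The KD-tree step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each target training frame stores a cepstral vector together with its excitation and phase, and that a predicted cepstrum is used as the query to retrieve the closest target frame. All names (`build_kdtree`, `nearest`, `target_cepstra`, `target_excitation_phase`) and the toy two-dimensional data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


class Node:
    """One KD-tree node; stores the index of a training frame."""

    def __init__(self, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.index = index
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: List[Vector],
                 indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting at the median along a cycling axis."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(points, ordered[:mid], depth + 1),
                build_kdtree(points, ordered[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], points: List[Vector], query: Vector,
            best: Optional[Tuple[float, int]] = None
            ) -> Optional[Tuple[float, int]]:
    """Return (squared distance, frame index) of the closest stored point."""
    if node is None:
        return best
    d = sq_dist(points[node.index], query)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = query[node.axis] - points[node.index][node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, points, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, points, query, best)
    return best


# Toy target training space (hypothetical values, 2-D for readability).
target_cepstra = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]
# Placeholder labels standing in for stored excitation and phase data.
target_excitation_phase = ["frame0", "frame1", "frame2", "frame3"]

tree = build_kdtree(target_cepstra)
predicted_cepstrum = (1.9, 0.6)  # stands in for a predictor output
_, idx = nearest(tree, target_cepstra, predicted_cepstrum)
borrowed = target_excitation_phase[idx]
```

Because the retrieved excitation and phase come from real target frames, a lookup of this kind is consistent with the abstract's claim that the target speaker's identity is preserved. The actual distance measure and frame representation used in the paper may differ.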

https://hal.inria.fr/hal-02315052
Contributor : Joseph Di Martino
Submitted on : Monday, October 14, 2019 - 11:09:21 AM
Last modification on : Tuesday, June 16, 2020 - 11:28:03 AM

Citation

Imen Ben Othmane, Joseph Di Martino, Kais Ouni. A novel voice conversion approach using cascaded powerful cepstrum predictors with excitation and phase extracted from the target training space encoded as a KD-tree. International Journal of Speech Technology, Springer Verlag, 2019, pp.1-13. ⟨10.1007/s10772-019-09643-4⟩. ⟨hal-02315052⟩
