A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document

Manuel Pariente, Antoine Deleforge, Emmanuel Vincent
MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication, Inria Nancy - Grand Est
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps, in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance.
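The abstract describes an iterative scheme in which a pre-trained VAE encoder approximates the posterior over latent variables, the decoder provides a clean-speech power-spectrum model, and the speech estimate is refined at each step. The sketch below is a minimal toy illustration of that loop, not the paper's exact derivation: the "encoder" and "decoder" are stand-in linear maps (hypothetical placeholders for trained networks), and the update uses a standard Wiener-style posterior-mean formula for complex Gaussian signals.

```python
import numpy as np

rng = np.random.default_rng(0)
F, L = 16, 8  # toy sizes: frequency bins, latent dimension

# Stand-in "pre-learned" networks: linear maps as placeholders for the
# VAE encoder and decoder (hypothetical; a real system uses trained DNNs).
W_enc = rng.normal(scale=0.1, size=(L, F))
W_dec = rng.normal(scale=0.1, size=(F, L))

def encoder(log_power):
    """Mean of the variational posterior q(z | s) over latents (toy)."""
    return W_enc @ log_power

def decoder(z):
    """Clean-speech power spectrum sigma_s^2(z), kept positive via exp."""
    return np.exp(W_dec @ z)

def enhance(noisy_power, noise_power, n_iter=10):
    """Iteratively refine the clean-speech power spectrum estimate.

    Alternates (i) posterior estimation through the encoder and
    (ii) a Wiener-style update of the speech power, assuming a
    complex Gaussian signal-plus-noise model.
    """
    # Initialize with spectral subtraction, floored to stay positive.
    speech_power = np.maximum(noisy_power - noise_power, 1e-8)
    for _ in range(n_iter):
        z = encoder(np.log(speech_power))      # variational step via encoder
        prior_power = decoder(z)               # VAE speech model
        gain = prior_power / (prior_power + noise_power)  # Wiener gain
        # Posterior mean of |s|^2 under the Gaussian model:
        speech_power = gain**2 * noisy_power + gain * noise_power
    return speech_power

# Example usage on a random toy frame.
noisy = rng.uniform(0.5, 2.0, size=F)
noise = np.full(F, 0.2)
est = enhance(noisy, noise)
```

Unlike the Gibbs-sampling and gradient-descent variants mentioned in the abstract, each iteration here is a closed-form update, which is where the reported computational savings come from.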
https://hal.inria.fr/hal-02089062
Contributor: Manuel Pariente
Submitted on: Monday, April 8, 2019 - 2:06:29 PM
Last modification on: Tuesday, April 30, 2019 - 4:33:57 PM

File

support_document_final.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-02089062, version 1

Citation

Manuel Pariente, Antoine Deleforge, Emmanuel Vincent. A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document. [Research Report] RR-9268, INRIA. 2019, pp.1-8. ⟨hal-02089062⟩
