Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Simon Leglaive 1 Laurent Girin 2 Radu Horaud 1
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 GIPSA-CRISSP - CRISSP
GIPSA-DPC - Département Parole et Cognition
Abstract : In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is supposed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.
Complete list of metadatas

Cited literature [32 references]  Display  Hide  Download

https://hal.inria.fr/hal-02005102
Contributor : Simon Leglaive <>
Submitted on : Tuesday, April 30, 2019 - 3:48:09 PM
Last modification on : Tuesday, September 10, 2019 - 10:58:49 AM

File

LGH-icassp2019.pdf
Files produced by the author(s)

Identifiers

Citation

Simon Leglaive, Laurent Girin, Radu Horaud. Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization. ICASSP 2019 - IEEE International Conference on Acoustics Speech and Signal Processing, May 2019, Brighton, United Kingdom. pp.101-105, ⟨10.1109/ICASSP.2019.8683704⟩. ⟨hal-02005102v2⟩

Share

Metrics

Record views

131

Files downloads

731