A variance modeling framework based on variational autoencoders for speech enhancement

Simon Leglaive; Laurent Girin; Radu Horaud

doi:10.1109/MLSP.2018.8516711

Conference paper. Year: 2018


Simon Leglaive

Role: Author
PersonId : 20853
IdHAL : simon-leglaive
ORCID : 0000-0002-8219-1298
IdRef : 25312171X

Interpretation and Modelling of Images and Videos

Laurent Girin

Role: Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Radu Horaud

Role: Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Abstract

In this paper, we address the problem of enhancing speech signals in noisy mixtures using a source separation approach. We explore the use of neural networks as an alternative to a popular speech variance model based on supervised non-negative matrix factorization (NMF). More precisely, we use a variational autoencoder as a speaker-independent supervised generative speech model, highlighting the conceptual similarities between this approach and its NMF-based counterpart. To avoid generalization issues related to the noisy recording environments, we use a supervised model only for the target speech signal, while the noise model is based on unsupervised NMF. We develop a Monte Carlo expectation-maximization algorithm for inferring the latent variables of the variational autoencoder and estimating the unsupervised model parameters. Experiments show that the proposed method outperforms a semi-supervised NMF baseline and a state-of-the-art fully supervised deep learning approach.
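The abstract describes a variance model in which both speech and noise are characterized by their variances. The following is a minimal sketch, not the authors' implementation, assuming that speech and noise STFT coefficients are independent zero-mean complex Gaussians, that the speech variance of each frame is produced by a VAE decoder from a per-frame latent vector, and that the noise variance is the unsupervised NMF product `W @ H`. Under these assumptions, the posterior mean of the speech given the mixture and the latent vectors is a Wiener filter, and averaging the Wiener gains over samples of the latent vectors approximates the posterior mean given the mixture alone. The decoder `toy_decoder`, the array `z_samples`, and all sizes are hypothetical placeholders; drawing the latent samples inside the Monte Carlo EM algorithm and estimating `W` and `H` are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

F, N, K, L = 4, 5, 2, 3  # frequency bins, frames, NMF rank, latent dimension (illustrative)

def toy_decoder(z):
    """Hypothetical stand-in for a trained VAE decoder: maps a latent
    vector z of shape (L,) to a positive speech variance per frequency bin (F,)."""
    A = np.linspace(0.1, 1.0, F * L).reshape(F, L)  # fixed toy weights, not learned
    return np.exp(A @ z)  # exponential keeps the variances positive

def speech_estimate(X, z_samples, W, H):
    """Approximate posterior mean of the speech STFT by averaging Wiener gains.

    X: complex mixture STFT, shape (F, N)
    z_samples: latent samples, shape (R, N, L), assumed drawn from p(z | x)
    W, H: non-negative NMF factors of the noise variance, shapes (F, K) and (K, N)
    """
    v_noise = W @ H  # noise variance for every time-frequency bin, (F, N)
    gains = np.zeros(X.shape)
    R = z_samples.shape[0]
    for r in range(R):
        # decode each frame's latent vector and stack the results into (F, N)
        v_speech = np.stack([toy_decoder(z_samples[r, n]) for n in range(X.shape[1])], axis=1)
        gains += v_speech / (v_speech + v_noise)  # Wiener gain for this sample
    return (gains / R) * X  # averaged gain applied to the mixture

# Toy usage with random data (placeholders, not real speech)
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
W = rng.random((F, K))
H = rng.random((K, N))
z_samples = rng.standard_normal((10, N, L))  # stand-in for posterior samples
S_hat = speech_estimate(X, z_samples, W, H)
```

Because each averaged gain lies between 0 and 1, the estimate never amplifies a time-frequency bin of the mixture; it only attenuates bins where the noise variance dominates.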

Keywords

non-negative matrix factorization, audio source separation, Monte Carlo expectation-maximization, variational autoencoders, speech enhancement

Subjects

Signal and Image Processing [eess.SP], Neural Networks [cs.NE]

Main file

LGH_MLSP2018_final.pdf (759.77 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01832826

Submitted on: Thursday, 12 July 2018, 11:24:57

Last modified on: Thursday, 4 April 2024, 21:30:05

Long-term archiving on: Monday, 15 October 2018, 22:45:33

Dates and versions

hal-01832826 , version 1 (12-07-2018)

Identifiers

HAL Id: hal-01832826, version 1
arXiv: 1902.01605
DOI: 10.1109/MLSP.2018.8516711

Cite

Simon Leglaive, Laurent Girin, Radu Horaud. A variance modeling framework based on variational autoencoders for speech enhancement. MLSP 2018 - IEEE 28th International Workshop on Machine Learning for Signal Processing, Sep 2018, Aalborg, Denmark. pp.1-6, ⟨10.1109/MLSP.2018.8516711⟩. ⟨hal-01832826⟩


Collections

UNIV-RENNES1, UGA, CNRS, INRIA, IRISA, GIPSA, GIPSA-DPC, LJK, LJK_GI, LJK_GI_PERCEPTION, GIPSA-CRISSP, INRIA2, UR1-MATH-STIC, UR1-UFR-ISTIC, UNIV-RENNES, UR1-MATH-NUM

414 views

1471 downloads
