A tractable framework for estimating and combining spectral source models for audio source separation

Simon Arberet; Alexey Ozerov; Frédéric Bimbot; Rémi Gribonval

doi:10.1016/j.sigpro.2011.12.022

Journal article, Signal Processing, Year: 2012

Simon Arberet

Role: Author
PersonId : 882921

LTS2 - EPFL

Alexey Ozerov

Role: Author
PersonId : 888401

Speech and sound data modeling and processing

Frédéric Bimbot

Role: Author
PersonId : 830967

Speech and sound data modeling and processing

Rémi Gribonval

Role: Author
PersonId : 1255
IdHAL : remi-gribonval
ORCID : 0000-0002-9450-8125
IdRef : 113181590

Speech and sound data modeling and processing

Abstract

The underdetermined blind audio source separation (BSS) problem is often addressed in the time-frequency (TF) domain, assuming that each TF point is modeled as an independent random variable with a sparse distribution. Methods based on structured spectral models, such as the Spectral Gaussian Scaled Mixture Models (Spectral-GSMMs) or Spectral Non-negative Matrix Factorization models, perform better because they exploit the statistical diversity of audio source spectrograms, which lets them go beyond the simple sparsity assumption. However, in the case of discrete state-based models, such as Spectral-GSMMs, learning the models from the mixture can be computationally very expensive. One of the main problems is that a classical Expectation-Maximization procedure often leads to a complexity that is exponential in the number of sources. In this paper, we propose a framework with linear complexity to learn spectral source models (including discrete state-based models) from noisy source estimates. Moreover, this framework allows different probabilistic models to be combined, which can be seen as a form of probabilistic fusion. We illustrate that methods based on this framework can significantly improve BSS performance compared to state-of-the-art approaches.
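
To make the complexity gap concrete, the following Python sketch counts hidden-state configurations under an assumed setup: J sources, each described by K discrete states per time frame. A joint E-step over all sources would have to enumerate K^J state combinations per frame, whereas handling each source separately touches only J*K states. The function names and example values are hypothetical, and this only illustrates the counting argument; it is not the authors' algorithm.

```python
from itertools import product


def joint_state_count(num_sources: int, states_per_source: int) -> int:
    """State combinations enumerated per frame when all sources are
    treated jointly (assumed setup: K states for each of J sources)."""
    return states_per_source ** num_sources


def per_source_state_count(num_sources: int, states_per_source: int) -> int:
    """States visited per frame when each source is treated on its own."""
    return num_sources * states_per_source


def enumerate_joint_states(num_sources: int, states_per_source: int):
    """Explicitly list every joint state tuple (only feasible for small J, K)."""
    return list(product(range(states_per_source), repeat=num_sources))


if __name__ == "__main__":
    K = 8  # hypothetical number of states per source
    for J in (2, 3, 4, 5):  # hypothetical numbers of sources
        print(J, joint_state_count(J, K), per_source_state_count(J, K))
```

With these hypothetical values, the joint count grows by a factor of K for every added source, while the per-source count grows by a constant K.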

Keywords

Blind source separation; multichannel audio; Gaussian mixture model; expectation-maximization algorithm; convolutive mixture

Domains

Signal and Image Processing [eess.SP]

Main file

manuscript.pdf (640.62 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00694071

Submitted on: Thursday, May 3, 2012, 14:58:19

Last modified on: Friday, March 24, 2023, 14:52:55

Long-term archiving on: Saturday, August 4, 2012, 02:40:09

Dates and versions

hal-00694071 , version 1 (03-05-2012)

Identifiers

HAL Id : hal-00694071 , version 1
DOI : 10.1016/j.sigpro.2011.12.022

Cite

Simon Arberet, Alexey Ozerov, Frédéric Bimbot, Rémi Gribonval. A tractable framework for estimating and combining spectral source models for audio source separation. Signal Processing, 2012, 92 (8), pp.1886-1901. ⟨10.1016/j.sigpro.2011.12.022⟩. ⟨hal-00694071⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM
