A tractable framework for estimating and combining spectral source models for audio source separation

Simon Arberet (1), Alexey Ozerov (2), Frédéric Bimbot (2), Rémi Gribonval (2)
(2) METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: The underdetermined blind audio source separation (BSS) problem is often addressed in the time-frequency (TF) domain by assuming that each TF point is an independent random variable with a sparse distribution. On the other hand, methods based on structured spectral models, such as Spectral Gaussian Scaled Mixture Models (Spectral-GSMMs) or Spectral Non-negative Matrix Factorization models, perform better because they exploit the statistical diversity of audio source spectrograms and thus go beyond the simple sparsity assumption. However, for discrete state-based models such as Spectral-GSMMs, learning the models from the mixture can be computationally very expensive. One of the main problems is that a classical Expectation-Maximization procedure often has a complexity that is exponential in the number of sources. In this paper, we propose a framework with linear complexity for learning spectral source models (including discrete state-based models) from noisy source estimates. Moreover, this framework makes it possible to combine different probabilistic models, which can be seen as a form of probabilistic fusion. We illustrate that methods based on this framework can significantly improve BSS performance compared to state-of-the-art approaches.
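The complexity argument above can be illustrated with a minimal plain-Python sketch. It assumes a Spectral-GSMM in which each short-time frame of a source is generated by one of K discrete states, state k having weight w_k and a per-frequency variance v_k(f), and it assumes that a source estimate equals the true source plus independent circular complex Gaussian noise of known variance. These modelling choices and all function names below are hypothetical illustrations, not the paper's exact algorithm. Because each source model is fitted on its own estimate, one EM iteration costs K state evaluations per frame and per source, which is linear in the number of sources J, whereas a classical joint EM over the mixture typically has to score all K^J state combinations.

```python
# Hypothetical sketch: the model and names are illustrative assumptions,
# not the exact method of the paper.
import math
from typing import List, Sequence, Tuple

VAR_FLOOR = 1e-10     # keeps variances strictly positive so log() stays defined
WEIGHT_FLOOR = 1e-12  # keeps state weights strictly positive


def log_cgauss(power: float, var: float) -> float:
    """Log-density of a zero-mean circular complex Gaussian CN(0, var), given |x|^2 = power."""
    return -math.log(math.pi * var) - power / var


def state_posteriors(frame_power: Sequence[float], weights: Sequence[float],
                     state_vars: Sequence[Sequence[float]],
                     noise_var: Sequence[float]) -> List[float]:
    """E-step for one frame: p(k | estimated frame) for every state k.

    Assumes the estimate is s + n with n ~ CN(0, noise_var) independent of s,
    so under state k each frequency bin is CN(0, v_k(f) + noise_var(f)).
    """
    logs = []
    for w, v in zip(weights, state_vars):
        lp = math.log(w)
        for p, vk, nv in zip(frame_power, v, noise_var):
            lp += log_cgauss(p, vk + nv)
        logs.append(lp)
    m = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]


def expected_clean_power(power: float, var: float, noise_var: float) -> float:
    """E[|s|^2 | estimate] for s ~ CN(0, var) observed in CN(0, noise_var) noise."""
    gain = var / (var + noise_var)                 # Wiener gain
    return gain * gain * power + gain * noise_var  # |posterior mean|^2 + posterior variance


def em_step(frames_power: Sequence[Sequence[float]], weights: Sequence[float],
            state_vars: Sequence[Sequence[float]],
            noise_var: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """One EM iteration for a single source's model; cost is O(T * K * F)."""
    gammas = [state_posteriors(fp, weights, state_vars, noise_var) for fp in frames_power]
    num_frames = len(frames_power)
    new_weights, new_vars = [], []
    for k, v in enumerate(state_vars):
        resp = sum(g[k] for g in gammas)
        new_weights.append(max(resp / num_frames, WEIGHT_FLOOR))
        row = []
        for f, vkf in enumerate(v):
            if resp <= 0.0:
                row.append(vkf)  # no frame assigned to this state: keep the old variance
                continue
            acc = sum(g[k] * expected_clean_power(fp[f], vkf, noise_var[f])
                      for g, fp in zip(gammas, frames_power))
            row.append(max(acc / resp, VAR_FLOOR))
        new_vars.append(row)
    total = sum(new_weights)
    return [w / total for w in new_weights], new_vars


def count_state_evaluations(num_states: int, num_sources: int) -> Tuple[int, int]:
    """Per-frame state evaluations: joint EM over all sources vs. one model per source."""
    return num_states ** num_sources, num_states * num_sources
```

For example, `count_state_evaluations(16, 3)` returns `(4096, 48)`: with 16 states and 3 sources, a joint search visits 4096 state combinations per frame, against 48 when each source model is learned separately with `em_step`. The noise term enters twice in this sketch, once as the inflated variance v_k(f) + noise_var(f) in the E-step and once through the Wiener-style expected clean power in the M-step.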
Document type:
Journal article
Signal Processing, Elsevier, 2012, 92 (8), pp.1886-1901. 〈10.1016/j.sigpro.2011.12.022〉

Cited literature: 40 references

https://hal.inria.fr/hal-00694071
Contributor: Alexey Ozerov
Submitted on: Thursday, 3 May 2012, 14:58:19
Last modified on: Thursday, 11 January 2018, 06:20:09
Document(s) archived on: Saturday, 4 August 2012, 02:40:09

File

manuscript.pdf
Files produced by the author(s)

Citation

Simon Arberet, Alexey Ozerov, Frédéric Bimbot, Rémi Gribonval. A tractable framework for estimating and combining spectral source models for audio source separation. Signal Processing, Elsevier, 2012, 92 (8), pp.1886-1901. 〈10.1016/j.sigpro.2011.12.022〉. 〈hal-00694071〉

Metrics

Record views: 1107

File downloads: 203