Audio source separation using sparse representations

Andrew Nesbit; Maria G. Jafari; Emmanuel Vincent; Mark D. Plumbley

doi:10.4018/978-1-61520-919-4.ch010

Book chapter. Year: 2010


Andrew Nesbit

Role: Author

Centre for Digital Music

Maria G. Jafari

Role: Author

Centre for Digital Music

Emmanuel Vincent

Role: Author
PersonId: 1256
IdHAL: emmanuelv
ORCID: 0000-0002-0183-7289
IdRef: 089360176

Speech and sound data modeling and processing

Mark D. Plumbley

Role: Author

Centre for Digital Music

Abstract

We address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used, based on orthogonal basis functions that are learned from the observed data instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.
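The three steps named in the abstract (transform, apportion each coefficient to a source, inverse transform) can be illustrated with a minimal sketch. It is not the chapter's method: a fixed orthonormal DCT stands in for the adaptive lapped and learned transforms, the instantaneous mixing matrix `A` is assumed to be known with unit-norm columns, and energy is apportioned by a simple binary mask that gives each coefficient entirely to the best-matching source. The function names `dct_matrix` and `separate_binary_mask` and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; each row is one basis function."""
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    C[0, :] /= np.sqrt(2.0)     # rescale the constant row so that C @ C.T = I
    return C

def separate_binary_mask(X, A, C):
    """Assign each transform coefficient to a single source (binary masking).

    X: mixtures, shape (n_mixtures, n_samples)
    A: assumed-known mixing matrix with unit-norm columns, (n_mixtures, n_sources)
    C: orthonormal transform, (n_samples, n_samples)
    """
    coeffs = X @ C.T                       # forward transform of every mixture channel
    proj = A.T @ coeffs                    # projection onto each mixing direction
    winner = np.argmax(np.abs(proj), axis=0)  # best-matching source per coefficient
    cols = np.arange(coeffs.shape[1])
    src_coeffs = np.zeros((A.shape[1], coeffs.shape[1]))
    src_coeffs[winner, cols] = proj[winner, cols]  # all energy goes to the winner
    return src_coeffs @ C                  # inverse transform (C is orthonormal)

# Toy underdetermined case: 3 sources, 2 mixtures, disjoint sparse supports.
n = 64
C = dct_matrix(n)
true_coeffs = np.zeros((3, n))
true_coeffs[0, [3, 10]] = [1.0, -0.5]
true_coeffs[1, [7, 20]] = [0.8, 0.3]
true_coeffs[2, [15, 40]] = [-0.6, 0.9]
S = true_coeffs @ C                        # time-domain sources

angles = np.array([0.2, 0.9, 1.4])         # hypothetical mixing directions (radians)
A = np.vstack([np.cos(angles), np.sin(angles)])
X = A @ S                                  # instantaneous mixtures

S_hat = separate_binary_mask(X, A, C)
print(np.allclose(S_hat, S))
```

In this toy setting the sources never share a transform coefficient, so the mask should recover them up to rounding error. Real audio is only approximately sparse, and overlapping coefficients are then assigned to a single source, which is one motivation for transforms that adapt to the signal.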

Subject areas

Signal and Image Processing [eess.SP]

Main file

nesbit_IGI10.pdf (255.53 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Vincent

https://inria.hal.science/inria-00544030

Submitted on: Saturday, 10 December 2011, 07:00:23

Last modified on: Friday, 24 March 2023, 14:52:53

Long-term archiving on: Sunday, 11 March 2012, 02:20:15

Dates and versions

inria-00544030, version 1 (10-12-2011)

Identifiers

HAL Id: inria-00544030, version 1
DOI: 10.4018/978-1-61520-919-4.ch010

Cite

Andrew Nesbit, Maria G. Jafari, Emmanuel Vincent, Mark D. Plumbley. Audio source separation using sparse representations. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems, IGI Global, pp. 246-265, 2010, ⟨10.4018/978-1-61520-919-4.ch010⟩. ⟨inria-00544030⟩


Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

273 views

761 downloads
