Projection-based demixing of spatial audio
Journal article. IEEE Transactions on Audio, Speech and Language Processing. Year: 2016

Abstract

We propose a method to unmix multichannel audio signals into their constituent spatial objects. To achieve this, we characterize an audio object through both a spatial and a spectro-temporal model. The spatial model we choose is distinctive in that it neither assumes that an object has only one underlying source point, nor attempts to model the complex room acoustics. Instead, it adopts the listener's perspective and treats each object as the superposition of many contributions with different incoming directions and inter-channel delays. Our spectro-temporal probabilistic model is based on the recently proposed α-harmonisable processes, which are well suited to signals with large dynamics, such as audio. The main originality of this work is a new way to estimate and exploit the inter-channel dependences of an object for the purpose of demixing. In the Gaussian case, α = 2, previous research focused on covariance structures. This approach is no longer valid for α < 2, where covariances are not defined. Instead, we show how simple linear combinations of the mixture channels can be used to learn the model parameters; the method we propose pools the estimates obtained from many projections to correctly account for the original multichannel audio. Intuitively, each such downmix of the mixture provides a new perspective in which some objects are cancelled or enhanced. Finally, we explain how to recover the different spatial audio objects once all parameters have been computed. Performance of the method is illustrated on the separation of stereophonic music signals.

Index Terms: source separation, probabilistic models, non-negative matrix factorization, musical source separation
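The intuition that a downmix can cancel or enhance an object can be illustrated with a minimal sketch. It is not the method of the paper: it assumes two point sources with instantaneous amplitude panning (no inter-channel delays, no α-harmonisable model, no parameter estimation), and the names `pan_vector` and `project` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def pan_vector(theta):
    """Left/right gains of a toy point source panned at angle theta in [0, pi/2]."""
    return np.array([np.cos(theta), np.sin(theta)])

def project(mixture, theta):
    """Downmix a (2, n_samples) stereo mixture onto the direction
    orthogonal to pan_vector(theta), which cancels a source panned at theta."""
    u = np.array([-np.sin(theta), np.cos(theta)])
    return u @ mixture

rng = np.random.default_rng(0)
n_samples = 1000
s1 = rng.standard_normal(n_samples)  # toy object 1
s2 = rng.standard_normal(n_samples)  # toy object 2
theta1, theta2 = np.pi / 6, np.pi / 3  # assumed pan angles

# Stereo mixture: each object scaled by its own pan gains, then summed.
mixture = np.outer(pan_vector(theta1), s1) + np.outer(pan_vector(theta2), s2)

# Projecting against theta1 removes object 1; object 2 survives with
# gain sin(theta2 - theta1).
y = project(mixture, theta1)
gain2 = np.sin(theta2 - theta1)

# Sweeping the projection angle gives many "perspectives" on the mixture.
angles = np.linspace(0.0, np.pi / 2, 7)
energies = np.array([np.mean(project(mixture, a) ** 2) for a in angles])
```

Each entry of `energies` is the power of one downmix; in this toy setting the power changes with the projection angle because each object is attenuated differently, which is the kind of information that pooling over many projections can exploit.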
Main file
projection-based_separation_V3C.pdf (430.29 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01260588, version 1 (22-01-2016)
hal-01260588, version 2 (17-05-2016)

Cite

Derry Fitzgerald, Antoine Liutkus, Roland Badeau. Projection-based demixing of spatial audio. IEEE Transactions on Audio, Speech and Language Processing, 2016, ⟨10.1109/TASLP.2016.2570945⟩. ⟨hal-01260588v2⟩