
Projection-based demixing of spatial audio

Abstract: We propose a method to unmix multichannel audio signals into their different constitutive spatial objects. To achieve this, we characterize an audio object through both a spatial and a spectro-temporal modelling. The particularity of the spatial model we pick is that it neither assumes an object has only one underlying source point, nor does it attempt to model the complex room acoustics. Instead, it focuses on a listener perspective, and takes each object as the superposition of many contributions with different incoming directions and inter-channel delays. Our spectro-temporal probabilistic model is based on the recently proposed α-harmonisable processes, which are adequate for signals with large dynamics, such as audio. Then, the main originality of this work is to provide a new way to estimate and exploit inter-channel dependences of an object for the purpose of demixing. In the Gaussian α = 2 case, previous research focused on covariance structures. This approach is no longer valid for α < 2, where covariances are not defined. Instead, we show how simple linear combinations of the mixture channels can be used to learn the model parameters, and the method we propose consists in pooling the estimates based on many projections to correctly account for the original multichannel audio. Intuitively, each such downmix of the mixture provides a new perspective where some objects are cancelled or enhanced. Finally, we also explain how to recover the different spatial audio objects when all parameters have been computed. Performance of the method is illustrated on the separation of stereophonic music signals.

Index Terms: source separation, probabilistic models, non-negative matrix factorization, musical source separation
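The intuition that a downmix can cancel some objects and enhance others can be illustrated with a toy example. The sketch below is not the method of the paper: it assumes a deliberately simplified setting in which each object is a single point source with an instantaneous (delay-free) stereo pan, which is exactly the assumption the paper's spatial model avoids. The names `pan_vector` and `project`, the pan angles, and the random test signals are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pan_vector(theta):
    """Left/right gains of a point source at pan angle theta (radians).

    Assumption for this toy example: constant-power, delay-free panning.
    """
    return np.array([np.cos(theta), np.sin(theta)])


def project(mixture, phi):
    """Downmix a (2, n_samples) stereo mixture to mono along angle phi."""
    return pan_vector(phi) @ mixture


rng = np.random.default_rng(0)
n_samples = 1000
s1 = rng.standard_normal(n_samples)  # hypothetical object 1
s2 = rng.standard_normal(n_samples)  # hypothetical object 2

theta1, theta2 = np.pi / 6, np.pi / 3
mixture = np.outer(pan_vector(theta1), s1) + np.outer(pan_vector(theta2), s2)

# Project along the direction orthogonal to object 1's pan vector.
# The gain applied to a source at angle theta is cos(phi - theta),
# so object 1 is cancelled and object 2 survives with a reduced gain.
phi = theta1 + np.pi / 2
downmix = project(mixture, phi)
residual_gain_s2 = np.cos(phi - theta2)
```

In this simplified setting, sweeping `phi` over many angles yields a family of mono signals in which each object is attenuated differently, which is the kind of "new perspective" the abstract refers to. The actual method pools parameter estimates across many such projections and handles objects made of many contributions with inter-channel delays.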
Document type: Journal articles

Cited literature: 46 references
Contributor: Antoine Liutkus
Submitted on: Tuesday, May 17, 2016 - 11:06:18 AM
Last modification on: Monday, December 14, 2020 - 3:41:50 PM
Long-term archiving on: Friday, August 19, 2016 - 4:28:15 PM





Derry Fitzgerald, Antoine Liutkus, Roland Badeau. Projection-based demixing of spatial audio. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2016, ⟨10.1109/TASLP.2016.2570945⟩. ⟨hal-01260588v2⟩


