
Some Contributions to Audio Source Separation and Diarisation of Multichannel Convolutive Mixtures

Dionyssos Kounades-Bastian 1
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology, LJK - Laboratoire Jean Kuntzmann
Abstract : In this thesis we address the problem of multichannel audio source separation (MASS) for underdetermined convolutive mixtures through probabilistic modeling. We focus on three aspects of the problem and make three contributions. Firstly, inspired by the empirically well-validated representation of an audio signal known as the local Gaussian signal model (LGM) with non-negative matrix factorization (NMF), we propose a Bayesian extension that overcomes some of the limitations of NMF. We incorporate this representation in a MASS framework and compare it with the state of the art in MASS, yielding promising results. Secondly, we study how to separate mixtures of moving sources and/or moving microphones. Movements make the acoustic path between sources and microphones time-varying. Time-varying audio mixtures have received relatively little attention in the MASS literature. We therefore start from a state-of-the-art LGM-with-NMF method designed for separating time-invariant audio mixtures and propose an extension that uses a Kalman smoother to track the acoustic path across time. The proposed method is benchmarked against a block-wise adaptation of that state-of-the-art method (run on time segments), and delivers competitive results on both simulated and real-world mixtures. Lastly, we investigate the link between MASS and the task of audio diarisation. Audio diarisation is the detection of the time intervals where each speaker/source is active or silent. Most state-of-the-art MASS methods assume that the sources emit continuously, a hypothesis that can result in spurious signal estimates for a source in intervals where that source was silent. Our aim is for diarisation to aid MASS by indicating the emitting sources at each time frame.
To that end, we design a joint framework for simultaneous diarisation and MASS that incorporates a hidden Markov model (HMM) to track the temporal activity of the sources within a state-of-the-art LGM-with-NMF MASS framework. We compare the proposed method with the state of the art in both MASS and audio diarisation tasks. We obtain performance comparable to the state of the art in terms of separation, while outperforming it in terms of diarisation.

Cited literature [49 references]
Contributor : Team Perception
Submitted on : Tuesday, June 20, 2017 - 2:55:18 PM
Last modification on : Wednesday, October 27, 2021 - 10:59:47 AM
Long-term archiving on : Friday, December 15, 2017 - 8:25:50 PM


  • HAL Id : tel-01543101, version 1




Dionyssos Kounades-Bastian. Some Contributions to Audio Source Separation and Diarisation of Multichannel Convolutive Mixtures. Signal and Image Processing. Université Grenoble - Alpes, 2017. English. ⟨tel-01543101⟩


