Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

Xiaofei Li (1), Yutong Ban (1), Laurent Girin (2, 1), Xavier Alameda-Pineda (1), Radu Horaud (1)
(1) PERCEPTION - Interpretation and Modelling of Images and Videos, Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
(2) GIPSA-CRISSP - CRISSP, GIPSA-DPC - Département Parole et Cognition
Abstract: This paper addresses the problem of online multiple-speaker localization and tracking in reverberant environments. We propose to use the direct-path relative transfer function (DP-RTF), a feature that encodes inter-channel direct-path information and is robust against reverberation, hence well suited for reliable localization. A complex Gaussian mixture model (CGMM) is then used, such that each component weight represents the probability that an active speaker is present at the corresponding candidate source direction. These weights are updated online with exponentiated gradient descent, minimizing a combination of the negative log-likelihood and the entropy of the weights. The entropy term imposes sparsity on the number of audio sources, since in practice only a few speakers are simultaneously active. The outputs of this online localization process are then used as observations within a Bayesian filtering process whose computation is made tractable via an instance of variational expectation-maximization. Birth and sleeping processes are used to account for the intermittent nature of speech. The method is thoroughly evaluated on several datasets.
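To make the weight-update step concrete, here is a minimal NumPy sketch of exponentiated-gradient updates on CGMM weights. It rests on assumptions that are not from the paper: the observations `x` stand in for DP-RTF feature vectors, each candidate direction is represented by a hypothetical template vector `centers[d]`, every component shares a fixed isotropic variance `var`, and the step size `eta` and entropy weight `gamma` are arbitrary illustrative values. The cost is the mean negative log-likelihood plus `gamma` times the entropy of the weights. This is a generic illustration of the idea, not the authors' derivation or implementation.

```python
import numpy as np


def log_complex_gaussian(x, centers, var):
    """Log-density of an isotropic circular complex Gaussian (assumed model).

    x: (N, K) complex observations; centers: (D, K) complex means.
    Returns an (N, D) array of log N_c(x_n; c_d, var * I).
    """
    k = x.shape[1]
    # squared distance |x_n - c_d|^2 summed over the K feature entries
    dist2 = np.sum(np.abs(x[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return -k * np.log(np.pi * var) - dist2 / var


def eg_weight_update(w, x, centers, var=0.1, eta=0.5, gamma=0.05, eps=1e-12):
    """One exponentiated-gradient step on mixture weights w (shape (D,)).

    Cost (assumed): -mean_n log sum_d w_d p_nd  +  gamma * H(w),
    with entropy H(w) = -sum_d w_d log w_d.
    """
    log_p = log_complex_gaussian(x, centers, var)           # (N, D)
    log_w = np.log(np.maximum(w, eps))                       # guard against log(0)
    log_mix = log_w[None, :] + log_p                         # log(w_d p_nd)
    m = log_mix.max(axis=1, keepdims=True)
    log_den = m + np.log(np.exp(log_mix - m).sum(axis=1, keepdims=True))
    # gradient of the mean negative log-likelihood: -mean_n p_nd / sum_d' w_d' p_nd'
    grad_nll = -np.exp(log_p - log_den).mean(axis=0)
    # gradient of gamma * entropy; large (positive) for small weights
    grad_ent = -gamma * (log_w + 1.0)
    g = grad_nll + grad_ent
    # multiplicative update w_d * exp(-eta * g_d), renormalized in the log domain
    log_new = log_w - eta * g
    log_new -= log_new.max()
    w_new = np.exp(log_new)
    return w_new / w_new.sum()


def demo(n_frames=30, seed=0):
    """Toy online run: 12 hypothetical directions, speakers at indices 3 and 10."""
    rng = np.random.default_rng(seed)
    d, k, n = 12, 4, 20
    # hypothetical per-direction templates (unit-modulus complex vectors)
    centers = np.exp(2j * np.pi * np.outer(np.arange(d), np.arange(1, k + 1)) / d)
    w = np.full(d, 1.0 / d)                                  # uniform initial weights
    src = np.repeat([3, 10], n // 2)                         # half the frames' samples per speaker
    for _ in range(n_frames):                                # one update per incoming frame
        noise = np.sqrt(0.05 / 2) * (rng.standard_normal((n, k))
                                     + 1j * rng.standard_normal((n, k)))
        w = eg_weight_update(w, centers[src] + noise, centers)
    return w


if __name__ == "__main__":
    print(np.round(demo(), 3))
```

Multiplying each weight by exp(-eta * g) and renormalizing keeps the weights on the probability simplex without an explicit projection. Because the entropy gradient is largest for weights that are already small, those weights shrink fastest, so in this toy run the mass concentrates on the two simulated speakers (indices 3 and 10).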

Cited literature: 38 references

https://hal.inria.fr/hal-01851985
Contributor: Team Perception
Submitted on: Tuesday, July 31, 2018 - 1:49:18 PM
Last modification on: Thursday, March 7, 2019 - 11:40:05 AM
Long-term archiving on: Thursday, November 1, 2018 - 2:09:22 PM

File: SSLT_JSTSP.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-01851985, version 1
  • arXiv: 1809.10936

Citation

Xiaofei Li, Yutong Ban, Laurent Girin, Xavier Alameda-Pineda, Radu Horaud. Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments. IEEE Journal of Selected Topics in Signal Processing, IEEE, 2019. ⟨hal-01851985v1⟩
