Audio-Visual Clustering for Multiple Speaker Localization

Vasil Khalidov (1), Florence Forbes (1), Miles Hansard (2), Elise Arnaud (2), Radu Horaud (2)
(1) MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
(2) PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : We address the issue of identifying and localizing individuals in a scene that contains several people engaged in conversation. We use a human-like configuration of sensors (binaural and binocular) to gather both auditory and visual observations. We show that the identification and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data to a representation of the common 3D scene-space, via a pair of Gaussian mixture models. Inference is performed by a version of the Expectation Maximization algorithm, which provides cooperative estimates of both the activity and the 3D position of each speaker.
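Illustration : The clustering idea in the abstract can be sketched with plain Expectation Maximization on a mixture of isotropic Gaussians over synthetic 3D points. This is a simplified stand-in, not the paper's model: the paper couples an audio mixture and a visual mixture through the speakers' 3D positions, whereas the sketch fits a single mixture directly in 3D. The function name `em_isotropic_gmm`, the synthetic "speaker" centres, the noise level and the initial guesses are all assumptions made for the example.

```python
import math
import random


def em_isotropic_gmm(points, means, n_iter=50):
    """Fit a mixture of isotropic 3D Gaussians with plain EM.

    points: list of (x, y, z) tuples; means: initial cluster centres.
    Returns (weights, means, variances, responsibilities).
    """
    k, n, dim = len(means), len(points), 3
    means = [tuple(m) for m in means]
    weights = [1.0 / k] * k
    variances = [1.0] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each point.
        resp = []
        for p in points:
            logs = []
            for j in range(k):
                d2 = sum((a - b) ** 2 for a, b in zip(p, means[j]))
                logs.append(
                    math.log(weights[j])
                    - 0.5 * dim * math.log(2.0 * math.pi * variances[j])
                    - 0.5 * d2 / variances[j]
                )
            top = max(logs)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights, centres and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard empty clusters
            weights[j] = nj / n
            means[j] = tuple(
                sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                for d in range(dim)
            )
            d2sum = sum(
                r[j] * sum((a - b) ** 2 for a, b in zip(p, means[j]))
                for r, p in zip(resp, points)
            )
            variances[j] = max(d2sum / (dim * nj), 1e-6)  # variance floor
    return weights, means, variances, resp


# Synthetic data (hypothetical): two "speakers" at fixed 3D positions,
# each producing 100 noisy observations.
random.seed(0)
true_centres = [(-1.0, 0.0, 2.0), (1.0, 0.0, 3.0)]
points = [
    tuple(random.gauss(c, 0.1) for c in centre)
    for centre in true_centres
    for _ in range(100)
]
weights, means, variances, resp = em_isotropic_gmm(
    points, means=[(-0.5, 0.0, 2.5), (0.5, 0.0, 2.5)]
)
# Hard assignment: each observation goes to its most probable cluster.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

In this sketch the fitted centres play the role of estimated speaker positions and `labels` groups the observations by speaker. Estimating speaker activity and fusing the binaural and binocular observations, as the abstract describes, would need the paper's full model.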
Document type :
Conference papers

https://hal.inria.fr/inria-00373154
Contributor : Elise Arnaud
Submitted on : Friday, April 3, 2009 - 2:46:46 PM
Last modification on : Wednesday, April 11, 2018 - 1:57:51 AM
Long-term archiving on : Thursday, June 10, 2010 - 7:41:09 PM

Files

mlmi2008.pdf
Files produced by the author(s)

Citation

Vasil Khalidov, Florence Forbes, Miles Hansard, Elise Arnaud, Radu Horaud. Audio-Visual Clustering for Multiple Speaker Localization. MLMI 2008 - International Workshop on Machine Learning for Multimodal Interaction, Sep 2008, Utrecht, Netherlands. pp.86-97, ⟨10.1007/978-3-540-85853-9_8⟩. ⟨inria-00373154⟩
