Detection and Localization of 3D Audio-Visual Objects Using Unsupervised Clustering

Vasil Khalidov 1 Florence Forbes 1 Miles Hansard 2 Elise Arnaud 2 Radu Horaud 2
1 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: This paper addresses the problem of detecting and localizing objects in a scene that are both seen and heard. We explain the benefits of a human-like configuration of sensors (binaural and binocular) for gathering auditory and visual observations. It is shown that the detection and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data into a common audio-visual 3D representation via a pair of mixture models. Inference is performed by a version of the expectation-maximization algorithm, which is formally derived, and which provides cooperative estimates of both the auditory activity and the 3D position of each object. We describe several experiments with single- and multiple-speaker detection and localization, in the presence of other audio sources.
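As a rough illustration of the clustering idea summarized in the abstract, the sketch below runs expectation-maximization on a plain one-dimensional Gaussian mixture. It is not the paper's model: the paper couples an audio mixture and a visual mixture through shared 3D object positions, whereas this example clusters scalar observations only. The function names (`gaussian_pdf`, `em_gmm_1d`), the initial means, the unit initial variances, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1D Gaussian mixture by EM (illustrative sketch, not the paper's model).

    Returns (weights, means, variances, labels), where labels[i] is the index
    of the most probable component for data[i].
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variance
    weights = [1.0 / k] * k        # uniform starting mixing proportions
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component given each observation.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300   # guard against numerical underflow
            resp.append([j / total for j in joint])
        # M-step: re-estimate each component from its weighted observations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue                   # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


if __name__ == "__main__":
    # Toy observations forming two well-separated groups (hypothetical data).
    observations = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
    weights, means, variances, labels = em_gmm_1d(observations, [0.0, 6.0])
    print("means:", means)
    print("labels:", labels)
```

Each observation is assigned to the component with the highest posterior probability, which is the sense in which detection and localization can be read off a clustering: the component means play the role of object positions and the assignments say which observations belong to which object.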
Document type:
Conference paper
ICMI 2008 - ACM/IEEE International Conference on Multimodal Interfaces, Oct 2008, Chania, Greece. ACM, pp.217-224, 2008, 〈10.1145/1452392.1452438〉

https://hal.inria.fr/inria-00373148
Contributor: Elise Arnaud
Submitted on: Friday, April 3, 2009 - 14:41:31
Last modified on: Wednesday, April 11, 2018 - 01:58:41
Document(s) archived on: Thursday, June 10, 2010 - 19:40:53

Citation

Vasil Khalidov, Florence Forbes, Miles Hansard, Elise Arnaud, Radu Horaud. Detection and Localization of 3D Audio-Visual Objects Using Unsupervised Clustering. ICMI 2008 - ACM/IEEE International Conference on Multimodal Interfaces, Oct 2008, Chania, Greece. ACM, pp.217-224, 2008, 〈10.1145/1452392.1452438〉. 〈inria-00373148〉
