Finding Audio-Visual Events in Informal Social Gatherings

Xavier Alameda-Pineda 1, Vasil Khalidov 2, Radu Horaud 1, Florence Forbes 3
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
3 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : We address the problem of detecting and localizing objects that can be both seen and heard, e.g., people. This problem can be cast as data clustering. We propose a new multimodal clustering algorithm based on a Gaussian mixture model, in which one of the modalities (visual data) supervises the clustering process. This is made possible by mapping both modalities into the same metric space. To this end, we exploit the geometric and physical properties of an audio-visual sensor based on binocular vision and binaural hearing. The resulting EM algorithm is theoretically well justified, intuitive, and computationally very efficient. This efficiency makes the method implementable on advanced platforms such as humanoid robots. We describe in detail the tests and experiments performed with publicly available data sets and discuss the results obtained.
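The idea of letting one modality seed a Gaussian mixture that is fitted to the other can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that both modalities have already been mapped into a common one-dimensional space, that each visual cue supplies the initial mean of one mixture component, and that plain EM then refines the mixture on the audio observations. The function name `supervised_em`, its parameters, and the toy values in the usage note are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def supervised_em(audio, visual_means, n_iter=50, init_var=1.0, min_var=1e-6):
    """Fit a 1-D Gaussian mixture to `audio`, one component per visual cue.

    Assumption (not taken from the paper): `audio` and `visual_means` live in
    the same 1-D space, and the visual cues only provide the initial means.
    Returns (weights, means, variances).
    """
    k = len(visual_means)
    n = len(audio)
    means = list(visual_means)      # visual cues seed the component means
    variances = [init_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in audio:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            if total > 0.0:
                resp.append([p / total for p in joint])
            else:
                # Numerical underflow: fall back to uniform responsibilities.
                resp.append([1.0 / k] * k)

        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no support; keep its parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, audio)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, audio)) / nj
            variances[j] = max(var, min_var)  # floor avoids a degenerate variance

    return weights, means, variances
```

For example, `supervised_em([-1.1, -0.9, -1.0, 1.0, 0.9, 1.1], [-0.8, 0.8])` starts from the two visual cues and moves the component means toward the two groups of audio observations.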
Document type :
Conference papers
ACM/IEEE International Conference on Multimodal Interaction, Nov 2011, Alicante, Spain. ACM, pp.247-254, 2011, <10.1145/2070481.2070527>

https://hal.inria.fr/inria-00623489
Contributor : Team Perception
Submitted on : Friday, March 21, 2014 - 10:17:02 AM
Last modification on : Sunday, July 20, 2014 - 9:44:13 PM
Document(s) archived on : Saturday, June 21, 2014 - 10:55:29 AM

Files

Alameda-ICMI-2011.pdf
Files produced by the author(s)

Citation

Xavier Alameda-Pineda, Vasil Khalidov, Radu Horaud, Florence Forbes. Finding Audio-Visual Events in Informal Social Gatherings. ACM/IEEE International Conference on Multimodal Interaction, Nov 2011, Alicante, Spain. ACM, pp.247-254, 2011, <10.1145/2070481.2070527>. <inria-00623489v2>
