Finding Audio-Visual Events in Informal Social Gatherings

Xavier Alameda-Pineda; Vasil Khalidov; Radu Horaud; Florence Forbes

doi:10.1145/2070481.2070527

Conference Papers, Year: 2011

Xavier Alameda-Pineda

Function : Author
PersonId : 16186
IdHAL : xavier-alameda-pineda
ORCID : 0000-0002-5354-1084
IdRef : 18450919X

Interpretation and Modelling of Images and Videos

Vasil Khalidov

Function : Author

IDIAP Research Institute

Radu Horaud

Function : Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Florence Forbes

Function : Author
PersonId : 16305
IdHAL : florence-forbes
ORCID : 0000-0003-3639-0226
IdRef : 12469781X

Modelling and Inference of Complex and Structured Stochastic Systems

Abstract

In this paper we address the problem of detecting and localizing objects that can be both seen and heard, e.g., people. This may be solved within the framework of data clustering. We propose a new multimodal clustering algorithm based on a Gaussian mixture model, where one of the modalities (visual data) is used to supervise the clustering process. This is made possible by mapping both modalities into the same metric space. To this end, we fully exploit the geometric and physical properties of an audio-visual sensor based on binocular vision and binaural hearing. We propose an EM algorithm that is theoretically well justified, intuitive, and extremely efficient from a computational point of view. This efficiency makes the method implementable on advanced platforms such as humanoid robots. We describe in detail tests and experiments performed with publicly available data sets that yield very interesting results.
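
The abstract describes a Gaussian mixture model fitted by EM in which visual data supervise the clustering once both modalities live in the same metric space. The sketch below illustrates one plausible reading of that idea and is not the authors' algorithm: it assumes both modalities have already been mapped to a shared 1-D space, that each visual point carries a known cluster label, and that every cluster has at least one visual point. The function name `supervised_gmm_em` and all parameters are hypothetical.

```python
import numpy as np

def supervised_gmm_em(audio, visual, visual_labels, n_clusters, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM (illustrative sketch, not the paper's method).

    Assumption: visual points keep fixed, known cluster labels, while
    audio points receive soft assignments that are re-estimated each iteration.
    """
    audio = np.asarray(audio, dtype=float)
    visual = np.asarray(visual, dtype=float)
    visual_labels = np.asarray(visual_labels)

    # One-hot responsibilities for the visual points; these are never updated.
    r_vis = np.eye(n_clusters)[visual_labels]

    # Initialise the cluster means from the labelled visual points only.
    means = np.array([visual[visual_labels == k].mean() for k in range(n_clusters)])
    x = np.concatenate([visual, audio])
    variances = np.full(n_clusters, x.var() + 1e-6)
    weights = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step (audio only): posterior probability of each cluster.
        diff = audio[:, None] - means[None, :]
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * diff ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        r_aud = np.exp(log_p)
        r_aud /= r_aud.sum(axis=1, keepdims=True)

        # M-step: update parameters from visual (fixed) and audio (soft) points.
        r = np.vstack([r_vis, r_aud])
        nk = r.sum(axis=0)
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        weights = nk / nk.sum()

    return means, variances, weights, r_aud
```

Calling `supervised_gmm_em(audio, visual, visual_labels, n_clusters=2)` returns the fitted means, variances, mixture weights, and the soft assignment of each audio observation to a cluster. Fixing the visual responsibilities is only one way to encode supervision; constraining the means or using the visual clusters as a prior would be alternatives.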

Domains

Graphics [cs.GR]

Main file

Alameda-ICMI-2011.pdf (857.62 KB)

findingpeople.png (504.88 KB)

Origin : Files produced by the author(s)

Format : Figure, Image

https://inria.hal.science/inria-00623489

Submitted on : Friday, March 21, 2014, 10:17:02 AM

Last modification on : Thursday, April 4, 2024, 9:06:52 PM

Long-term archiving on : Saturday, June 21, 2014, 10:55:29 AM

Dates and versions

inria-00623489 , version 1 (14-09-2011)

inria-00623489 , version 2 (21-03-2014)

Identifiers

HAL Id : inria-00623489 , version 2
DOI : 10.1145/2070481.2070527

Cite

Xavier Alameda-Pineda, Vasil Khalidov, Radu Horaud, Florence Forbes. Finding Audio-Visual Events in Informal Social Gatherings. ACM/IEEE International Conference on Multimodal Interaction, Nov 2011, Alicante, Spain. pp.247-254, ⟨10.1145/2070481.2070527⟩. ⟨inria-00623489v2⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_PS LJK_GI_PERCEPTION LJK_PS_MISTIS INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
