Audio-Visual Robot Command Recognition

Jordi Sanchez-Riera; Xavier Alameda-Pineda; Radu Horaud

doi:10.1145/2388676.2388760

Conference paper, Year: 2012

Jordi Sanchez-Riera

Role: Author

Interpretation and Modelling of Images and Videos

Xavier Alameda-Pineda

Role: Corresponding author
PersonId: 16186
IdHAL: xavier-alameda-pineda
ORCID: 0000-0002-5354-1084
IdRef: 18450919X

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Corresponding author
PersonId: 16183
IdHAL: radu-horaud
ORCID: 0000-0001-5232-024X
IdRef: 032302495

Interpretation and Modelling of Images and Videos

Abstract

This paper addresses the problem of audio-visual command recognition in the framework of the D-META Grand Challenge. Temporal and non-temporal learning models are trained on visual and auditory descriptors. To set a proper baseline, the methods are tested on the "Robot Gestures" scenario of the publicly available RAVEL data set, following the leave-one-out cross-validation strategy. The classification-level audio-visual fusion strategy compensates for the errors of the unimodal (audio or vision) classifiers. The obtained results (an average audio-visual recognition rate of almost 80%) encourage us to investigate how to further develop and improve the methodology described in this paper.
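
As an illustration only, classification-level fusion and leave-one-out evaluation can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not say how the unimodal outputs are combined, so the weighted sum of per-class scores, the function names (`fuse_scores`, `predict`, `leave_one_out`), the `audio_weight` parameter, and the example scores are all hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def fuse_scores(audio: Sequence[float], visual: Sequence[float],
                audio_weight: float = 0.5) -> List[float]:
    """Combine per-class scores from two unimodal classifiers.

    Assumption: a weighted sum followed by renormalization. The paper may
    use a different classification-level fusion rule.
    """
    if len(audio) != len(visual):
        raise ValueError("both classifiers must score the same classes")
    fused = [audio_weight * a + (1.0 - audio_weight) * v
             for a, v in zip(audio, visual)]
    total = sum(fused)
    # Renormalize so the fused scores sum to 1 (skip if all scores are zero).
    return [f / total for f in fused] if total > 0 else fused


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


def leave_one_out(n_items: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training indices, held-out index) for each item in turn."""
    for held_out in range(n_items):
        train = [i for i in range(n_items) if i != held_out]
        yield train, held_out


# Hypothetical scores for one command whose true class is 1:
# the audio classifier alone is wrong, the visual one is right.
audio_scores = [0.6, 0.3, 0.1]
visual_scores = [0.1, 0.8, 0.1]
fused_prediction = predict(fuse_scores(audio_scores, visual_scores))
```

With these made-up scores the audio classifier alone picks class 0, while the fused scores pick class 1, which is the sense in which one modality can compensate for the errors of the other. In a leave-one-out run, each split from `leave_one_out` would train both unimodal classifiers on the training indices and score the held-out item; what counts as one "item" (a recording, an actor) is not stated in the abstract.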

Keywords

audio-visual categorization multimodal learning

Domains

Computer Vision and Pattern Recognition [cs.CV]; Signal and Image Processing [eess.SP]

Main file

gcp03-pineda.pdf (259.03 KB)

Origin: Files produced by the author(s)

Contributor: Perception team

https://inria.hal.science/hal-00768761

Submitted on: Sunday, December 23, 2012, 19:16:30

Last modified on: Thursday, April 4, 2024, 21:15:46

Long-term archiving on: Sunday, March 24, 2013, 03:50:57

Dates and versions

hal-00768761, version 1 (23-12-2012)

Identifiers

HAL Id: hal-00768761, version 1
DOI: 10.1145/2388676.2388760

Cite

Jordi Sanchez-Riera, Xavier Alameda-Pineda, Radu Horaud. Audio-Visual Robot Command Recognition. ICMI 2012 - 14th ACM International Conference on Multimodal Interaction, Oct 2012, Santa-Monica, CA, United States. pp.371-378, ⟨10.1145/2388676.2388760⟩. ⟨hal-00768761⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_GI_PERCEPTION INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

166 views

222 downloads