Conference papers

Audio-Visual Robot Command Recognition

Jordi Sanchez-Riera 1, Xavier Alameda-Pineda 1,*, Radu Horaud 1,*
* Corresponding author
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract: This paper addresses the problem of audio-visual command recognition in the framework of the D-META Grand Challenge. Temporal and non-temporal learning models are trained on visual and auditory descriptors. To set a proper baseline, the methods are tested on the "Robot Gestures" scenario of the publicly available RAVEL data set, following a leave-one-out cross-validation strategy. The classification-level audio-visual fusion strategy compensates for the errors of the unimodal (audio-only or vision-only) classifiers. The obtained results (an average audio-visual recognition rate of almost 80%) encourage us to further develop and improve the methodology described in this paper.
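The classification-level (late) fusion described in the abstract can be sketched as follows: each unimodal classifier outputs per-class scores, and fusion combines them so that a confident modality can compensate for errors of the other. The command names and the weighted product rule below are illustrative assumptions, not the paper's exact method.

```python
def fuse_scores(audio_scores, visual_scores, alpha=0.5):
    """Combine per-class posterior scores from audio and visual classifiers.

    alpha weights the audio modality; a weighted product (log-linear)
    rule is one common choice for classification-level fusion.
    """
    fused = {}
    for cmd in audio_scores:
        fused[cmd] = (audio_scores[cmd] ** alpha) * (visual_scores[cmd] ** (1.0 - alpha))
    total = sum(fused.values())
    return {cmd: score / total for cmd, score in fused.items()}


def recognize(audio_scores, visual_scores, alpha=0.5):
    """Return the command with the highest fused score."""
    fused = fuse_scores(audio_scores, visual_scores, alpha)
    return max(fused, key=fused.get)


# Hypothetical example: the audio classifier is uncertain between two
# commands, and the visual classifier compensates.
audio = {"stop": 0.40, "turn-left": 0.35, "turn-right": 0.25}
visual = {"stop": 0.10, "turn-left": 0.80, "turn-right": 0.10}
print(recognize(audio, visual))  # -> turn-left
```

With equal weighting (alpha=0.5) the fused score for "turn-left" (√(0.35·0.80) ≈ 0.53 before normalization) dominates, even though audio alone would have chosen "stop".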
Contributor: Perception team
Submitted on: Sunday, December 23, 2012 - 7:16:30 PM
Last modification on: Thursday, May 5, 2022 - 3:11:27 AM
Long-term archiving on: Sunday, March 24, 2013 - 3:50:57 AM
Jordi Sanchez-Riera, Xavier Alameda-Pineda, Radu Horaud. Audio-Visual Robot Command Recognition. ICMI 2012 - 14th ACM International Conference on Multimodal Interaction, Oct 2012, Santa-Monica, CA, United States. pp.371-378, ⟨10.1145/2388676.2388760⟩. ⟨hal-00768761⟩