Conference papers

Audio-Visual Robot Command Recognition

Jordi Sanchez-Riera 1 Xavier Alameda-Pineda 1, * Radu Horaud 1, *
* Corresponding author
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP [2007-2019] - Institut polytechnique de Grenoble - Grenoble Institute of Technology [2007-2019]
Abstract : This paper addresses the problem of audio-visual command recognition in the framework of the D-META Grand Challenge. Temporal and non-temporal learning models are trained on visual and auditory descriptors. To set a proper baseline, the methods are tested on the "Robot Gestures" scenario of the publicly available RAVEL data set, following a leave-one-out cross-validation strategy. The classification-level audio-visual fusion strategy compensates for the errors of the unimodal (audio or vision) classifiers. The results obtained (an average audio-visual recognition rate of almost 80%) encourage us to investigate how to further develop and improve the methodology described in this paper.

Cited literature [15 references]

https://hal.inria.fr/hal-00768761
Contributor : Team Perception
Submitted on : Sunday, December 23, 2012 - 7:16:30 PM
Last modification on : Thursday, July 9, 2020 - 9:44:36 AM
Document(s) archived on : Sunday, March 24, 2013 - 3:50:57 AM

File

gcp03-pineda.pdf
Files produced by the author(s)

Citation

Jordi Sanchez-Riera, Xavier Alameda-Pineda, Radu Horaud. Audio-Visual Robot Command Recognition. ICMI 2012 - 14th ACM International Conference on Multimodal Interaction, Oct 2012, Santa Monica, CA, United States. pp.371-378, ⟨10.1145/2388676.2388760⟩. ⟨hal-00768761⟩

Metrics

Record views : 520
File downloads : 677