Audio-Visual Robot Command Recognition

Jordi Sanchez-Riera (1), Xavier Alameda-Pineda (1, *), Radu Horaud (1, *)
(*) Corresponding author
(1) PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: This paper addresses the problem of audio-visual command recognition in the framework of the D-META Grand Challenge. Temporal and non-temporal learning models are trained on visual and auditory descriptors. To establish a baseline, the methods are tested on the "Robot Gestures" scenario of the publicly available RAVEL data set, following a leave-one-out cross-validation strategy. The classification-level audio-visual fusion strategy compensates for errors made by the unimodal (audio-only or vision-only) classifiers. The results obtained (an average audio-visual recognition rate of almost 80%) encourage us to investigate how to further develop and improve the methodology described in this paper.
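The abstract does not state which fusion rule is used. As a minimal sketch, assuming each unimodal classifier outputs one score per command class, classification-level (late) fusion can be illustrated as a weighted sum of the two score vectors followed by an argmax. The weight `w_audio`, the functions `fuse_scores` and `predict`, the command labels and the scores are all hypothetical illustrations, not the authors' method or data.

```python
from typing import List, Sequence


def fuse_scores(audio: Sequence[float], vision: Sequence[float],
                w_audio: float = 0.5) -> List[float]:
    """Hypothetical late-fusion rule: weighted sum of per-class scores."""
    if len(audio) != len(vision):
        raise ValueError("both classifiers must score the same set of commands")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    # Each fused score mixes the audio and vision evidence for one class.
    return [w_audio * a + (1.0 - w_audio) * v for a, v in zip(audio, vision)]


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring command class."""
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    commands = ["come here", "stop", "turn around"]  # hypothetical labels
    audio = [0.40, 0.45, 0.15]   # made-up scores: audio alone picks "stop"
    vision = [0.80, 0.10, 0.10]  # made-up scores: vision alone picks "come here"
    fused = fuse_scores(audio, vision)
    print(commands[predict(audio)], commands[predict(fused)])
    # prints: stop come here
```

In this toy case the audio classifier's narrow margin for the wrong class is outweighed by the vision classifier's confident correct score, which is the kind of error compensation the abstract attributes to classification-level fusion.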

https://hal.inria.fr/hal-00768761
Contributor: Team Perception
Submitted on: Sunday, December 23, 2012 - 7:16:30 PM
Last modification on: Wednesday, April 11, 2018 - 1:59:15 AM
Long-term archiving on: Sunday, March 24, 2013 - 3:50:57 AM

File: gcp03-pineda.pdf (produced by the author(s))

Citation

Jordi Sanchez-Riera, Xavier Alameda-Pineda, Radu Horaud. Audio-Visual Robot Command Recognition. ICMI 2012 - 14th ACM International Conference on Multimodal Interaction, Oct 2012, Santa Monica, CA, United States. pp. 371-378, ⟨10.1145/2388676.2388760⟩. ⟨hal-00768761⟩
