Conference Paper, Year: 2018

Deep Reinforcement Learning for Audio-Visual Gaze Control

Abstract

We address the problem of audiovisual gaze control in the specific context of human-robot interaction, namely how controlled robot motions are combined with visual and acoustic observations in order to direct the robot head towards targets of interest. The paper makes the following contributions: (i) a novel audiovisual fusion framework that is well suited for controlling the gaze of a robotic head; (ii) a reinforcement learning (RL) formulation of the gaze control problem, using a reward function based on the available temporal sequence of camera and microphone observations; and (iii) several deep architectures that allow us to experiment with early and late fusion of audio and visual data. We introduce a simulated environment that enables us to learn the proposed deep RL model without spending hours of tedious interaction. Through thorough experiments on a publicly available dataset and on a real robot, we provide empirical evidence that our method achieves state-of-the-art performance.
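To make the RL formulation concrete, here is a deliberately simplified, self-contained sketch, not the authors' method: the deep network is replaced by tabular Q-learning, the audiovisual scene is reduced to target positions on a discrete pan axis, and the reward counts how many targets fall inside the head's field of view, mirroring the paper's idea of a reward built from camera/microphone observations. All names, angles, and hyperparameters below are illustrative assumptions.

```python
import random

# Toy stand-in for the paper's setup (assumed, not from the paper):
# the head pans over discrete angles and is rewarded for keeping
# targets of interest inside its field of view.
ANGLES = list(range(-3, 4))   # discrete pan positions
ACTIONS = (-1, 0, 1)          # turn left, stay, turn right
FOV = 1                       # a target within +/-FOV of the head is "seen"

def reward(head, targets):
    """Observation-based reward: number of targets currently in view."""
    return sum(1 for t in targets if abs(t - head) <= FOV)

def train(targets, episodes=2000, steps=20, alpha=0.3, gamma=0.9,
          eps=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning (a stand-in for the deep model)."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in ANGLES for a in ACTIONS}
    for _ in range(episodes):
        head = rng.choice(ANGLES)
        for _ in range(steps):
            if rng.random() < eps:                      # explore
                a = rng.choice(ACTIONS)
            else:                                       # exploit
                a = max(ACTIONS, key=lambda x: Q[(head, x)])
            nxt = min(max(head + a, ANGLES[0]), ANGLES[-1])
            r = reward(nxt, targets)
            best_next = max(Q[(nxt, x)] for x in ACTIONS)
            Q[(head, a)] += alpha * (r + gamma * best_next - Q[(head, a)])
            head = nxt
    return Q

def greedy_action(Q, head):
    """Action the learned policy takes from a given head position."""
    return max(ACTIONS, key=lambda a: Q[(head, a)])
```

With a single target at pan angle +2, the learned greedy policy turns the head rightward from positions on the far left; the paper's contribution is to learn an analogous policy directly from raw audio and visual streams with deep early/late fusion architectures, trained in simulation.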
Main file: main.pdf (3.97 MB). Origin: files produced by the author(s).

Dates and versions

hal-01851738 , version 1 (30-07-2018)

Identifiers

Cite

Stéphane Lathuilière, Benoit Massé, Pablo Mesejo, Radu Horaud. Deep Reinforcement Learning for Audio-Visual Gaze Control. IROS 2018 - IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2018, Madrid, Spain. pp.1555-1562, ⟨10.1109/IROS.2018.8594327⟩. ⟨hal-01851738⟩
325 views
502 downloads
