Neural Network Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy for human-robot interaction without external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, regardless of the number of people in the environment, their positions, and their physical appearance. In particular, we use recurrent neural networks and Q-learning to find an optimal action-selection policy, and we pretrain on a synthetic environment that simulates sound sources and moving participants to avoid the need to interact with people for hours. Our experimental evaluation suggests that the proposed method is robust with respect to parameter configuration (i.e., the choice of parameter values does not have a decisive impact on performance). The best results are obtained when audio and video information are used jointly and when a late fusion strategy is employed (i.e., when the two sources of information are processed separately and then fused). Successful experiments in a real environment with the Nao robot indicate that our framework is a step forward towards the autonomous learning of a perceivable and socially acceptable gaze behavior.
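As an illustration of two ideas named in the abstract, late fusion and Q-learning, the following is a minimal sketch and not the authors' implementation. It assumes small feedforward branches in place of the recurrent networks used in the paper; the feature sizes, the number of actions, the weight matrices, and the function names (`late_fusion_q_values`, `q_learning_target`, `greedy_action`) are all hypothetical. The target is the standard Q-learning target, r + gamma * max over a' of Q(s', a').

```python
import numpy as np


def late_fusion_q_values(visual, audio, w_visual, w_audio, w_out):
    """Process each modality in its own branch, then fuse into Q-values.

    Assumption: single-layer feedforward branches stand in for the
    recurrent networks described in the abstract.
    """
    h_visual = np.tanh(visual @ w_visual)  # visual-only branch
    h_audio = np.tanh(audio @ w_audio)     # audio-only branch
    fused = np.concatenate([h_visual, h_audio], axis=-1)  # late fusion step
    return fused @ w_out                   # one Q-value per action


def q_learning_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return float(reward)
    return float(reward + gamma * np.max(next_q_values))


def greedy_action(q_values):
    """Index of the action with the highest estimated Q-value."""
    return int(np.argmax(q_values))


# Hypothetical sizes: 8 visual features, 4 audio features, 3 actions.
rng = np.random.default_rng(0)
visual = rng.normal(size=8)
audio = rng.normal(size=4)
w_visual = rng.normal(size=(8, 5))
w_audio = rng.normal(size=(4, 5))
w_out = rng.normal(size=(10, 3))

q_values = late_fusion_q_values(visual, audio, w_visual, w_audio, w_out)
action = greedy_action(q_values)
target = q_learning_target(reward=1.0, next_q_values=q_values)
```

In an early fusion variant, by contrast, the audio and visual features would be concatenated before any processing; the abstract reports that the late fusion arrangement gave the best results.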

Keywords

Reinforcement Learning; Human-Robot Interaction; Neural Networks; Robot Gaze Control; Transfer Learning; Multimodal Data Fusion

Domains

Computer Vision and Pattern Recognition [cs.CV]; Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

Lathuiliere-arxiv-v1.pdf (3.28 MB)

Origin: Files produced by the author(s)

Contributor: Perception team

https://inria.hal.science/hal-01643775

Submitted on: Tuesday, November 21, 2017, 16:15:47

Last modified on: Wednesday, April 3, 2024, 12:50:03

Dates and versions

hal-01643775, version 1 (21-11-2017)

hal-01643775, version 2 (25-04-2018)

Identifiers

HAL Id: hal-01643775, version 1
arXiv: 1711.06834

Cite

Stéphane Lathuilière, Benoît Massé, Pablo Mesejo, Radu Horaud. Neural Network Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction. 2017. ⟨hal-01643775v1⟩
