HIGhER: Improving instruction following with Hindsight Generation for Experience Replay

Geoffrey Cideron; Mathieu Seurin; Florian Strub; Olivier Pietquin

Conference paper, Year: 2020

Geoffrey Cideron

Role: Author

Ecole Normale Supérieure Paris-Saclay

Mathieu Seurin

Role: Author
PersonId : 1039295

Scool

Florian Strub

Role: Author
PersonId : 18649
IdHAL : florian-strub
ORCID : 0000-0001-7271-5345

DeepMind [Paris]

Olivier Pietquin

Role: Author
PersonId : 4024
IdHAL : olivier-pietquin
ORCID : 0000-0002-5386-465X
IdRef : 142821861

Google Research [Paris]

Abstract

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning, or structuring interactive agent behavior, it remains an open problem to correctly relate language understanding and reinforcement learning, even in simple instruction-following scenarios. This joint learning problem is alleviated through expert demonstrations, auxiliary losses, or neural inductive biases. In this paper, we propose an orthogonal approach called Hindsight Generation for Experience Replay (HIGhER) that extends the Hindsight Experience Replay (HER) approach to the language-conditioned policy setting. Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward. To do so, HIGhER learns to map a state into an instruction by using past successful trajectories, which removes the need for external expert interventions to relabel episodes as in vanilla HER. We show the efficiency of our approach in the BabyAI environment and demonstrate how it complements other instruction-following methods.
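The relabeling loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Episode`, `TableInstructionGenerator`, `ReplayBuffer`, and `store_with_hindsight` are hypothetical, the lookup table is a toy stand-in for the learned state-to-instruction model, and the choices to put a sparse reward on the final transition and to keep the original failed episode alongside the relabeled copy are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Tuple

Transition = Tuple[Hashable, int, Hashable]  # (state, action, next_state)


@dataclass
class Episode:
    instruction: str
    transitions: List[Transition]
    success: bool

    @property
    def final_state(self) -> Hashable:
        # State reached after the last action of the episode.
        return self.transitions[-1][2]


class TableInstructionGenerator:
    """Toy stand-in (assumption) for the learned state-to-instruction mapping."""

    def __init__(self) -> None:
        self._table: Dict[Hashable, str] = {}

    def fit(self, episode: Episode) -> None:
        # "Train" on a successful episode: remember which instruction
        # was fulfilled by reaching this final state.
        self._table[episode.final_state] = episode.instruction

    def generate(self, state: Hashable) -> Optional[str]:
        # Returns None when no instruction is known for this state.
        return self._table.get(state)


@dataclass
class ReplayBuffer:
    items: List[Tuple[Hashable, int, Hashable, str, float]] = field(default_factory=list)

    def add_episode(self, episode: Episode, instruction: str, final_reward: float) -> None:
        # Sparse reward (assumption): only the last transition is rewarded.
        last = len(episode.transitions) - 1
        for i, (state, action, next_state) in enumerate(episode.transitions):
            reward = final_reward if i == last else 0.0
            self.items.append((state, action, next_state, instruction, reward))


def store_with_hindsight(
    episode: Episode, generator: TableInstructionGenerator, buffer: ReplayBuffer
) -> None:
    # Store the episode under its original instruction.
    buffer.add_episode(episode, episode.instruction, 1.0 if episode.success else 0.0)
    if episode.success:
        # Successful trajectories are the only supervision for the generator.
        generator.fit(episode)
        return
    # Failed episode: ask the generator which instruction the trajectory
    # did fulfill, and store a relabeled copy with a positive reward.
    hindsight_instruction = generator.generate(episode.final_state)
    if hindsight_instruction is not None:
        buffer.add_episode(episode, hindsight_instruction, 1.0)
```

In this sketch no external expert is consulted: the relabeling instruction comes only from what the generator has absorbed from earlier successes, and a failed episode is left unrelabeled when the generator has nothing to say about its final state.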

Keywords

Natural Language Processing; Reinforcement Learning; Representation Learning

Domains

Artificial Intelligence [cs.AI]

Main file

HIGhER___ADPRL.pdf (1.02 MB)

Origin: Files produced by the author(s)

Contributor: Mathieu Seurin

https://hal.science/hal-03123981

Submitted on: Thursday, January 28, 2021, 12:10:19

Last modified on: Wednesday, January 24, 2024, 09:54:24

Long-term archiving on: Thursday, April 29, 2021, 18:43:20

Dates and versions

hal-03123981, version 1 (28-01-2021)

Identifiers

HAL Id: hal-03123981, version 1

Cite

Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin. HIGhER: Improving instruction following with Hindsight Generation for Experience Replay. ADPRL 2020 - IEEE SSCI Conference on Adaptive Dynamic Programming and Reinforcement Learning, Dec 2020, Canberra / Virtual, Australia. ⟨hal-03123981⟩

Collections

CNRS INRIA ENS-CACHAN GRID5000 CRISTAL INRIA2 UNIV-LILLE SILECS CRISTAL-SCOOL ENS-PARIS-SACLAY

96 views

200 downloads
