Self-Imitation Advantage Learning

Johan Ferret; Olivier Pietquin; Matthieu Geist

Conference paper, Year: 2021


Johan Ferret

Role: Author
PersonId: 1092607

Google Brain, Paris

Scool

Olivier Pietquin

Role: Author
PersonId: 1090627

Google Brain, Paris

Matthieu Geist

Role: Author
PersonId: 1090629

Google Brain, Paris

Abstract

Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems. It was shown to improve the performance of on-policy actor-critic methods in several discrete control tasks. Nevertheless, applying self-imitation to the mostly action-value based off-policy RL methods is not straightforward. We propose SAIL, a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator that we connect to Advantage Learning. Crucially, our method mitigates the problem of stale returns by choosing the most optimistic return estimate between the observed return and the current action-value for self-imitation. We demonstrate the empirical effectiveness of SAIL on the Arcade Learning Environment, with a focus on hard exploration games.
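The abstract describes the update only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the modified operator adds an Advantage Learning style bonus to the usual Bellman optimality target, with the current action-value in that bonus replaced by the larger of the observed return and the current action-value. The function names, the coefficient `alpha`, and the tabular list representation of action-values are all illustrative choices that do not come from the page.

```python
from typing import Sequence


def optimistic_return(observed_return: float, q_sa: float) -> float:
    """Pick the more optimistic of the observed return and the current Q(s, a).

    This is the step the abstract credits with mitigating stale returns: an old,
    low return is overridden once the current estimate exceeds it.
    """
    return max(observed_return, q_sa)


def sail_style_target(
    reward: float,
    observed_return: float,
    q_s: Sequence[float],      # current action-values Q(s, .) (assumed tabular)
    q_next: Sequence[float],   # current action-values Q(s', .)
    action: int,
    gamma: float = 0.99,       # discount factor (assumed value)
    alpha: float = 0.9,        # bonus weight (assumed value)
    done: bool = False,
) -> float:
    """Hypothetical regression target for Q(s, a).

    Assumed form: Bellman optimality backup plus
    alpha * (max(G, Q(s, a)) - max_a' Q(s, a')).
    """
    # Standard Bellman optimality backup.
    bootstrap = 0.0 if done else gamma * max(q_next)
    bellman = reward + bootstrap
    # Advantage-like bonus computed with the optimistic return estimate.
    bonus = optimistic_return(observed_return, q_s[action]) - max(q_s)
    return bellman + alpha * bonus
```

Under these assumptions, a return that falls below the current `Q(s, a)` makes the bonus reduce to the plain Advantage Learning term `Q(s, a) - max(Q(s, .))`, while a return above it raises the target for the taken action.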

Keywords

Reinforcement Learning; Off-Policy Learning; Self-Imitation

Domains

Artificial Intelligence [cs.AI]

Main file

2012.11989.pdf (3.34 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-03159815

Submitted on: Thursday, 4 March 2021, 17:13:53

Last modified on: Wednesday, 24 January 2024, 09:54:24

Long-term archiving on: Saturday, 5 June 2021, 19:17:38

Dates and versions

hal-03159815, version 1 (4 March 2021)

Identifiers

HAL Id: hal-03159815, version 1

Cite

Johan Ferret, Olivier Pietquin, Matthieu Geist. Self-Imitation Advantage Learning. AAMAS 2021 - 20th International Conference on Autonomous Agents and Multiagent Systems, May 2021, London / Virtual, United Kingdom. ⟨hal-03159815⟩


Collections

CNRS INRIA CRISTAL INRIA2 UNIV-LILLE CRISTAL-SCOOL

42 views

243 downloads
