# What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

Scool, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (UMR 9189)
Abstract: In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250,000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.
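The study's setup — many agents, each trained under an independently drawn combination of implementation choices — can be sketched as a random sweep over a configuration space. The choice names and option values below are illustrative placeholders, not the paper's actual list of >50 choices:

```python
import random

# Hypothetical examples of the kind of low- and high-level "choices"
# such a study sweeps over (names are illustrative only).
DESIGN_CHOICES = {
    "advantage_estimator": ["gae", "n_step", "vtrace"],
    "obs_normalization": [True, False],
    "policy_clip_range": [0.1, 0.2, 0.3],
    "value_loss_coef": [0.25, 0.5, 1.0],
}

def sample_configs(n_agents, seed=0):
    """Draw one independent random configuration per trained agent,
    mirroring a large-scale random sweep over the choice space."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(options) for name, options in DESIGN_CHOICES.items()}
        for _ in range(n_agents)
    ]

if __name__ == "__main__":
    for cfg in sample_configs(n_agents=5):
        print(cfg)
```

Each sampled configuration would then parameterize one training run; aggregating final returns over many such runs lets one estimate the marginal effect of each choice.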
Document type: Conference papers
https://hal.inria.fr/hal-03162554
Contributor: Léonard Hussenot
Submitted on: Monday, March 8, 2021 - 3:43:53 PM
Last modification on: Tuesday, March 9, 2021 - 3:27:47 AM

### File

2006.05990.pdf
Files produced by the author(s)

### Identifiers

• HAL Id: hal-03162554, version 1
• arXiv: 2006.05990

### Citation

Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, et al.. What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study. ICLR 2021 - Ninth International Conference on Learning Representations, May 2021, Vienna / Virtual, Austria. ⟨hal-03162554⟩
