Reward-free exploration beyond finite-horizon

Jean Tarbouriech 1, 2, Matteo Pirotta 1, Michal Valko 3, Alessandro Lazaric 1
2 Scool, Inria Lille - Nord Europe, CRIStAL (Centre de Recherche en Informatique, Signal et Automatique de Lille, UMR 9189)
Abstract: We consider the reward-free exploration framework introduced by Jin et al. (2020), where an RL agent interacts with an unknown environment without any explicit reward function to maximize. The objective is to collect enough information during the exploration phase so that a near-optimal policy can be immediately computed once any reward function is provided. In this paper, we move from the finite-horizon setting studied by Jin et al. (2020) to the more general setting of goal-conditioned RL, often referred to as stochastic shortest path (SSP). We first discuss the challenges specific to SSPs and then study two scenarios: 1) reward-free goal-free exploration in communicating MDPs, and 2) reward-free goal-free incremental exploration in non-communicating MDPs where the agent is provided with a reset action to an initial state. In both cases, we provide exploration algorithms and their sample-complexity bounds, which we contrast with the existing guarantees in the finite-horizon case.
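The two-phase protocol the abstract describes can be illustrated on a toy tabular MDP: an exploration phase that gathers transition data with no reward signal, followed by a planning phase that computes a policy once a reward function is revealed. The sketch below is a hypothetical, simplified finite-horizon illustration (uniform-random exploration standing in for a principled exploration algorithm), not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: S states, A actions, horizon H; dynamics unknown to the agent.
S, A, H = 5, 2, 10
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)

# Phase 1: reward-free exploration — collect transition counts with no reward signal.
counts = np.zeros((S, A, S))
for _ in range(20000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P_true[s, a])
    counts[s, a, s_next] += 1

# Empirical model from the exploration data (tiny smoothing for unvisited pairs).
P_hat = (counts + 1e-6) / (counts + 1e-6).sum(axis=2, keepdims=True)

# Phase 2: once a reward r(s, a) is revealed, plan on the learned model with
# finite-horizon value iteration — no further environment interaction needed.
def plan(P, r, H):
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P @ V          # Q[s, a] = r[s, a] + sum_s' P[s, a, s'] * V[s']
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, pi

r = rng.random((S, A))                 # arbitrary reward, revealed after exploration
V_hat, _ = plan(P_hat, r, H)           # value of planning on the learned model
V_star, _ = plan(P_true, r, H)         # value of planning on the true model
err = np.abs(V_hat - V_star).max()     # small model error => near-optimal planning
```

The point of the sketch is that `P_hat` is built once, before any reward is seen, yet supports planning for any reward function afterwards; the paper's contribution is replacing the naive exploration loop with algorithms whose sample complexity is provably bounded in the SSP setting.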
Document type :
Conference papers
Complete list of metadata
Contributor: Michal Valko
Submitted on : Friday, July 16, 2021 - 3:48:34 PM
Last modification on : Sunday, June 26, 2022 - 9:10:13 AM
Long-term archiving on: Sunday, October 17, 2021 - 6:51:35 PM
  • HAL Id : hal-03288970, version 1



Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric. Reward-free exploration beyond finite-horizon. ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning, 2020, Vienna, Austria. ⟨hal-03288970⟩
