Reward-free exploration beyond finite-horizon

Jean Tarbouriech 1, 2, Matteo Pirotta 1, Michal Valko 3, Alessandro Lazaric 1
2 Scool, Inria Lille - Nord Europe, CRIStAL (Centre de Recherche en Informatique, Signal et Automatique de Lille, UMR 9189)
Abstract: We consider the reward-free exploration framework introduced by Jin et al. (2020), where an RL agent interacts with an unknown environment without any explicit reward function to maximize. The objective is to collect enough information during the exploration phase so that a near-optimal policy can be immediately computed once any reward function is provided. In this paper, we move from the finite-horizon setting studied by Jin et al. (2020) to the more general setting of goal-conditioned RL, often referred to as stochastic shortest path (SSP). We first discuss the challenges specific to SSPs and then study two scenarios: 1) reward-free goal-free exploration in communicating MDPs, and 2) reward-free goal-free incremental exploration in non-communicating MDPs where the agent is provided with a reset action to an initial state. In both cases, we provide exploration algorithms and their sample-complexity bounds, which we contrast with the existing guarantees in the finite-horizon case.
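The two-phase protocol the abstract describes can be illustrated on a toy tabular MDP: an exploration phase that gathers transition data with no reward signal, followed by a planning phase that computes a policy once a reward function is revealed. The sketch below is a hypothetical, simplified finite-horizon illustration (uniform-random exploration standing in for a principled exploration algorithm), not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: S states, A actions, horizon H; dynamics unknown to the agent.
S, A, H = 5, 2, 10
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)

# Phase 1: reward-free exploration — collect transition counts with no reward signal.
counts = np.zeros((S, A, S))
for _ in range(20000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P_true[s, a])
    counts[s, a, s_next] += 1

# Empirical model from the exploration data (tiny smoothing for unvisited pairs).
P_hat = (counts + 1e-6) / (counts + 1e-6).sum(axis=2, keepdims=True)

# Phase 2: once a reward r(s, a) is revealed, plan on the learned model with
# finite-horizon value iteration — no further environment interaction needed.
def plan(P, r, H):
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P @ V          # Q[s, a] = r[s, a] + sum_s' P[s, a, s'] * V[s']
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, pi

r = rng.random((S, A))                 # arbitrary reward, revealed after exploration
V_hat, _ = plan(P_hat, r, H)           # value of planning on the learned model
V_star, _ = plan(P_true, r, H)         # value of planning on the true model
err = np.abs(V_hat - V_star).max()     # small model error => near-optimal planning
```

The point of the sketch is that `P_hat` is built once, before any reward is seen, yet supports planning for any reward function afterwards; the paper's contribution is replacing the naive exploration loop with algorithms whose sample complexity is provably bounded in the SSP setting.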
Document type :
Conference papers
Complete list of metadata
Contributor: Michal Valko
Submitted on : Friday, July 16, 2021 - 3:48:34 PM
Last modification on : Sunday, June 26, 2022 - 9:10:13 AM
Long-term archiving on: Sunday, October 17, 2021 - 6:51:35 PM
  • HAL Id : hal-03288970, version 1



Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric. Reward-free exploration beyond finite-horizon. ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning, 2020, Vienna, Austria. ⟨hal-03288970⟩
