Skip to Main content Skip to Navigation
Conference papers

Improved sample complexity for incremental autonomous exploration in MDPs

Jean Tarbouriech 1, 2 Matteo Pirotta 1 Michal Valko 3 Alessandro Lazaric 1 
2 Scool - Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of ε-optimal goal-conditioned policies attaining all states that are incrementally reachable within L steps (in expectation) from a reference state s 0. In this paper, we introduce a novel modelbased approach that interleaves discovering new states from s 0 and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as O(L 5 S L+ε Γ L+ε A ε −2), where A is the number of actions, S L+ε is the number of states that are incrementally reachable from s 0 in L + ε steps, and Γ L+ε is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both ε and L at the cost of an extra Γ L+ε factor, which is small in most environments of interest. Furthermore, DISCO is the first algorithm that can return an ε/c min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c min. Finally, we report preliminary empirical results confirming our theoretical findings.
Document type :
Conference papers
Complete list of metadata
Contributor : Michal Valko Connect in order to contact the contributor
Submitted on : Thursday, July 15, 2021 - 9:47:52 PM
Last modification on : Sunday, June 26, 2022 - 9:10:13 AM
Long-term archiving on: : Saturday, October 16, 2021 - 7:17:14 PM


Files produced by the author(s)


  • HAL Id : hal-03287829, version 1



Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric. Improved sample complexity for incremental autonomous exploration in MDPs. Neural Information Processing Systems, 2020, Montréal, Canada. ⟨hal-03287829⟩



Record views


Files downloads