Improved sample complexity for incremental autonomous exploration in MDPs - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2020

Improved sample complexity for incremental autonomous exploration in MDPs

Jean Tarbouriech
  • Fonction : Auteur
  • PersonId : 1105513
Matteo Pirotta
  • Fonction : Auteur
  • PersonId : 1105514
Michal Valko
Alessandro Lazaric
  • Fonction : Auteur
  • PersonId : 1105515

Résumé

We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of ε-optimal goal-conditioned policies attaining all states that are incrementally reachable within L steps (in expectation) from a reference state s 0. In this paper, we introduce a novel modelbased approach that interleaves discovering new states from s 0 and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as O(L 5 S L+ε Γ L+ε A ε −2), where A is the number of actions, S L+ε is the number of states that are incrementally reachable from s 0 in L + ε steps, and Γ L+ε is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both ε and L at the cost of an extra Γ L+ε factor, which is small in most environments of interest. Furthermore, DISCO is the first algorithm that can return an ε/c min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c min. Finally, we report preliminary empirical results confirming our theoretical findings.
Fichier principal
Vignette du fichier
tarbouriech2020improved.pdf (1.28 Mo) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-03287829 , version 1 (15-07-2021)

Identifiants

  • HAL Id : hal-03287829 , version 1

Citer

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric. Improved sample complexity for incremental autonomous exploration in MDPs. Neural Information Processing Systems, 2020, Montréal, Canada. ⟨hal-03287829⟩
19 Consultations
34 Téléchargements

Partager

Gmail Facebook X LinkedIn More