Conference paper, Year: 2019

Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter Bartlett
Victor Gabillon
  • Role: Author
  • PersonId: 925091
Jennifer Healey
  • Role: Author
Michal Valko

Abstract

We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This allows PlaTγPOOS to be immune to two vulnerabilities of OLOP: failure when given underestimated ranges of noise and rewards, and inefficiency when these are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We show that PlaTγPOOS acts in a provably more efficient manner than OLOP when OLOP is given an overestimated reward, and that in the case of no noise, PlaTγPOOS learns exponentially faster.
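The setting described in the abstract can be illustrated with a minimal sketch. This is not the PlaTγPOOS or OLOP algorithm; it only shows how the value of one open-loop action sequence is estimated from noisy discounted rewards, and why a planner that truncates at a finite depth normally needs a reward range to bound what it ignores. All names here (`discounted_return`, `tail_bound`, `estimate_sequence_value`, `noise_range`, `r_max`) are hypothetical, and the uniform noise model is an assumption made for illustration.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def tail_bound(depth, gamma, r_max):
    """Upper bound on the discounted reward collected from step `depth` onward,
    assuming every reward lies in [0, r_max] (the range must be known)."""
    return (gamma ** depth) * r_max / (1.0 - gamma)


def estimate_sequence_value(mean_rewards, gamma, noise_range, n_samples, rng):
    """Monte Carlo estimate of the value of one open-loop action sequence.

    Dynamics are deterministic, so the sequence always visits the same
    states; only the observed rewards are noisy. Noise is assumed uniform
    in [-noise_range, noise_range] (an illustrative choice).
    """
    total = 0.0
    for _ in range(n_samples):
        noisy = [m + rng.uniform(-noise_range, noise_range) for m in mean_rewards]
        total += discounted_return(noisy, gamma)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    means = [1.0, 0.5, 0.25]
    print(estimate_sequence_value(means, 0.9, 0.1, 1000, rng))
    print(tail_bound(len(means), 0.9, 1.0))
```

If `r_max` is set too low, `tail_bound` understates the ignored reward and the planner's confidence is unjustified; if it is set too high, the bound is loose and budget is wasted. These are the two failure modes the abstract attributes to a planner that requires the ranges in advance.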
Main file
icml2019platypoos.pdf (621.25 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02387484, version 1 (29-11-2019)

Identifiers

  • HAL Id: hal-02387484, version 1

Cite

Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko. Scale-free adaptive planning for deterministic dynamics & discounted rewards. International Conference on Machine Learning, 2019, Long Beach, United States. ⟨hal-02387484⟩