Conference Papers Year : 2019

Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko

Abstract

We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget, where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when the given ranges of noise and rewards are underestimated, and inefficiency when they are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We prove that PlaTγPOOS is more efficient than OLOP when OLOP is given an overestimated reward range, and we show that, in the case of no noise, PlaTγPOOS learns exponentially faster.
Main file
icml2019platypoos.pdf (621.25 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-02387484 , version 1 (29-11-2019)

Identifiers

  • HAL Id : hal-02387484 , version 1

Cite

Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko. Scale-free adaptive planning for deterministic dynamics & discounted rewards. International Conference on Machine Learning, 2019, Long Beach, United States. ⟨hal-02387484⟩