Conference papers

Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter Bartlett (1), Victor Gabillon (2), Jennifer Healey (3), Michal Valko (4)
(4) SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget, where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when the ranges of noise and rewards are underestimated, and inefficiency when they are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We prove that PlaTγPOOS is more efficient than OLOP when OLOP is given an overestimated reward range, and show that in the case of no noise, PlaTγPOOS learns exponentially faster.
Document type :
Conference papers

Cited literature [27 references]
Contributor : Michal Valko
Submitted on : Friday, November 29, 2019 - 6:15:55 PM
Last modification on : Tuesday, January 4, 2022 - 6:14:31 AM


Files produced by the author(s)


  • HAL Id : hal-02387484, version 1


Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko. Scale-free adaptive planning for deterministic dynamics & discounted rewards. International Conference on Machine Learning, 2019, Long Beach, United States. ⟨hal-02387484⟩


