
Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter Bartlett (1), Victor Gabillon (2), Jennifer Healey (3), Michal Valko (4)
(4) SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget, where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when it is given underestimated ranges of noise and rewards, and inefficiency when these ranges are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We show that PlaTγPOOS acts provably more efficiently than OLOP when OLOP is given an overestimated reward range, and that in the noiseless case PlaTγPOOS learns exponentially faster.
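To make the problem setting concrete, here is a minimal Python sketch of what a planner in this setting evaluates: the discounted return of an open-loop action sequence when the dynamics are deterministic and only the rewards are noisy. It is not PlaTγPOOS or OLOP; it only illustrates the objective those planners estimate. The function names, the Gaussian noise model, and the finite action sequence (a truncation of the infinite discounted sum) are all illustrative assumptions, not details taken from the paper.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence, t = 0, 1, ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def estimate_sequence_value(actions, mean_reward, gamma, noise_scale, n_samples, rng):
    """Monte Carlo estimate of the discounted value of an open-loop action sequence.

    Hypothetical setup: because the dynamics are deterministic, the state reached
    after a prefix of actions is identified with the prefix itself, so the mean
    reward depends only on that prefix. Only the observed rewards are random
    (Gaussian noise here, an assumption made for illustration).
    Each sample costs len(actions) reward evaluations of the numerical budget.
    """
    total = 0.0
    for _ in range(n_samples):
        rewards = []
        for t in range(len(actions)):
            prefix = tuple(actions[: t + 1])
            rewards.append(mean_reward(prefix) + rng.gauss(0.0, noise_scale))
        total += discounted_return(rewards, gamma)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical reward: 1 whenever the latest action is 1, else 0.
    reward = lambda prefix: 1.0 if prefix[-1] == 1 else 0.0
    rng = random.Random(0)
    # Noiseless case: exactly 1 + 0.5 + 0.25 = 1.75.
    print(estimate_sequence_value([1, 1, 1], reward, 0.5, 0.0, 10, rng))
```

Note that the estimator itself never uses the reward or noise ranges; in the setting described above, the difficulty lies in how a planner allocates its limited budget of samples across action sequences, which is where a priori range knowledge (OLOP) or adaptation to it (PlaTγPOOS) comes in.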
Document type: Conference papers

Cited literature: 27 references
Contributor: Michal Valko
Submitted on: Friday, November 29, 2019 - 6:15:55 PM
Last modification on: Friday, December 11, 2020 - 6:44:05 PM


Files produced by the author(s)


HAL Id: hal-02387484, version 1


Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko. Scale-free adaptive planning for deterministic dynamics & discounted rewards. International Conference on Machine Learning, 2019, Long Beach, United States. ⟨hal-02387484⟩


