Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter Bartlett; Victor Gabillon; Jennifer Healey; Michal Valko

Conference paper, Year: 2019

Peter Bartlett

Role: Author

Queensland University of Technology [Brisbane]

Victor Gabillon

Role: Author
PersonId: 925091

Huawei Noah's Ark Lab [China]

Jennifer Healey

Role: Author

Adobe Research

Michal Valko

Role: Author
PersonId: 284
IdHAL: michal
IdRef: 22360934X

Sequential Learning

Abstract

We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget, where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when given underestimated ranges of noise and rewards, and inefficiency when these are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We show that PlaTγPOOS acts in a provably more efficient manner than OLOP when OLOP is given an overestimated reward range, and that in the case of no noise, PlaTγPOOS learns exponentially faster.

Domains

Machine Learning [stat.ML]

Main file

icml2019platypoos.pdf (621.25 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02387484

Submitted on: Friday, 29 November 2019, 18:15:55

Last modified on: Wednesday, 24 January 2024, 09:54:23

Dates and versions

hal-02387484, version 1 (29-11-2019)

Identifiers

HAL Id: hal-02387484, version 1

Cite

Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko. Scale-free adaptive planning for deterministic dynamics & discounted rewards. International Conference on Machine Learning, 2019, Long Beach, United States. ⟨hal-02387484⟩

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE

76 views

90 downloads