Open Loop Optimistic Planning - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2010

Open Loop Optimistic Planning

Résumé

We consider the problem of planning in a stochastic and discounted environment with a limited numerical budget. More precisely, we investigate strategies exploring the set of possible sequences of actions, so that, once all available numerical resources (e.g. CPU time, number of calls to a generative model) have been used, one returns a recommendation on the best possible immediate action to follow based on this exploration. The performance of a strategy is assessed in terms of its simple regret, that is the loss in performance resulting from choosing the recommended action instead of an optimal one. We first provide a minimax lower bound for this problem, and show that a uniform planning strategy matches this minimax rate (up to a logarithmic factor). Then we propose a UCB (Upper Confidence Bounds)-based planning algorithm, called OLOP (Open-Loop Optimistic Planning), which is also minimax optimal, and prove that it enjoys much faster rates when there is a small proportion of near-optimal sequences of actions. Finally, we compare our results with the regret bounds one can derive for our setting with bandits algorithms designed for an infinite number of arms.
Fichier principal
Vignette du fichier
COLT10_BM.pdf (218.34 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-00943119 , version 1 (07-06-2021)

Identifiants

  • HAL Id : hal-00943119 , version 1

Citer

Sébastien Bubeck, Rémi Munos. Open Loop Optimistic Planning. COLT 2010 - The 23rd Conference on Learning Theory, Jun 2010, Haifa, Israel. ⟨hal-00943119⟩
195 Consultations
72 Téléchargements

Partager

Gmail Facebook X LinkedIn More