Open Loop Optimistic Planning

S. Bubeck 1 R. Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : We consider the problem of planning in a stochastic and discounted environment with a limited numerical budget. More precisely, we investigate strategies exploring the set of possible sequences of actions, so that, once all available numerical resources (e.g. CPU time, number of calls to a generative model) have been used, one returns a recommendation on the best possible immediate action to follow based on this exploration. The performance of a strategy is assessed in terms of its simple regret, that is the loss in performance resulting from choosing the recommended action instead of an optimal one. We first provide a minimax lower bound for this problem, and show that a uniform planning strategy matches this minimax rate (up to a logarithmic factor). Then we propose a UCB (Upper Confidence Bounds)-based planning algorithm, called OLOP (Open-Loop Optimistic Planning), which is also minimax optimal, and prove that it enjoys much faster rates when there is a small proportion of near-optimal sequences of actions. Finally, we compare our results with the regret bounds one can derive for our setting with bandits algorithms designed for an infinite number of arms.
Complete list of metadatas

https://hal.inria.fr/hal-00943119
Contributor : Philippe Preux <>
Submitted on : Friday, February 7, 2014 - 8:23:47 AM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM

Identifiers

  • HAL Id : hal-00943119, version 1

Collections

Citation

S. Bubeck, R. Munos. Open Loop Optimistic Planning. Conference on Learning Theory, 2010, Haifa, Israel. ⟨hal-00943119⟩

Share

Metrics

Record views

243