Open Loop Optimistic Planning

S. Bubeck 1 R. Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the problem of planning in a stochastic and discounted environment with a limited numerical budget. More precisely, we investigate strategies exploring the set of possible sequences of actions, so that, once all available numerical resources (e.g. CPU time, number of calls to a generative model) have been used, one returns a recommendation on the best possible immediate action to follow based on this exploration. The performance of a strategy is assessed in terms of its simple regret, that is the loss in performance resulting from choosing the recommended action instead of an optimal one. We first provide a minimax lower bound for this problem, and show that a uniform planning strategy matches this minimax rate (up to a logarithmic factor). Then we propose a UCB (Upper Confidence Bounds)-based planning algorithm, called OLOP (Open-Loop Optimistic Planning), which is also minimax optimal, and prove that it enjoys much faster rates when there is a small proportion of near-optimal sequences of actions. Finally, we compare our results with the regret bounds one can derive for our setting with bandits algorithms designed for an infinite number of arms.
Type de document :
Communication dans un congrès
Conference on Learning Theory, 2010, Haifa, Israel. 2010
Liste complète des métadonnées
Contributeur : Philippe Preux <>
Soumis le : vendredi 7 février 2014 - 08:23:47
Dernière modification le : jeudi 11 janvier 2018 - 06:22:13


  • HAL Id : hal-00943119, version 1



S. Bubeck, R. Munos. Open Loop Optimistic Planning. Conference on Learning Theory, 2010, Haifa, Israel. 2010. 〈hal-00943119〉



Consultations de la notice