Performance Bounds in Lp norm for Approximate Value Iteration

Rémi Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: Approximate Value Iteration (AVI) is a method for solving large Markov Decision Problems by approximating the optimal value function with a sequence of value function representations V(n) computed by the iteration V(n+1) = A T V(n), where T is the Bellman operator and A is an approximation operator, which may be implemented by a Supervised Learning (SL) algorithm. Usual bounds on the asymptotic performance of AVI are established in terms of the sup-norm approximation errors induced by the SL algorithm. However, most widely used SL algorithms (such as least squares regression) return a function (the best fit) that minimizes an empirical approximation error in Lp-norm (p >= 1). In this paper, we extend the performance bounds of AVI to weighted Lp-norms, which makes it possible to relate the performance of AVI directly to the approximation power of the SL algorithm, and hence supports the tightness and practical relevance of these bounds. The main result is a performance bound on the resulting policies, expressed in terms of the Lp-norm errors introduced by the successive approximations. The new bound involves a concentration coefficient that estimates how much the discounted future-state distributions, starting from the probability measure used to assess the performance of AVI, can differ from the distribution used in the regression operation. We illustrate the tightness of the bounds on an optimal replacement problem.
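The iteration V(n+1) = A T V(n) described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy machine-wear MDP, its costs and transition probabilities, the cost-minimization form of the Bellman operator, the uniform regression distribution `mu`, and the choice of a weighted least-squares linear fit (p = 2) as the approximation operator A. The function names `bellman`, `fit_linear`, `weighted_l2` and `approximate_value_iteration` are hypothetical.

```python
# Hypothetical toy MDP (not from the paper): a machine with wear levels 0..N-1.
N = 20
GAMMA = 0.9           # discount factor
P_WEAR = 0.7          # probability that wear increases by one level when kept
REPLACE_COST = 8.0    # fixed cost of replacing; the machine returns to level 0


def bellman(v):
    """Apply the Bellman operator T (cost minimization) to a value table v."""
    tv = []
    for x in range(N):
        nxt = min(x + 1, N - 1)
        # Keeping the machine costs 0.5 * x and wear may increase by one level.
        keep = 0.5 * x + GAMMA * (P_WEAR * v[nxt] + (1 - P_WEAR) * v[x])
        # Replacing costs a fixed amount and resets wear to level 0.
        replace = REPLACE_COST + GAMMA * v[0]
        tv.append(min(keep, replace))
    return tv


def fit_linear(targets, mu):
    """Approximation operator A: mu-weighted least-squares fit of a + b*x."""
    xs = range(N)
    mx = sum(m * x for m, x in zip(mu, xs))
    my = sum(m * y for m, y in zip(mu, targets))
    sxx = sum(m * (x - mx) ** 2 for m, x in zip(mu, xs))
    sxy = sum(m * (x - mx) * (y - my) for m, x, y in zip(mu, xs, targets))
    b = sxy / sxx
    a = my - b * mx
    return [a + b * x for x in xs]


def weighted_l2(u, v, mu):
    """mu-weighted L2 distance between two value tables."""
    return sum(m * (a - b) ** 2 for m, a, b in zip(mu, u, v)) ** 0.5


def approximate_value_iteration(n_iter=50):
    """Run V(n+1) = A T V(n) and record the L2(mu) error of each fit."""
    mu = [1.0 / N] * N      # regression distribution (uniform, an assumption)
    v = [0.0] * N
    errors = []
    for _ in range(n_iter):
        target = bellman(v)             # T V(n)
        v_next = fit_linear(target, mu)  # A T V(n)
        errors.append(weighted_l2(v_next, target, mu))
        v = v_next
    return v, errors
```

The recorded `errors` are the per-iteration weighted L2 approximation errors, which is the kind of quantity the abstract says the performance bound is expressed in. The sketch does not compute the bound itself or the concentration coefficient.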
Document type:
Journal article
SIAM Journal on Control and Optimization, Society for Industrial and Applied Mathematics, 2007
Contributor: Rémi Munos
Submitted on: Monday, January 15, 2007 - 17:46:28
Last modified on: Thursday, January 11, 2018 - 06:22:13
Document(s) archived on: Wednesday, April 7, 2010 - 02:12:09




  • HAL Id: inria-00124685, version 1



Rémi Munos. Performance Bounds in Lp norm for Approximate Value Iteration. SIAM Journal on Control and Optimization, Society for Industrial and Applied Mathematics, 2007. 〈inria-00124685〉


