Performance Bounds in $L_p$ norm for Approximate Value Iteration

Rémi Munos

doi:10.1137/040614384

Journal article, SIAM Journal on Control and Optimization, Year: 2007


Role: Author
PersonId: 836863

Sequential Learning

Abstract

Approximate Value Iteration (AVI) is a method for solving large Markov Decision Problems by approximating the optimal value function with a sequence of value function representations $V_n$ computed according to the iterations $V_{n+1} = \mathcal{A}\mathcal{T}V_n$, where $\mathcal{T}$ is the so-called Bellman operator and $\mathcal{A}$ is an approximation operator, which may be implemented by a Supervised Learning (SL) algorithm. The usual bounds on the asymptotic performance of AVI are established in terms of the $L_\infty$-norm approximation errors induced by the SL algorithm. However, most widely used SL algorithms (such as least squares regression) return a function (the best fit) that minimizes an empirical approximation error in $L_p$-norm ($p\geq1$). In this paper, we extend the performance bounds of AVI to weighted $L_p$-norms, which makes it possible to relate the performance of AVI directly to the approximation power of the SL algorithm, hence ensuring the tightness and practical relevance of these bounds. The main result is a performance bound on the resulting policies expressed in terms of the $L_p$-norm errors introduced by the successive approximations. The new bound takes into account a concentration coefficient that estimates how much the discounted future-state distributions, starting from a probability measure used to assess the performance of AVI, can possibly differ from the distribution used in the regression operation. We illustrate the tightness of the bounds on an optimal replacement problem.
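To make the iteration $V_{n+1} = \mathcal{A}\mathcal{T}V_n$ concrete, here is a minimal NumPy sketch on a small finite MDP. It assumes the MDP is given as arrays (`P[a, x, y]` for transition probabilities, `R[a, x]` for rewards) and takes $\mathcal{A}$ to be a $\mu$-weighted least-squares fit onto a fixed set of linear features, i.e. the $p = 2$ case of an $L_p$ regression. The function names, the cosine features and the random MDP are hypothetical illustrations, not the paper's code or its optimal replacement experiment. The sketch records the per-iteration errors $\|\mathcal{T}V_n - \mathcal{A}\mathcal{T}V_n\|_{p,\mu}$ and measures the loss $\|V^* - V^{\pi}\|_{p,\rho}$ of the final greedy policy under a separate assessment distribution $\rho$.

```python
import numpy as np


def bellman(V, P, R, gamma):
    """Bellman optimality operator on a finite MDP (hypothetical array layout).

    P has shape (A, S, S) with P[a, x, y] = Pr(y | x, a); R has shape (A, S).
    Returns (TV, greedy_policy), TV(x) = max_a [R[a, x] + gamma * sum_y P[a, x, y] V(y)].
    """
    Q = R + gamma * (P @ V)  # shape (A, S)
    return Q.max(axis=0), Q.argmax(axis=0)


def weighted_ls_fit(target, phi, mu):
    """Approximation operator A for p = 2 (an assumption): least-squares fit
    of `target` onto span(phi), weighted by the sampling distribution mu."""
    w = np.sqrt(mu)
    theta, *_ = np.linalg.lstsq(phi * w[:, None], target * w, rcond=None)
    return phi @ theta


def lp_norm(f, weights, p=2.0):
    """Weighted L_p norm: (sum_x weights(x) * |f(x)|^p)^(1/p)."""
    return float(weights @ np.abs(f) ** p) ** (1.0 / p)


def policy_value(policy, P, R, gamma):
    """Exact V^pi = (I - gamma * P_pi)^{-1} r_pi for a deterministic policy."""
    S = P.shape[1]
    states = np.arange(S)
    P_pi = P[policy, states, :]  # row x is P[policy[x], x, :]
    r_pi = R[policy, states]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


def avi(P, R, gamma, phi, mu, n_iter=200, p=2.0):
    """Approximate value iteration V_{n+1} = A T V_n, starting from V_0 = 0.

    Returns the last iterate, its greedy policy, and the per-iteration
    approximation errors ||T V_n - A T V_n||_{p, mu}.
    """
    V = np.zeros(P.shape[1])
    errors = []
    for _ in range(n_iter):
        TV, _ = bellman(V, P, R, gamma)
        V_next = weighted_ls_fit(TV, phi, mu)
        errors.append(lp_norm(TV - V_next, mu, p))
        V = V_next
    _, policy = bellman(V, P, R, gamma)
    return V, policy, errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, d, gamma = 20, 3, 5, 0.9
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)  # normalise rows into distributions
    R = rng.random((A, S))
    x = np.arange(S) / S
    phi = np.column_stack([np.cos(k * np.pi * x) for k in range(d)])  # k = 0: constant
    mu = np.full(S, 1.0 / S)   # regression (sampling) distribution
    rho = np.exp(-x)
    rho /= rho.sum()           # a different distribution used to assess performance

    V_star = np.zeros(S)
    for _ in range(1000):      # exact value iteration as a reference
        V_star, _ = bellman(V_star, P, R, gamma)

    V, pi, errs = avi(P, R, gamma, phi, mu)
    loss = lp_norm(V_star - policy_value(pi, P, R, gamma), rho)
    print("last approximation error:", errs[-1], "policy loss:", loss)
```

With tabular features (`phi = np.eye(S)`) the fit is exact and the sketch reduces to ordinary value iteration. With compressed features, least-squares AVI is not guaranteed to converge in general, which is why bounds of this kind are stated for the limit superior of the policy loss; a $p \neq 2$ variant would need an $L_p$ regression in place of `lstsq`.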

Keywords

Reinforcement learning, Markov decision process, Value iteration, Function approximation, Dynamic programming, Statistical learning

Domains

Machine Learning [cs.LG]

Main file

avi_siam_final.pdf (328.8 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/inria-00124685

Submitted on: Monday, 15 January 2007, 17:46:28

Last modified on: Friday, 24 March 2023, 14:52:48

Long-term archiving on: Wednesday, 7 April 2010, 02:12:09

Dates and versions

inria-00124685, version 1 (15-01-2007)

Identifiers

HAL Id: inria-00124685, version 1
DOI: 10.1137/040614384

Cite

Rémi Munos. Performance Bounds in $L_p$ norm for Approximate Value Iteration. SIAM Journal on Control and Optimization, 2007, 46 (2), pp.541-561. ⟨10.1137/040614384⟩. ⟨inria-00124685⟩


Collections

UNIV-LILLE3, CNRS, INRIA, LAGIS, INRIA2

1178 views

1742 downloads
