HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information
Skip to Main content Skip to Navigation
Journal articles

Analyse en norme $L_p$ de l'algorithme d'itérations sur les valeurs avec approximations

Rémi Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : Approximate Value Iteration (AVI) is a method for solving a large Markov Decision Problem by approximating the optimal value function with a sequence of value representations Vn processed by means of the iterations $V_{n+1} = \mathcal{AT}V_n$ where $\mathcal{T}$ is the so-called Bellman operator and $\mathcal{A}$, an approximation operator, which may be implemented by a Supervised Learning (SL) algorithm. Previous results relate the asymptotic performance of AVI to the $L\infty$-norm of the approximation errors induced by the SL algorithm. Unfortunately, the SL algorithm usually perform a minimization problem in $L_p$-norms $(p \geq 1)$, rendering the $L\infty$ performance bounds inadequate. In this paper, we extend these performance bounds to weighted $L_p$-norms. This enables to relate the performance of AVI to the approximation power of the SL algorithm, which guarantees the tightness and pratical interest of these bounds. We illustrate the tightness of the bounds on an optimal replacement problem.
Document type :
Journal articles
Complete list of metadata

Cited literature [14 references]  Display  Hide  Download

https://hal.inria.fr/inria-00116987
Contributor : Rémi Munos Connect in order to contact the contributor
Submitted on : Wednesday, November 29, 2006 - 11:59:51 AM
Last modification on : Thursday, January 20, 2022 - 4:16:53 PM
Long-term archiving on: : Thursday, September 20, 2012 - 3:15:18 PM

File

avi_RIA_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00116987, version 1

Collections

Citation

Rémi Munos. Analyse en norme $L_p$ de l'algorithme d'itérations sur les valeurs avec approximations. Revue des Sciences et Technologies de l'Information - Série RIA : Revue d'Intelligence Artificielle, Lavoisier, 2007, 21. ⟨inria-00116987⟩

Share

Metrics

Record views

227

Files downloads

317