Skip to Main content Skip to Navigation
Journal articles

Analyse en norme $L_p$ de l'algorithme d'itérations sur les valeurs avec approximations

Rémi Munos 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : Approximate Value Iteration (AVI) is a method for solving a large Markov Decision Problem by approximating the optimal value function with a sequence of value representations Vn processed by means of the iterations $V_{n+1} = \mathcal{AT}V_n$ where $\mathcal{T}$ is the so-called Bellman operator and $\mathcal{A}$, an approximation operator, which may be implemented by a Supervised Learning (SL) algorithm. Previous results relate the asymptotic performance of AVI to the $L\infty$-norm of the approximation errors induced by the SL algorithm. Unfortunately, the SL algorithm usually perform a minimization problem in $L_p$-norms $(p \geq 1)$, rendering the $L\infty$ performance bounds inadequate. In this paper, we extend these performance bounds to weighted $L_p$-norms. This enables to relate the performance of AVI to the approximation power of the SL algorithm, which guarantees the tightness and pratical interest of these bounds. We illustrate the tightness of the bounds on an optimal replacement problem.
Document type :
Journal articles
Complete list of metadata

Cited literature [14 references]  Display  Hide  Download

https://hal.inria.fr/inria-00116987
Contributor : Rémi Munos <>
Submitted on : Wednesday, November 29, 2006 - 11:59:51 AM
Last modification on : Monday, April 19, 2021 - 3:25:14 PM
Long-term archiving on: : Thursday, September 20, 2012 - 3:15:18 PM

File

avi_RIA_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00116987, version 1

Collections

Citation

Rémi Munos. Analyse en norme $L_p$ de l'algorithme d'itérations sur les valeurs avec approximations. Revue des Sciences et Technologies de l'Information - Série RIA : Revue d'Intelligence Artificielle, Lavoisier, 2007, 21. ⟨inria-00116987⟩

Share

Metrics

Record views

573

Files downloads

582