# $L_p$-norm analysis of the value iteration algorithm with approximations

SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: Approximate Value Iteration (AVI) is a method for solving a large Markov Decision Problem by approximating the optimal value function with a sequence of value representations $V_n$ computed by the iterations $V_{n+1} = \mathcal{A}\mathcal{T}V_n$, where $\mathcal{T}$ is the so-called Bellman operator and $\mathcal{A}$ is an approximation operator, which may be implemented by a Supervised Learning (SL) algorithm. Previous results relate the asymptotic performance of AVI to the $L_\infty$-norm of the approximation errors induced by the SL algorithm. However, SL algorithms usually solve a minimization problem in an $L_p$-norm ($p \geq 1$), which makes $L_\infty$ performance bounds inadequate. In this paper, we extend these performance bounds to weighted $L_p$-norms. This makes it possible to relate the performance of AVI to the approximation power of the SL algorithm, which guarantees the tightness and practical interest of these bounds. We illustrate the tightness of the bounds on an optimal replacement problem.
Journal articles
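The iteration $V_{n+1} = \mathcal{A}\mathcal{T}V_n$ can be sketched in Python on a small finite MDP. This is a minimal illustration under stated assumptions, not the authors' implementation or experimental setup: the transitions `P` and rewards `R` are assumed known, $\mathcal{T}$ is the optimal Bellman operator with discount `gamma`, and the approximation operator $\mathcal{A}$ is taken to be a $\mu$-weighted least-squares fit onto linear features `Phi`, a hypothetical stand-in for an SL algorithm that minimizes a weighted $L_2$ error ($p = 2$). All function names are hypothetical. At each step the sketch records the weighted $L_p$ approximation error $\|\mathcal{T}V_n - \mathcal{A}\mathcal{T}V_n\|_{p,\mu}$; the exact weighting used in the paper's bounds is not reproduced here.

```python
import numpy as np


def bellman_operator(V, P, R, gamma):
    """Optimal Bellman operator: (T V)(x) = max_a [R(x, a) + gamma * sum_y P(y | x, a) V(y)].

    Assumed shapes: P is (A, S, S) with P[a, x, y] = P(y | x, a); R is (S, A).
    """
    Q = R + gamma * np.einsum("axy,y->xa", P, V)  # Q[x, a]
    return Q.max(axis=1)


def weighted_lp_norm(f, mu, p):
    """Weighted L_p norm ||f||_{p,mu} = (sum_x mu(x) |f(x)|^p)^(1/p)."""
    return float(np.sum(mu * np.abs(f) ** p) ** (1.0 / p))


def least_squares_approximation(target, Phi, mu):
    """Hypothetical approximation operator A: the mu-weighted least-squares fit
    of `target` in span(Phi), i.e. argmin_w ||Phi w - target||_{2,mu}."""
    sqrt_mu = np.sqrt(mu)
    w, *_ = np.linalg.lstsq(sqrt_mu[:, None] * Phi, sqrt_mu * target, rcond=None)
    return Phi @ w


def approximate_value_iteration(P, R, gamma, Phi, mu, n_iter=200, p=2):
    """Iterate V_{n+1} = A T V_n from V_0 = 0 and record the weighted L_p
    approximation error ||T V_n - A T V_n||_{p,mu} at each step."""
    V = np.zeros(R.shape[0])
    errors = []
    for _ in range(n_iter):
        TV = bellman_operator(V, P, R, gamma)
        V_next = least_squares_approximation(TV, Phi, mu)
        errors.append(weighted_lp_norm(TV - V_next, mu, p))
        V = V_next
    return V, errors


# Usage on a small random MDP (illustrative data, not the paper's benchmark).
rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into transition probabilities
R = rng.random((S, A))
mu = np.full(S, 1.0 / S)  # uniform weighting distribution over states

# With Phi = identity, A is exact and AVI reduces to plain value iteration.
V_exact, errors_exact = approximate_value_iteration(P, R, gamma, np.eye(S), mu)

# With two features (constant, state index), A introduces approximation error.
Phi = np.column_stack([np.ones(S), np.arange(S, dtype=float)])
V_avi, errors_avi = approximate_value_iteration(P, R, gamma, Phi, mu)
print("weighted L2 gap to exact VI:", weighted_lp_norm(V_exact - V_avi, mu, 2))
```

With `Phi = np.eye(S)` the fit is exact, so the recorded errors vanish up to rounding and the loop is ordinary value iteration. With coarse features, AVI with a least-squares fit need not converge in general, because such a projection is generally not a non-expansion in the $L_\infty$-norm.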

