Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes. While in the exact case it is known that there always exists an optimal policy that is stationary, we show that when using value function approximation, looking for a non-stationary policy may lead to a better performance guarantee. We define a non-stationary variant of MPI (Modified Policy Iteration) that unifies a broad family of approximate DP algorithms from the literature. For this algorithm we provide an error propagation analysis, in the form of a performance bound on the resulting policies, that can improve the usual performance bound by a factor $O\left(1-\gamma\right)$, which is significant when the discount factor $\gamma$ is close to 1. In doing so, our approach unifies recent results for Value and Policy Iteration. Furthermore, by constructing a specific deterministic MDP, we show that our performance guarantee is tight.
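The MPI scheme underlying the abstract alternates a greedy policy update with m partial evaluation backups, and the non-stationary variant retains the last few greedy policies to be run cyclically. A minimal sketch on a toy problem follows; the 3-state MDP, the parameters m and ell, and all names are hypothetical illustrations, not the paper's construction. Note that in the exact (error-free) setting shown here the greedy policies eventually coincide, so the periodic policy degenerates to the stationary optimum; the advantage discussed in the abstract arises only under value function approximation error.

```python
import numpy as np

# Hypothetical toy deterministic MDP with 3 states and 2 actions.
P = np.array([[1, 2, 0],
              [2, 0, 1]])        # P[a, s] = deterministic next state
R = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5]])  # R[a, s] = reward for action a in state s
gamma = 0.9                      # discount factor
m = 3                            # MPI parameter: backups per iteration

def greedy(v):
    # Greedy policy w.r.t. v: argmax_a [ r(s,a) + gamma * v(s') ].
    Q = R + gamma * v[P]         # Q[a, s]
    return Q.argmax(axis=0)

def backup(v, pi):
    # One application of the Bellman operator T_pi.
    s = np.arange(v.size)
    return R[pi, s] + gamma * v[P[pi, s]]

v = np.zeros(3)
policies = []
for _ in range(200):
    pi = greedy(v)               # policy improvement step
    policies.append(pi)
    for _ in range(m):           # m-step partial evaluation (MPI)
        v = backup(v, pi)

# Non-stationary policy: cycle through the last ell greedy policies.
ell = 4
nonstationary = policies[-ell:]
```

In the approximate setting one would add an evaluation error at each iteration; the non-stationary policy built from the last ell policies is what the paper's analysis bounds, with the bound tightening as ell grows.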
Document type: Preprint, working paper
2013

Cited literature: [16 references]

https://hal.inria.fr/hal-00815996
Contributor: Bruno Scherrer
Submitted on: Friday, April 19, 2013 - 15:54:01
Last modified on: Thursday, January 11, 2018 - 06:25:23
Document(s) archived on: Monday, April 3, 2017 - 07:56:17

Files

report.pdf
Files produced by the author(s)

Identifiers

• HAL Id : hal-00815996, version 1
• ARXIV : 1304.5610

Citation

Boris Lesner, Bruno Scherrer. Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies. 2013. 〈hal-00815996〉
