Approximate Policy Iteration Schemes: A Comparison

Bruno Scherrer

Communication Dans Un Congrès Année : 2014

Approximate Policy Iteration Schemes: A Comparison

(1)

Bruno Scherrer

Fonction : Auteur
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Résumé

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP$_\infty$), and the recently proposed Non-Stationary Policy iteration (NSPI(m)). For all algorithms, we describe performance bounds, and make a comparison by paying a particular attention to the concentrability constants involved, the number of iterations and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API($\alpha$), but this comes at the cost of a relative---exponential in $\frac{1}{\epsilon}$---increase of the number of iterations. 2) PSDP$_\infty$ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but within a number of iterations similar to that of API. 3) Contrary to API that requires a constant memory, the memory needed by CPI and PSDP$_\infty$ is proportional to their number of iterations, which may be problematic when the discount factor $\gamma$ is close to 1 or the approximation error $\epsilon$ is close to $0$; we show that the NSPI(m) algorithm allows to make an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.

Domaines

Intelligence artificielle [cs.AI]

Fichier principal

icml2014.pdf (583.83 Ko)

Origine : Fichiers produits par l'(les) auteur(s)

Bruno Scherrer : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-00989982

Soumis le : lundi 12 mai 2014-17:26:15

Dernière modification le : lundi 11 septembre 2023-17:41:18

Archivage à long terme le : mardi 12 août 2014-12:10:28

Dates et versions

hal-00989982 , version 1 (12-05-2014)

Identifiants

HAL Id : hal-00989982 , version 1
ARXIV : 1405.2878

Citer

Bruno Scherrer. Approximate Policy Iteration Schemes: A Comparison. ICML - 31st International Conference on Machine Learning - 2014, Jun 2014, Pékin, China. ⟨hal-00989982⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-AIS

321 Consultations

438 Téléchargements

Approximate Policy Iteration Schemes: A Comparison

Résumé

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Altmetric

Partager