Non-stationary approximate modified policy iteration

Boris Lesner; Bruno Scherrer

Conference paper. Year: 2015



Boris Lesner

Role: Author

Autonomous intelligent machine

Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Biology, genetics and statistics

Institut Élie Cartan de Lorraine

Abstract

We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration (a family of algorithms that can interpolate between Value and Policy Iteration) with an error ε at each iteration is known to lead to stationary policies that are at least 2γε/(1−γ)^2-optimal. Variations of Value and Policy Iteration that build l-periodic non-stationary policies have recently been shown to display a better 2γε/((1−γ)(1−γ^l))-optimality guarantee. We describe a new algorithmic scheme, Non-Stationary Modified Policy Iteration, a family of algorithms parameterized by two integers m ≥ 0 and l ≥ 1 that generalizes all the above-mentioned algorithms. While m allows one to interpolate between Value-Iteration-style and Policy-Iteration-style updates, l specifies the period of the non-stationary policy that is output. We show that this new family of algorithms also enjoys the improved 2γε/((1−γ)(1−γ^l))-optimality guarantee. Perhaps more importantly, we show, by exhibiting an original problem instance, that this guarantee is tight for all m and l; this tightness was, to our knowledge, only known in two specific cases, Value Iteration (m = 0, l = 1) and Policy Iteration (m = ∞, l = 1).
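To get a feel for how much the period l tightens the guarantee, the two bounds quoted in the abstract can be compared numerically. The following is only an illustrative sketch, not code from the paper: the function names are hypothetical, and `eps` stands for the per-iteration error ε, which is assumed here to multiply both bounds.

```python
def stationary_bound(gamma: float, eps: float) -> float:
    """Bound 2*gamma*eps / (1 - gamma)^2 for stationary policies."""
    return 2.0 * gamma * eps / (1.0 - gamma) ** 2


def nonstationary_bound(gamma: float, eps: float, l: int) -> float:
    """Bound 2*gamma*eps / ((1 - gamma) * (1 - gamma^l)) for l-periodic policies."""
    if l < 1:
        raise ValueError("the period l must be at least 1")
    return 2.0 * gamma * eps / ((1.0 - gamma) * (1.0 - gamma ** l))


if __name__ == "__main__":
    gamma, eps = 0.9, 0.1  # illustrative values only, not taken from the paper
    print("stationary:", stationary_bound(gamma, eps))
    for period in (1, 2, 5, 10, 50):
        print("l =", period, "->", nonstationary_bound(gamma, eps, period))
```

With l = 1 the two expressions coincide, and as l grows the non-stationary bound decreases toward 2γε/(1−γ), so the gain over the stationary bound approaches a factor of 1/(1−γ).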

Domains

Optimization and Control [math.OC]; Operations Research [math.OC]; Machine Learning [cs.LG]; Statistics [math.ST]

Main file

icml2015.pdf (497.24 KB)

api-vs-nsapi-avg-crop.pdf (29.21 KB)

api-vs-nsapi-avg-crop2.pdf (20.83 KB)

fixed_lm_err_std-crop.pdf (119.25 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01186664

Submitted on: Tuesday, 25 August 2015, 14:11:01

Last modified on: Thursday, 2 May 2024, 11:36:27

Long-term archiving on: Wednesday, 26 April 2017, 10:27:34

Dates and versions

hal-01186664, version 1 (25-08-2015)

Identifiers

HAL Id: hal-01186664, version 1

Cite

Boris Lesner, Bruno Scherrer. Non-stationary approximate modified policy iteration. ICML 2015, Jul 2015, Lille, France. ⟨hal-01186664⟩


Collections

UNIV-RENNES1 CNRS INRIA IRISA IECN INSMI UNIV-LORRAINE INRIA2 TDS-MACS LORIA LORIA-AIS UR1-MATH-STIC UR1-UFR-ISTIC IECLPS UNIV-RENNES UR1-MATH-NUM

200 views

340 downloads
