Non-Stationary Approximate Modified Policy Iteration

Boris Lesner 1 Bruno Scherrer 2, 3
1 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
2 BIGS - Biology, genetics and statistics
Inria Nancy - Grand Est, IECL - Institut Élie Cartan de Lorraine
3 Probabilités et statistiques
IECL - Institut Élie Cartan de Lorraine
Abstract: We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration (a family of algorithms that can interpolate between Value and Policy Iteration) with an error at each iteration is known to lead to stationary policies that are at least 2γ/(1−γ)^2-optimal. Variations of Value and Policy Iteration that build l-periodic non-stationary policies have recently been shown to display a better 2γ/((1−γ)(1−γ^l))-optimality guarantee. We describe a new algorithmic scheme, Non-Stationary Modified Policy Iteration, a family of algorithms parameterized by two integers m ≥ 0 and l ≥ 1 that generalizes all the above-mentioned algorithms. While m allows one to interpolate between Value-Iteration-style and Policy-Iteration-style updates, l specifies the period of the non-stationary policy that is output. We show that this new family of algorithms also enjoys the improved 2γ/((1−γ)(1−γ^l))-optimality guarantee. Perhaps more importantly, we show, by exhibiting an original problem instance, that this guarantee is tight for all m and l; to our knowledge, this tightness was previously known only in two specific cases, Value Iteration (m = 0, l = 1) and Policy Iteration (m = ∞, l = 1).
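
To get a feel for the two guarantees quoted in the abstract, the short sketch below evaluates the coefficients 2γ/(1−γ)^2 and 2γ/((1−γ)(1−γ^l)) for a few periods l. It is only an illustration of the formulas as stated above, not code from the paper; the function names and the choice γ = 0.9 are assumptions made for this example.

```python
def stationary_coeff(gamma: float) -> float:
    """Coefficient 2*gamma / (1 - gamma)^2 quoted for stationary policies."""
    return 2 * gamma / (1 - gamma) ** 2


def nonstationary_coeff(gamma: float, l: int) -> float:
    """Coefficient 2*gamma / ((1 - gamma) * (1 - gamma^l)) quoted for
    l-periodic non-stationary policies (l >= 1)."""
    if l < 1:
        raise ValueError("the period l must be at least 1")
    return 2 * gamma / ((1 - gamma) * (1 - gamma ** l))


if __name__ == "__main__":
    gamma = 0.9  # illustrative discount factor (an assumption, not from the paper)
    print(f"stationary: {stationary_coeff(gamma):.2f}")
    for l in (1, 2, 5, 10, 50):
        # l = 1 recovers the stationary coefficient; larger l shrinks it.
        print(f"l = {l:2d}: {nonstationary_coeff(gamma, l):.2f}")
```

For l = 1 the two expressions coincide, and as l grows the non-stationary coefficient decreases toward 2γ/(1−γ), since γ^l tends to 0 for γ < 1.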

https://hal.inria.fr/hal-01186664
Contributor: Bruno Scherrer
Submitted on: Tuesday, August 25, 2015 - 14:11:01
Last modified on: Thursday, January 11, 2018 - 06:26:22
Document(s) archived on: Wednesday, April 26, 2017 - 10:27:34

Files

icml2015.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01186664, version 1

Citation

Boris Lesner, Bruno Scherrer. Non-Stationary Approximate Modified Policy Iteration. ICML 2015, Jul 2015, Lille, France. 2015. 〈hal-01186664〉
