Illustrated review of convergence conditions of the value iteration algorithm and the rolling horizon procedure for average-cost MDPs

Eugenio Della Vecchia 1 Silvia C. Di Marco 1 Alain Jean-Marie 2, 3
2 MAESTRO - Models for the performance analysis and the control of networks
CRISAM - Inria Sophia Antipolis - Méditerranée
3 LIRMM/HE - Hors Équipe
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : This paper is concerned with the links between the Value Iteration algorithm and the Rolling Horizon procedure, for solving problems of stochastic optimal control under the long-run average criterion, in Markov Decision Processes with finite state and action spaces. We review conditions of the literature which imply the geometric convergence of Value It- eration to the optimal value. Aperiodicity is an essential prerequisite for convergence. We prove that the convergence of Value Iteration generally implies that of Rolling Horizon. We also present a modified Rolling Horizon procedure that can be applied to models without analyzing periodicity, and discuss the impact of this transformation on convergence. We il- lustrate with numerous examples the different results. Finally, we discuss rules for stopping Value Iteration or finding the length of a Rolling Horizon. We provide an example which demonstrates the difficulty of the question, disproving in particular a conjectured rule pro- posed by Puterman.
Type de document :
Article dans une revue
Annals of Operations Research, Springer Verlag, 2012, 199 (1), pp.193-214. 〈10.1007/s10479-012-1070-0〉
Liste complète des métadonnées

https://hal.inria.fr/hal-00862915
Contributeur : Alain Jean-Marie <>
Soumis le : mardi 17 septembre 2013 - 17:45:49
Dernière modification le : jeudi 24 mai 2018 - 15:59:21

Lien texte intégral

Identifiants

Citation

Eugenio Della Vecchia, Silvia C. Di Marco, Alain Jean-Marie. Illustrated review of convergence conditions of the value iteration algorithm and the rolling horizon procedure for average-cost MDPs. Annals of Operations Research, Springer Verlag, 2012, 199 (1), pp.193-214. 〈10.1007/s10479-012-1070-0〉. 〈hal-00862915〉

Partager

Métriques

Consultations de la notice

141