Mean field for Markov Decision Processes: from Discrete to Continuous Optimization

Nicolas Gast, Bruno Gaujal, Jean-Yves Le Boudec
MESCAL - Middleware efficiently scalable, Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: We study the convergence of Markov decision processes (MDPs), composed of a large number of objects, to optimization problems on ordinary differential equations (ODEs). We show that the optimal reward of such an MDP, which satisfies a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the MDP. We give bounds on the difference between the rewards, and an algorithm for deriving an approximate solution to the MDP from a solution of the HJB equation. We illustrate the method on three examples pertaining, respectively, to investment strategies, population dynamics control, and scheduling in queues. These examples illustrate and justify the construction of the controlled ODE and show the advantage of solving a continuous HJB equation rather than a large discrete Bellman equation.
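The core idea behind the mean field approximation can be illustrated with a toy example. The following sketch is not taken from the paper: it assumes a hypothetical population of two-state objects with epidemic-like transition rates, and compares the stochastic evolution of the empirical fraction of "on" objects with the deterministic ODE obtained in the large-population limit.

```python
import random

def mean_field_ode(m0, drift, dt, steps):
    """Euler integration of the mean-field ODE dm/dt = drift(m)."""
    m = m0
    for _ in range(steps):
        m += dt * drift(m)
    return m

def simulate_population(n, m0, rate_on, rate_off, dt, steps, seed=0):
    """Discrete-time simulation of n interacting two-state objects.

    In each slot of length dt, an 'off' object turns on with probability
    rate_on(m) * dt and an 'on' object turns off with probability
    rate_off(m) * dt, where m is the current fraction of 'on' objects.
    """
    rng = random.Random(seed)
    state = [1 if i < int(m0 * n) else 0 for i in range(n)]
    for _ in range(steps):
        m = sum(state) / n
        p_on, p_off = rate_on(m) * dt, rate_off(m) * dt
        for i in range(n):
            if state[i] == 0 and rng.random() < p_on:
                state[i] = 1
            elif state[i] == 1 and rng.random() < p_off:
                state[i] = 0
    return sum(state) / n

# Hypothetical epidemic-like dynamics (illustrative, not from the paper):
# activation rate proportional to the fraction already on, constant
# deactivation rate, giving drift dm/dt = 2m(1 - m) - m with fixed point 0.5.
rate_on = lambda m: 2.0 * m
rate_off = lambda m: 1.0
drift = lambda m: 2.0 * m * (1.0 - m) - 1.0 * m

m_ode = mean_field_ode(0.3, drift, dt=0.01, steps=500)
m_sim = simulate_population(1000, 0.3, rate_on, rate_off, dt=0.01, steps=500)
```

With a large population (n = 1000 here), the simulated fraction stays close to the ODE trajectory; the paper's contribution is to extend this kind of convergence from uncontrolled dynamics to the optimal reward of a controlled system, replacing the discrete Bellman equation by a continuous HJB equation.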
Document type:
Journal article
IEEE Transactions on Automatic Control, Institute of Electrical and Electronics Engineers, 2012, 57 (9), pp. 2266-2280. DOI: 10.1109/TAC.2012.2186176

https://hal.inria.fr/hal-00787996
Contributor: Arnaud Legrand
Submitted on: Wednesday, February 13, 2013 - 14:57:19
Last modified on: Monday, October 2, 2017 - 16:06:04

Citation

Nicolas Gast, Bruno Gaujal, Jean-Yves Le Boudec. Mean field for Markov Decision Processes: from Discrete to Continuous Optimization. IEEE Transactions on Automatic Control, Institute of Electrical and Electronics Engineers, 2012, 57 (9), pp. 2266-2280. DOI: 10.1109/TAC.2012.2186176. hal-00787996
