M. Akian and S. Gaubert, Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00881207

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

E. V. Denardo, Nearly strongly polynomial algorithms for transient Markov decision problems, unpublished manuscript, 2014.

E. A. Feinberg and J. Huang, Strong polynomiality of policy iterations for average-cost MDPs modeling replacement and maintenance problems, Operations Research Letters, vol. 41, no. 3, pp. 249-251, 2013.
DOI : 10.1016/j.orl.2013.02.002

E. A. Feinberg and J. Huang, The value iteration algorithm is not strongly polynomial for discounted dynamic programming, Operations Research Letters, vol. 42, no. 2, pp. 130-131, 2014.
DOI : 10.1016/j.orl.2013.12.011

T. D. Hansen, P. B. Miltersen, and U. Zwick, Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor, Journal of the ACM, vol. 60, no. 1, 2013.
DOI : 10.1145/2432622.2432623

R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, 1960.

L. C. Kallenberg, Finite state and action MDPs, in Handbook of Markov Decision Processes, pp. 21-87, 2002.
DOI : 10.1007/978-1-4615-0805-2_2

T. Kitahara and S. Mizuno, A bound for the number of different basic solutions generated by the simplex method, Mathematical Programming, vol. 137, no. 1-2, pp. 579-586, 2013.
DOI : 10.1007/s10107-011-0482-y

I. Post and Y. Ye, The simplex method is strongly polynomial for deterministic Markov decision processes, 2014.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

M. L. Puterman and M. C. Shin, Modified policy iteration algorithms for discounted Markov decision problems, Management Science, vol. 24, no. 11, pp. 1127-1137, 1978.
DOI : 10.1287/mnsc.24.11.1127

B. Scherrer, Improved and generalized upper bounds on the complexity of policy iteration, in Advances in Neural Information Processing Systems 26, pp. 386-394, 2013.
DOI : 10.1287/moor.2015.0753

URL : https://hal.archives-ouvertes.fr/hal-00829532

C. Thiery and B. Scherrer, Least-squares policy iteration: Bias-variance trade-off in control problems, in Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 1071-1078, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00520841

P. Tseng, Solving H-horizon, stationary Markov decision problems in time proportional to log(H), Operations Research Letters, vol. 9, no. 5, pp. 287-297, 1990.
DOI : 10.1016/0167-6377(90)90022-W

Y. Ye, The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate, Mathematics of Operations Research, vol. 36, no. 4, pp. 593-603, 2011.
DOI : 10.1287/moor.1110.0516