M. Akian and S. Gaubert, Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00881207

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, 1996.

J. Fearnley, Exponential Lower Bounds for Policy Iteration, Proceedings of the 37th international colloquium conference on Automata, languages and programming: Part II, ICALP'10, pp.551-562, 2010.
DOI : 10.1007/978-3-642-14162-1_46
URL : http://arxiv.org/abs/1003.3418

F. Fritz, B. Huppert, and W. Willems, Stochastische Matrizen, 1979.
DOI : 10.1007/978-3-642-67131-9

T. Hansen, Worst-case Analysis of Strategy Iteration and the Simplex Method, 2012.

T. Hansen and U. Zwick, Lower Bounds for Howard's Algorithm for Finding Minimum Mean-Cost Cycles, pp.415-426, 2010.
DOI : 10.1007/978-3-642-17517-6_37

T. Hansen, P. Miltersen, and U. Zwick, Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, Journal of the ACM, vol.60, issue.1, pp.1-16, 2013.
DOI : 10.1145/2432622.2432623

R. Hollanders, J. Delvenne, and R. Jungers, The complexity of Policy Iteration is exponential for discounted Markov Decision Processes, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012.
DOI : 10.1109/CDC.2012.6426485

R. Hollanders, B. Gerencsér, J. Delvenne, and R. Jungers, Improved bound on the worst case complexity of Policy Iteration, Operations Research Letters, vol.44, issue.2, 2016.
DOI : 10.1016/j.orl.2016.01.010

Y. Mansour and S. Singh, On the complexity of policy iteration, UAI, pp.401-408, 1999.

M. Melekopoglou and A. Condon, On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes, ORSA Journal on Computing, vol.6, issue.2, pp.188-192, 1994.
DOI : 10.1287/ijoc.6.2.188

I. Post and Y. Ye, The simplex method is strongly polynomial for deterministic Markov decision processes, 24th ACM-SIAM Symposium on Discrete Algorithms, 2013.

M. Puterman, Markov Decision Processes, 1994.
DOI : 10.1002/9780470316887

N. Schmitz, How good is Howard's policy improvement algorithm?, Zeitschrift für Operations Research, vol.29, issue.7, pp.315-316, 1985.

D. Stroock, An introduction to Markov processes, 2005.
DOI : 10.1007/978-3-642-40523-5

Y. Ye, The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, Mathematics of Operations Research, vol.36, issue.4, pp.593-603, 2011.
DOI : 10.1287/moor.1110.0516