Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00881207
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
Nearly strongly polynomial algorithms for transient Markov decision problems. Unpublished Manuscript, 2014. ,
Strong polynomiality of policy iterations for average-cost MDPs modeling replacement and maintenance problems, Operations Research Letters, vol.41, issue.3, pp.249-251, 2013. ,
DOI : 10.1016/j.orl.2013.02.002
The value iteration algorithm is not strongly polynomial for discounted dynamic programming, Operations Research Letters, vol.42, issue.2, pp.130-131, 2014. ,
DOI : 10.1016/j.orl.2013.12.011
Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, Journal of the ACM, vol.60, issue.1, 2013. ,
DOI : 10.1145/2432622.2432623
Dynamic Programming and Markov Processes, 1960. ,
Finite State and Action MDPS, Handbook of Markov Decision Processes, pp.21-87, 2002. ,
DOI : 10.1007/978-1-4615-0805-2_2
A bound for the number of different basic solutions generated by the simplex method, Mathematical Programming, vol.137, issue.1-2, pp.579-586, 2013. ,
DOI : 10.1007/s10107-011-0482-y
The simplex method is strongly polynomial for deterministic Markov decision processes, 2014. ,
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994. ,
DOI : 10.1002/9780470316887
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Management Science, vol.24, issue.11, pp.1127-1137, 1978. ,
DOI : 10.1287/mnsc.24.11.1127
Improved and Generalized Upper Bounds on the Complexity of Policy Iteration, Advances in Neural Information Processing Systems 26, pp.386-394, 2013. ,
DOI : 10.1287/moor.2015.0753
URL : https://hal.archives-ouvertes.fr/hal-00829532
Least-squares policy iteration: Bias-variance trade-off in control problems, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.1071-1078, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00520841
Solving H-horizon, stationary Markov decision problems in time proportional to log(H), Operations Research Letters, vol.9, issue.5, pp.287-297, 1990. ,
DOI : 10.1016/0167-6377(90)90022-W
The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, Mathematics of Operations Research, vol.36, issue.4, pp.593-603, 2011. ,
DOI : 10.1287/moor.1110.0516