D. Bernstein, S. Zilberstein, R. Washington, and J. Bresina, Planetary rover control as a Markov decision process, AAAI Spring Symposium: Game Theoretic and Decision Theoretic Agents, 2001.

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for reinforcement learning, ICML, pp.162-169, 2003.

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1080/01621459.1963.10500830

R. A. Howard, Dynamic programming and Markov processes, 1960.

M. Kearns and D. Koller, Efficient reinforcement learning in factored MDPs, Sixteenth International Joint Conference on Artificial Intelligence, 1999.

M. Kearns and S. Singh, Near-optimal reinforcement learning in polynomial time, Proc. 15th International Conf. on Machine Learning, pp.260-268, 1998.

P. R. Kumar, A Survey of Some Results in Stochastic Adaptive Control, SIAM Journal on Control and Optimization, vol.23, issue.3, pp.329-367, 1985.
DOI : 10.1137/0323023

P. R. Kumar and P. P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
DOI : 10.1137/1.9781611974263

R. Munos, Efficient resources allocation for Markov decision processes, NIPS, pp.1571-1578, 2001.

R. Munos and A. Moore, Rates of convergence for variable resolution schemes in optimal control, International Conference on Machine Learning, 2000.

M. Puterman, Markov Decision Processes, 1994.
DOI : 10.1002/9780470316887

R. S. Sutton, Dyna, an integrated architecture for learning, planning, and reacting, Working Notes of the AAAI Spring Symposium on Integrated Intelligent Architectures, 1991.
DOI : 10.1145/122344.122377

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

G. Tesauro, Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, pp.58-68, 1995.
DOI : 10.1145/203330.203343