Planetary rover control as a Markov decision process, AAAI Spring Symposium: Game Theoretic and Decision Theoretic Agents, 2001.
Action elimination and stopping conditions for reinforcement learning, ICML, pp.162-169, 2003.
Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963. DOI: 10.1080/01621459.1963.10500830
Dynamic programming and Markov processes, 1960.
Efficient reinforcement learning in factored MDPs, Sixteenth International Joint Conference on Artificial Intelligence, 1999.
Near-optimal reinforcement learning in polynomial time, Proc. 15th International Conf. on Machine Learning, pp.260-268, 1998.
A Survey of Some Results in Stochastic Adaptive Control, SIAM Journal on Control and Optimization, vol.23, issue.3, pp.329-367, 1985. DOI: 10.1137/0323023
Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986. DOI: 10.1137/1.9781611974263
Efficient resources allocation for Markov decision processes, NIPS, pp.1571-1578, 2001.
Rates of convergence for variable resolution schemes in optimal control, International Conference on Machine Learning, 2000.
Markov Decision Processes, 1994. DOI: 10.1002/9780470316887
Dyna, an integrated architecture for learning, planning, and reacting, Working Notes of the AAAI Spring Symposium on Integrated Intelligent Architectures, 1991. DOI: 10.1145/122344.122377
Reinforcement Learning: An Introduction, MIT Press, 1998.
Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, pp.58-68, 1995. DOI: 10.1145/203330.203343