Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996.
Neuro-Dynamic Programming, Athena Scientific, 1996.
How to Lose at Tetris, The Mathematical Gazette, vol.81, issue.491, pp.194-200, 1997.
DOI : 10.2307/3619195
Tetris is Hard, Even to Approximate, Proceedings of the Ninth International Computing and Combinatorics Conference, pp.351-363, 2003.
DOI : 10.1007/3-540-45071-8_36
Tetris: A Study of Randomized Constraint Sampling, 2006.
DOI : 10.1007/1-84628-095-8_6
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research, vol.25, pp.75-118, 2006.
A unifying perspective of parametric policy search methods for Markov decision processes, Proceedings of the Advances in Neural Information Processing Systems, pp.2726-2734, 2012.
Classification-based policy iteration with a critic, Proceedings of ICML, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972
Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.
DOI : 10.1162/106365601750190398
A natural policy gradient, Proceedings of the Advances in Neural Information Processing Systems, pp.1531-1538, 2001.
Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proceedings of ICML, pp.424-431, 2003.
Analysis of a Classification-based Policy Iteration Algorithm, Proceedings of ICML, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Management Science, vol.24, issue.11, 1978.
DOI : 10.1287/mnsc.24.11.1127
The cross-entropy method: A unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning, 2004.
Performance Bounds for λ-Policy Iteration and Application to the Game of Tetris, Journal of Machine Learning Research, vol.14, pp.1175-1221, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00185271
Approximate modified policy iteration, Proceedings of ICML, pp.1207-1214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882
Learning Tetris Using the Noisy Cross-Entropy Method, Neural Computation, vol.18, issue.12, pp.2936-2941, 2006.
DOI : 10.1162/neco.2006.18.12.2936
Building Controllers for Tetris, ICGA Journal, vol.32, issue.1, pp.3-11, 2009.
DOI : 10.3233/ICG-2009-32102
URL : https://hal.archives-ouvertes.fr/inria-00418954
Improvements on Learning Tetris with Cross Entropy, ICGA Journal, vol.32, issue.1, 2009.
DOI : 10.3233/ICG-2009-32104
URL : https://hal.archives-ouvertes.fr/inria-00418930
MDPTetris features documentation, 2010.
Feature-based methods for large scale dynamic programming, Machine Learning, pp.59-94, 1996.