D. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Technical Report LIDS-P-2349, MIT Laboratory for Information and Decision Systems, 1996.

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

H. Burgiel, How to Lose at Tetris, The Mathematical Gazette, vol.81, issue.491, pp.194-200, 1997.
DOI : 10.2307/3619195

E. Demaine, S. Hohenberger, and D. Liben-Nowell, Tetris is Hard, Even to Approximate, Proceedings of the Ninth International Computing and Combinatorics Conference, pp.351-363, 2003.
DOI : 10.1007/3-540-45071-8_36

V. Farias and B. Van Roy, Tetris: A Study of Randomized Constraint Sampling, Probabilistic and Randomized Methods for Design Under Uncertainty, Springer, 2006.
DOI : 10.1007/1-84628-095-8_6

A. Fern, S. Yoon, and R. Givan, Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research, vol.25, pp.75-118, 2006.

T. Furmston and D. Barber, A unifying perspective of parametric policy search methods for Markov decision processes, Advances in Neural Information Processing Systems, pp.2726-2734, 2012.

V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer, Classification-based policy iteration with a critic, Proceedings of ICML, pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972

N. Hansen and A. Ostermeier, Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.
DOI : 10.1162/106365601750190398

S. Kakade, A natural policy gradient, Advances in Neural Information Processing Systems, pp.1531-1538, 2001.

M. Lagoudakis and R. Parr, Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proceedings of ICML, pp.424-431, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a Classification-based Policy Iteration Algorithm, Proceedings of ICML, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

M. Puterman and M. Shin, Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Management Science, vol.24, issue.11, pp.1127-1137, 1978.
DOI : 10.1287/mnsc.24.11.1127

R. Rubinstein and D. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, Springer, 2004.

B. Scherrer, Performance Bounds for λ-Policy Iteration and Application to the Game of Tetris, Journal of Machine Learning Research, vol.14, pp.1175-1221, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00185271

B. Scherrer, M. Ghavamzadeh, V. Gabillon, and M. Geist, Approximate modified policy iteration, Proceedings of ICML, pp.1207-1214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

I. Szita and A. Lőrincz, Learning Tetris Using the Noisy Cross-Entropy Method, Neural Computation, vol.18, issue.12, pp.2936-2941, 2006.
DOI : 10.1162/neco.2006.18.12.2936

C. Thiery and B. Scherrer, Building Controllers for Tetris, ICGA Journal, vol.32, issue.1, pp.3-11, 2009.
DOI : 10.3233/ICG-2009-32102
URL : https://hal.archives-ouvertes.fr/inria-00418954

C. Thiery and B. Scherrer, Improvements on Learning Tetris with Cross Entropy, ICGA Journal, vol.32, issue.1, 2009.
DOI : 10.3233/ICG-2009-32104
URL : https://hal.archives-ouvertes.fr/inria-00418930

C. Thiery and B. Scherrer, MDPTetris features documentation, 2010.

J. Tsitsiklis and B. Van Roy, Feature-based methods for large scale dynamic programming, Machine Learning, vol.22, pp.59-94, 1996.