A. Banerjee and A. Tsiatis, Adaptive two-stage designs in phase II clinical trials, Statistics in Medicine, vol.25, issue.19, pp.3382-3395, 2006.
DOI : 10.1002/sim.2501

T. Başar and P. Bernhard, H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach, 1995.
DOI : 10.1007/978-0-8176-4757-5

A. Bemporad and M. Morari, Robust model predictive control: A survey, Robustness in Identification and Control, pp.207-226, 1999.
DOI : 10.1007/BFb0109870

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, 1996.

J. Birge and F. Louveaux, Introduction to Stochastic Programming, 1997.
DOI : 10.1007/978-1-4614-0237-4

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857

L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
DOI : 10.1201/9781439821091

URL : http://orbi.ulg.ac.be/jspui/handle/2268/27963

E. Camacho and C. Bordons, Model Predictive Control, 2004.

A. Conn, N. Gould, and P. Toint, Trust-Region Methods, Society for Industrial and Applied Mathematics, vol.1, 2000.
DOI : 10.1137/1.9780898719857

K. Darby-Dowman, S. Barker, E. Audsley, and D. Parsons, A two-stage stochastic programming with recourse model for determining robust planting plans in horticulture, Journal of the Operational Research Society, vol.51, pp.83-89, 2000.

B. Defourny, D. Ernst, and L. Wehenkel, Risk-aware decision making and dynamic programming, Selected for oral presentation at the NIPS-08 Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008.

E. Delage and S. Mannor, Percentile Optimization for Markov Decision Processes with Parameter Uncertainty, Operations Research, vol.58, issue.1, pp.203-213, 2010.
DOI : 10.1287/opre.1080.0685

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

D. Ernst, M. Glavic, F. Capitanescu, and L. Wehenkel, Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol.39, issue.2, pp.517-529, 2009.
DOI : 10.1109/TSMCB.2008.2007630

R. Fonteneau, Contributions to Batch Mode Reinforcement Learning, PhD thesis, University of Liège, 2011.

R. Fonteneau, D. Ernst, B. Boigelot, and Q. Louveaux, Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes, arXiv preprint arXiv:1202.5298, 2012.

R. Fonteneau, S. Murphy, L. Wehenkel, and D. Ernst, Inferring bounds on the performance of a control policy from a sample of trajectories, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
DOI : 10.1109/ADPRL.2009.4927534

R. Fonteneau, S. Murphy, L. Wehenkel, and D. Ernst, A cautious approach to generalization in reinforcement learning, Proceedings of the Second International Conference on Agents and Artificial Intelligence, 2010.

R. Fonteneau, S. A. Murphy, L. Wehenkel, and D. Ernst, Computing bounds for kernel-based policy evaluation in reinforcement learning, 2010.

R. Fonteneau, S. A. Murphy, L. Wehenkel, and D. Ernst, Towards Min Max Generalization in Reinforcement Learning, Agents and Artificial Intelligence: International Conference, Revised Selected Papers, Communications in Computer and Information Science (CCIS), pp.61-77, 2010.

K. Frauendorfer, Stochastic Two-stage Programming, 1992.
DOI : 10.1007/978-3-642-95696-6

L. Hansen and T. Sargent, Robust control and model uncertainty, American Economic Review, vol.91, issue.2, pp.60-66, 2001.

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms: Fundamentals, 1996.
DOI : 10.1007/978-3-662-02796-7

J. Ingersoll, Theory of Financial Decision Making, 1987.

S. Koenig, Minimax real-time heuristic search, Artificial Intelligence, vol.129, issue.1-2, pp.165-197, 2001.
DOI : 10.1016/S0004-3702(01)00103-5


M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994), 1994.
DOI : 10.1016/B978-1-55860-335-6.50027-1

M. L. Littman, A tutorial on partially observable Markov decision processes, Journal of Mathematical Psychology, vol.53, issue.3, pp.119-125, 2009.
DOI : 10.1016/j.jmp.2009.01.005

Y. Lokhnygina and A. Tsiatis, Optimal two-stage group-sequential designs, Journal of Statistical Planning and Inference, vol.138, issue.2, pp.489-499, 2008.
DOI : 10.1016/j.jspi.2007.06.011

J. Lunceford, M. Davidian, and A. Tsiatis, Estimation of Survival Distributions of Treatment Policies in Two-Stage Randomization Designs in Clinical Trials, Biometrics, vol.58, issue.1, pp.48-57, 2002.
DOI : 10.1111/j.0006-341X.2002.00048.x

S. Mannor, D. Simester, P. Sun, and J. Tsitsiklis, Bias and variance in value function estimation, Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.
DOI : 10.1145/1015330.1015402

S. Murphy, Optimal dynamic treatment regimes, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.65, issue.2, pp.331-366, 2003.
DOI : 10.1111/1467-9868.00389

S. Murphy, An experimental design for the development of adaptive treatment strategies, Statistics in Medicine, vol.24, issue.10, pp.1455-1481, 2005.
DOI : 10.1002/sim.2022

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277

URL : https://hal.archives-ouvertes.fr/hal-00976649

D. Ormoneit and S. Sen, Kernel-based reinforcement learning, Machine Learning, vol.49, pp.161-178, 2002.

C. Paduraru, D. Precup, and J. Pineau, A Framework for Computing Bounds for the Return of a Policy, Ninth European Workshop on Reinforcement Learning (EWRL9), 2011.
DOI : 10.1007/978-3-642-29946-9_21

M. Qian and S. Murphy, Performance Guarantees for Individualized Treatment Rules, Technical Report 498, 2009.

M. Riedmiller, Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, Proceedings of the Sixteenth European Conference on Machine Learning, pp.317-328, 2005.
DOI : 10.1007/11564096_32

M. Rovatsou and M. Lagoudakis, Minimax Search and Reinforcement Learning for Adversarial Tetris, Proceedings of the 6th Hellenic Conference on Artificial Intelligence (SETN'10), 2010.
DOI : 10.1007/978-3-642-12842-4_53

P. Scokaert and D. Mayne, Min-max feedback model predictive control for constrained linear systems, IEEE Transactions on Automatic Control, vol.43, issue.8, pp.1136-1142, 1998.
DOI : 10.1109/9.704989

A. Shapiro, A dynamic programming approach to adjustable robust optimization, Operations Research Letters, vol.39, issue.2, pp.83-87, 2011.
DOI : 10.1016/j.orl.2011.01.001

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.414.2516

A. Shapiro, Minimax and risk averse multistage stochastic programming, European Journal of Operational Research, vol.219, issue.3, 2011.
DOI : 10.1016/j.ejor.2011.11.005

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.416.3788

J. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optimization Methods and Software, vol.11-12, pp.625-653, 1999.
DOI : 10.1080/10556789908805766

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.6954

A. Wahed and A. Tsiatis, Optimal Estimator for the Survival Distribution and Related Quantities for Treatment Policies in Two-Stage Randomization Designs in Clinical Trials, Biometrics, vol.60, issue.1, pp.124-133, 2004.