Adaptive two-stage designs in phase II clinical trials, Statistics in Medicine, vol.5, issue.19, pp.3382-3395, 2006. ,
DOI : 10.1002/sim.2501
H ? -optimal control and related minimax design problems : a dynamic game approach, 1995. ,
DOI : 10.1007/978-0-8176-4757-5
Robust model predictive control: A survey, Robustness in Identification and Control, pp.207-226, 1999. ,
DOI : 10.1007/BFb0109870
Neuro-Dynamic Programming, 1996. ,
DOI : 10.1007/0-306-48332-7_333
Introduction to Stochastic Programming, 1997. ,
DOI : 10.1007/978-1-4614-0237-4
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996. ,
DOI : 10.1007/978-0-585-33656-5_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857
Reinforcement Learning and Dynamic Programming using Function Approximators, 2010. ,
DOI : 10.1201/9781439821091
URL : http://orbi.ulg.ac.be/jspui/handle/2268/27963
Model Predictive Control, 2004. ,
DOI : 10.1002/oca.2167
URL : https://hal.archives-ouvertes.fr/hal-00683813
Trust-region Methods, Society for Industrial Mathematics, vol.1, 2000. ,
DOI : 10.1137/1.9780898719857
A two-stage stochastic programming with recourse model for determining robust planting plans in horticulture, Journal of the Operational Research Society, pp.83-89, 2000. ,
Risk-aware decision making and dynamic programming, Selected for oral presentation at the NIPS-08 Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008. ,
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty, Operations Research, vol.58, issue.1, pp.203-213, 2010. ,
DOI : 10.1287/opre.1080.0685
Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005. ,
Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol.39, issue.2, pp.517-529, 2009. ,
DOI : 10.1109/TSMCB.2008.2007630
Contributions to Batch Mode Reinforcement Learning, 2011. ,
Min max generalization for deterministic batch mode reinforcement learning : relaxation schemes. Arxiv preprint arXiv, pp.1202-5298, 2012. ,
Inferring bounds on the performance of a control policy from a sample of trajectories, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
DOI : 10.1109/ADPRL.2009.4927534
A cautious approach to generalization in reinforcement learning, Proceedings of the Second International Conference on Agents and Artificial Intelligence, 2010. ,
Computing bounds for kernel-based policy evaluation in reinforcement learning, 2010. ,
Towards Min Max Generalization in Reinforcement Learning, Agents and Artificial Intelligence : International Conference Revised Selected Papers. Series : Communications in Computer and Information Science (CCIS), pp.61-77, 2010. ,
DOI : 10.1109/TIT.1967.1054010
Stochastic Two-stage Programming, 1992. ,
DOI : 10.1007/978-3-642-95696-6
Robust control and model uncertainty, American Economic Review, pp.60-66, 2001. ,
Convex Analysis and Minimization Algorithms : Fundamentals, 1996. ,
DOI : 10.1007/978-3-662-02796-7
Theory of Financial Decision Making, 1987. ,
Minimax real-time heuristic search, Artificial Intelligence, vol.129, issue.1-2, pp.165-197, 2001. ,
DOI : 10.1016/S0004-3702(01)00103-5
URL : http://doi.org/10.1016/s0004-3702(01)00103-5
Least-squares policy iteration, Jounal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Markov games as a framework for multi-agent reinforcement learning, Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994), 1994. ,
DOI : 10.1016/B978-1-55860-335-6.50027-1
A tutorial on partially observable Markov decision processes, Journal of Mathematical Psychology, vol.53, issue.3, pp.119-125, 2009. ,
DOI : 10.1016/j.jmp.2009.01.005
Optimal two-stage group-sequential designs, Journal of Statistical Planning and Inference, vol.138, issue.2, pp.489-499, 2008. ,
DOI : 10.1016/j.jspi.2007.06.011
Estimation of Survival Distributions of Treatment Policies in Two-Stage Randomization Designs in Clinical Trials, Biometrics, vol.84, issue.1, pp.48-57, 2002. ,
DOI : 10.1111/j.0006-341X.2002.00048.x
Bias and variance in value function estimation, Twenty-first international conference on Machine learning , ICML '04, 2004. ,
DOI : 10.1145/1015330.1015402
Optimal dynamic treatment regimes, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.34, issue.2, pp.331-366, 2003. ,
DOI : 10.1016/0270-0255(86)90088-6
An experimental design for the development of adaptive treatment strategies, Statistics in Medicine, vol.26, issue.10, pp.1455-1481, 2005. ,
DOI : 10.1002/sim.2022
Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009. ,
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649
Kernel-based reinforcement learning, Machine Learning, pp.161-178, 2002. ,
A Framework for Computing Bounds for the Return of a Policy, Ninth European Workshop on Reinforcement Learning (EWRL9), 2011. ,
DOI : 10.1007/978-3-642-29946-9_21
Performance Guarantees for Individualized Treatment Rules. Rapport interne 498, 2009. ,
Neural Fitted Q Iteration ??? First Experiences with a Data Efficient Neural Reinforcement Learning Method, Proceedings of the Sixteenth European Conference on Machine Learning, pp.317-328, 2005. ,
DOI : 10.1007/11564096_32
Minimax Search and Reinforcement Learning for Adversarial Tetris, Proceedings of the 6th Hellenic Conference on Artificial Intelligence (SETN'10), 2010. ,
DOI : 10.1007/978-3-642-12842-4_53
Min-max feedback model predictive control for constrained linear systems, IEEE Transactions on Automatic Control, vol.43, issue.8, pp.43-1136, 1998. ,
DOI : 10.1109/9.704989
A dynamic programming approach to adjustable robust optimization, Operations Research Letters, vol.39, issue.2, pp.83-87, 2011. ,
DOI : 10.1016/j.orl.2011.01.001
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.414.2516
Minimax and risk averse multistage stochastic programming, European Journal of Operational Research, vol.219, issue.3, 2011. ,
DOI : 10.1016/j.ejor.2011.11.005
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.416.3788
Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization methods and software, pp.625-653, 1999. ,
DOI : 10.1080/10556789908805766
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.6954
Optimal Estimator for the Survival Distribution and Related Quantities for Treatment Policies in Two-Stage Randomization Designs in Clinical Trials, Biometrics, vol.62, issue.1, pp.124-133, 2004. ,
DOI : 10.1056/NEJM199506223322503