Minimax policies for adversarial and stochastic bandits, 22nd Annual Conference on Learning Theory, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882
Exploration-exploitation trade-off using variance estimates in multi-armed bandits, Theoretical Computer Science, 2008.
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352
The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, 2002.
DOI : 10.1137/S0097539701398375
From External to Internal Regret, COLT, pp.621-636, 2005.
DOI : 10.1007/11503415_42
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.5182
Bandits Games and Clustering Foundations, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00845565
Potential-based algorithms in on-line prediction and game theory, Machine Learning, vol.51, issue.3, pp.239-261, 2003.
DOI : 10.1023/A:1022901500417
Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, Proceedings of the 45th IEEE Conference on Decision and Control, pp.65-72, 2006.
DOI : 10.1109/CDC.2006.377527
URL : https://hal.archives-ouvertes.fr/hal-00121732
Asymptotic calibration, Biometrika, vol.85, issue.2, pp.379-390, 1998.
DOI : 10.1093/biomet/85.2.379
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.8037
Regret in the on-line decision problem, Games and Economic Behavior, vol.29, issue.1-2, pp.7-35, 1999.
A decision-theoretic generalization of on-line learning and an application to boosting, EuroCOLT '95: Proceedings of the Second European Conference on Computational Learning Theory, pp.23-37, 1995.
A Simple Adaptive Procedure Leading to Correlated Equilibrium, Econometrica, vol.68, issue.5, pp.1127-1150, 2000.
DOI : 10.1111/1468-0262.00153
Feature reinforcement learning: Part I: Unstructured MDPs, pp.3-24, 2009.
Sleeping experts and bandits with stochastic action availability and adversarial rewards, AISTATS, 2009.
Regret bounds for sleeping experts and bandits, Conference on Learning Theory, 2008.
DOI : 10.1007/s10994-010-5178-7
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.6257
Online regret bounds for Markov decision processes with deterministic transitions, ALT '08: Proceedings of the 19th International Conference on Algorithmic Learning Theory, pp.123-137, 2008.
DOI : 10.1016/j.tcs.2010.04.005
URL : http://doi.org/10.1016/j.tcs.2010.04.005
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8
On the possibility of learning in reactive environments with arbitrary dependence, Theoretical Computer Science, vol.405, issue.3, pp.274-284, 2008.
DOI : 10.1016/j.tcs.2008.06.039
URL : https://hal.archives-ouvertes.fr/hal-00639569
Incomplete information and internal regret in prediction of individual sequences, 2005.
URL : https://hal.archives-ouvertes.fr/tel-00009759