J. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, Proceedings of the Twenty-Third Annual Conference on Learning Theory, pp.41-53, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer, N. Cesa-bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, pp.235-256, 2002.

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandit problems, Proceedings of the Twentieth International Conference on Algorithmic Learning Theory, pp.23-37, 2009.

S. Bubeck, T. Wang, and N. Viswanathan, Multiple identifications in multi-armed bandits. CoRR, abs/1205, p.3181, 2012.

K. Deng, J. Pineau, and S. Murphy, Active learning for developing personalized treatment, Proceedings of the Twenty-Seventh International Conference on Uncertainty in Artificial Intelligence, pp.161-168, 2011.

E. Even-dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747005

V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck, Multi-bandit best arm identification, Proceedings of Advances in Neural Information Processing Systems 25, pp.2222-2230, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00632523

S. Kalyanakrishnan, Learning Methods for Sequential Decision Making with Imperfect Representations, 2011.

S. Kalyanakrishnan and P. Stone, Efficient selection of multiple bandit arms: Theory and practice, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.511-518, 2010.

S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, Pac subset selection in stochastic multiarmed bandits, Proceedings of the Twentieth International Conference on Machine Learning, 2012.

O. Maron and A. Moore, Hoeffding races: Accelerating model selection search for classification and function approximation, Proceedings of Advances in Neural Information Processing Systems 6, pp.59-66, 1993.

A. Maurer and M. Pontil, Empirical bernstein bounds and sample-variance penalization, 22th annual conference on learning theory, 2009.

V. Mnih, C. Szepesvári, and J. Audibert, Empirical Bernstein stopping, Proceedings of the Twenty-Fifth International Conference on Machine Learning, pp.672-679, 2008.
DOI : 10.1145/1390156.1390241

URL : https://hal.archives-ouvertes.fr/hal-00834983

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematics Society, vol.58, pp.527-535, 1952.