J. Audibert, R. Munos, and C. Szepesvári, Variance estimates and exploration function in multi-armed bandit, 2007.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI: 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, and J. Shawe-Taylor, Exploration versus exploitation challenge, 2nd PASCAL Challenges Workshop, 2006.

J. C. Gittins, Multi-armed Bandit Allocation Indices, Wiley-Interscience Series in Systems and Optimization, 1989.
DOI: 10.1002/9780470980033

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI: 10.1016/0196-8858(85)90002-8

T. L. Lai and S. Yakowitz, Machine learning and nonparametric bandit theory, IEEE Transactions on Automatic Control, vol.40, pp.1199-1209, 1995.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI: 10.1090/S0002-9904-1952-09620-8

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.
DOI: 10.1093/biomet/25.3-4.285