A. Antos, V. Grover, and C. Szepesvári, Active learning in heteroscedastic noise, Theoretical Computer Science, vol.411, issue.29-30, pp.2712-2728, 2010.
DOI : 10.1016/j.tcs.2010.04.007

P. Artzner, F. Delbaen, J. Eber, and D. Heath, Coherent measures of risk, Mathematical Finance, pp.1-24, 1996.

J. Audibert and S. Bubeck, Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, vol.11, pp.2785-2836, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654356

J. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, Proceedings of the Twenty-third Conference on Learning Theory (COLT'10), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

J. Audibert, R. Munos, and C. Szepesvári, Exploration–exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009.
DOI : 10.1016/j.tcs.2009.01.016
URL : https://hal.archives-ouvertes.fr/hal-00711069

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

D. B. Brown, Large deviations bounds for estimating conditional value-at-risk, Operations Research Letters, vol.35, issue.6, pp.722-730, 2007.
DOI : 10.1016/j.orl.2007.01.001
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

E. Even-Dar, M. Kearns, and J. Wortman, Risk-sensitive online learning, Proceedings of the 17th International Conference on Algorithmic Learning Theory (ALT'06), pp.199-213, 2006.
DOI : 10.1007/11894841_18
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

C. Gollier, The Economics of Risk and Time, 2001.

H. Markowitz, Portfolio selection, The Journal of Finance, vol.7, issue.1, pp.77-91, 1952.
DOI : 10.1111/j.1540-6261.1952.tb01525.x

P. Massart, The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality, The Annals of Probability, pp.1269-1283, 1990.

J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 1947.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

A. Salomon and J. Audibert, Deviations of stochastic bandit regret, Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT'11), pp.159-173, 2011.
DOI : 10.1007/978-3-642-24412-4_15
URL : https://hal.archives-ouvertes.fr/hal-00624461

A. Sani, A. Lazaric, and R. Munos, Risk-aversion in multi-arm bandit

M. K. Warmuth and D. Kuzmin, Online variance minimization, Proceedings of the 19th Annual Conference on Learning Theory (COLT'06), pp.514-528, 2006.