P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science, pp. 322-331, 1995.
DOI : 10.1109/SFCS.1995.492488

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol. 32, no. 1, 2002.
DOI : 10.1137/S0097539701398375

J. Poland, Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments, Theoretical Computer Science, vol. 397, no. 1-3, pp. 77-93, 2008.
DOI : 10.1016/j.tcs.2008.02.024

V. Dani, T. Hayes, and S. Kakade, The price of bandit information for online optimization, Advances in Neural Information Processing Systems, pp. 345-352, 2008.

J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: An efficient algorithm for bandit linear optimization, Conference on Learning Theory (COLT), pp. 263-274, 2008.

N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, Conference on Computational Learning Theory (COLT), 2009.
DOI : 10.1016/j.jcss.2012.01.001

P. Auer, Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol. 3, pp. 397-422, 2002.

V. Dani, T. P. Hayes, and S. M. Kakade, Stochastic linear optimization under bandit feedback, Conference on Learning Theory (COLT), 2008.

M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, International Conference on Machine Learning (ICML), pp. 928-936, 2003.

E. Hazan, A. Agarwal, and S. Kale, Logarithmic regret algorithms for online convex optimization, Conference on Learning Theory (COLT), pp. 499-513, 2006.

P. Bartlett, E. Hazan, and A. Rakhlin, Adaptive online gradient descent, 2007.

S. Shalev-Shwartz, Online Learning, 2007.

J. D. Abernethy, P. Bartlett, A. Rakhlin, and A. Tewari, Optimal strategies and minimax lower bounds for online convex games, 2008.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandit problems in metric spaces, Proceedings of the 40th ACM Symposium on Theory of Computing, pp. 681-690, 2008.

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems (NIPS), 2008.

N. Littlestone and M. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol. 108, no. 2, pp. 212-261, 1994.
DOI : 10.1006/inco.1994.1009

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire et al., How to use expert advice, Journal of the ACM, vol. 44, no. 3, pp. 427-485, 1997.
DOI : 10.1145/258128.258179

P. Auer, N. Cesa-Bianchi, and C. Gentile, Adaptive and Self-Confident On-Line Learning Algorithms, Journal of Computer and System Sciences, vol. 64, no. 1, 2000.
DOI : 10.1006/jcss.2001.1795

G. Stoltz, Incomplete information and internal regret in prediction of individual sequences, PhD thesis, 2005.
URL : https://hal.archives-ouvertes.fr/tel-00009759

W. Gilks, S. Richardson, and D. Spiegelhalter, Markov Chain Monte Carlo in Practice, 1996.

C. Andrieu, N. de Freitas, A. Doucet, and M. Jordan, An introduction to MCMC for machine learning, Machine Learning, vol. 50, no. 1-2, pp. 5-43, 2003.
DOI : 10.1023/A:1020281327116

D. A. Levin, Y. Peres, and E. L. Wilmer, Markov Chains and Mixing Times, 2008.
DOI : 10.1090/mbk/058

R. Douc, A. Guillin, J. Marin, and C. Robert, Minimum variance importance sampling via population Monte Carlo, ESAIM: Probability and Statistics, vol. 11, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00070316

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5