N. Alon, N. Cesa-Bianchi, C. Gentile, and Y. Mansour, From Bandits to Experts: A Tale of Domination and Independence, Advances in Neural Information Processing Systems 25 (NIPS), pp. 1610–1618, 2012.

N. Alon, N. Cesa-Bianchi, C. Gentile, S. Mannor, Y. Mansour, et al., Nonstochastic multi-armed bandits with graph-structured feedback, arXiv preprint, 2014.

J.-Y. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
URL: https://hal.archives-ouvertes.fr/hal-00834882

J.-Y. Audibert and S. Bubeck, Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, vol. 11, pp. 2785–2836, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00654356

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol. 32, no. 1, pp. 48–77, 2002.
DOI: 10.1137/S0097539701398375

P. L. Bartlett, V. Dani, T. P. Hayes, S. Kakade, A. Rakhlin, et al., High-probability regret bounds for bandit online linear optimization, COLT, pp. 335–342, 2008.

A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire, Contextual bandit algorithms with supervised learning guarantees, AISTATS, pp. 19–26, 2011.

S. Bubeck, N. Cesa-Bianchi, and S. M. Kakade, Towards minimax policies for online linear optimization with bandit feedback, COLT, 2012.

S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, vol. 5, no. 1, 2012.
DOI: 10.1561/2200000024

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.
DOI: 10.1017/CBO9780511546921

N. Cesa-Bianchi, P. Gaillard, G. Lugosi, and G. Stoltz, Mirror descent meets fixed share (and feels no regret), Advances in Neural Information Processing Systems 25 (NIPS), pp. 989–997, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00670514

D. A. Freedman, On tail probabilities for martingales, The Annals of Probability, vol. 3, no. 1, pp. 100–118, 1975.

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
DOI: 10.1006/jcss.1997.1504

J. Hannan, Approximation to Bayes risk in repeated play, in Contributions to the Theory of Games, vol. 3, pp. 97–139, 1957.

E. Hazan and S. Kale, Better Algorithms for Benign Bandits, Journal of Machine Learning Research, vol. 12, pp. 1287–1311, 2011.

E. Hazan, Z. Karnin, and R. Meka, Volumetric spanners: an efficient exploration basis for learning, COLT, pp. 408–422, 2014.

M. Herbster and M. Warmuth, Tracking the best expert, Machine Learning, vol. 32, no. 2, pp. 151–178, 1998.

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol. 71, no. 3, pp. 291–307, 2005.
DOI: 10.1016/j.jcss.2004.10.016

T. Kocák, G. Neu, M. Valko, and R. Munos, Efficient learning by implicit exploration in bandit problems with side observations, Advances in Neural Information Processing Systems 27 (NIPS), pp. 613–621, 2014.

N. Littlestone and M. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol. 108, no. 2, pp. 212–261, 1994.
DOI: 10.1006/inco.1994.1009

S. Mannor and O. Shamir, From Bandits to Experts: On the Value of Side-Observations, Advances in Neural Information Processing Systems 24 (NIPS), 2011.

H. B. McMahan and M. Streeter, Tighter bounds for multi-armed bandits with expert advice, COLT, 2009.

G. Neu, First-order regret bounds for combinatorial semi-bandits, COLT, pp. 1360–1375, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01215001

A. Rakhlin and K. Sridharan, Online learning with predictable sequences, COLT, pp. 993–1019, 2013.

Y. Seldin, N. Cesa-Bianchi, P. Auer, F. Laviolette, and J. Shawe-Taylor, PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits, Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, 2012.

V. Vovk, Aggregating strategies, Proceedings of the Third Annual Workshop on Computational Learning Theory (COLT), pp. 371–386, 1990.
DOI: 10.1016/B978-1-55860-146-8.50032-1