L. Devroye, G. Lugosi, and G. Neu, Prediction by random-walk perturbation, Proceedings of the 26th Annual Conference on Learning Theory, pp.460-473, 2013.

C. Gentile and M. Warmuth, Linear hinge loss and average margin, Advances in Neural Information Processing Systems (NIPS), pp.225-231, 1998.

J. Kivinen and M. Warmuth, Relative loss bounds for multidimensional regression problems, Machine Learning, pp.301-329, 2001.

A. Grove, N. Littlestone, and D. Schuurmans, General convergence results for linear discriminant updates, Proceedings of the tenth annual conference on Computational learning theory, COLT '97, pp.173-210, 2001.
DOI : 10.1145/267460.267493

E. Takimoto and M. Warmuth, Path Kernels and Multiplicative Updates, Journal of Machine Learning Research, vol.4, pp.773-818, 2003.
DOI : 10.1007/3-540-45435-7_6

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, issue.3, pp.291-307, 2005.
DOI : 10.1016/j.jcss.2004.10.016

M. Warmuth and D. Kuzmin, Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension, Journal of Machine Learning Research, vol.9, pp.2287-2320, 2008.

D. P. Helmbold and M. Warmuth, Learning Permutations with Exponential Weights, Journal of Machine Learning Research, vol.10, pp.1705-1736, 2009.
DOI : 10.1007/978-3-540-72927-3_34

E. Hazan, S. Kale, and M. Warmuth, Learning rotations with little regret, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), pp.144-154, 2010.
DOI : 10.1007/s10994-016-5548-x

W. Koolen, M. Warmuth, and J. Kivinen, Hedging structured concepts, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), pp.93-105, 2010.

N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, Journal of Computer and System Sciences, vol.78, issue.5, pp.1404-1422, 2012.
DOI : 10.1016/j.jcss.2012.01.001

J. Y. Audibert, S. Bubeck, and G. Lugosi, Regret in Online Combinatorial Optimization, Mathematics of Operations Research, vol.39, issue.1, pp.31-45, 2014.
DOI : 10.1287/moor.2013.0598

N. Littlestone and M. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol.108, issue.2, pp.212-261, 1994.
DOI : 10.1006/inco.1994.1009

V. Vovk, Aggregating strategies, Proceedings of the third annual workshop on Computational learning theory (COLT), pp.371-386, 1990.
DOI : 10.1016/B978-1-55860-146-8.50032-1

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

J. Hannan, Approximation to Bayes risk in repeated play, Contributions to the theory of games, pp.97-139, 1957.

M. Hutter and J. Poland, Prediction with Expert Advice by Following the Perturbed Leader for General Weights, ALT, pp.279-293, 2004.
DOI : 10.1007/978-3-540-30215-5_22

J. Poland, FPL Analysis for Adaptive Bandits, 3rd Symposium on Stochastic Algorithms, Foundations and Applications (SAGA'05), pp.58-69, 2005.
DOI : 10.1007/11571155_7

G. Neu and G. Bartók, An Efficient Algorithm for Learning with Semi-bandit Feedback, Proceedings of the 24th International Conference on Algorithmic Learning Theory, pp.234-248, 2013.
DOI : 10.1007/978-3-642-40935-6_17

D. Suehiro, K. Hatano, S. Kijima, E. Takimoto, and K. Nagano, Online Prediction under Submodular Constraints, Algorithmic Learning Theory, pp.260-274, 2012.
DOI : 10.1007/978-3-642-34106-9_22

S. Geulen, B. Voecking, and M. Winkler, Regret minimization for online buffering problems using the weighted majority algorithm, Proceedings of the 23rd Annual Conference on Learning Theory (COLT 2010), pp.132-143, 2010.

A. György and G. Neu, Near-optimal rates for limited-delay universal lossy source coding, Proceedings of the IEEE International Symposium on Information Theory, 2011.

E. Even-Dar, S. M. Kakade, and Y. Mansour, Online Markov Decision Processes, Mathematics of Operations Research, vol.34, issue.3, pp.726-736, 2009.
DOI : 10.1287/moor.1090.0396

G. Neu, A. György, C. Szepesvári, and A. Antos, Online Markov Decision Processes Under Bandit Feedback, Advances in Neural Information Processing Systems, pp.1804-1812, 2011.
DOI : 10.1109/TAC.2013.2292137
URL : https://hal.archives-ouvertes.fr/hal-01079422

S. Rakhlin, O. Shamir, and K. Sridharan, Relax and randomize: From value to algorithms, Advances in Neural Information Processing Systems 25, pp.2150-2158, 2012.

W. Feller, An Introduction to Probability Theory and its Applications, 1968.

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821

L. Devroye (Belgium) obtained his Ph.D. from the University of Texas in 1976, and joined the School of Computer Science at McGill University in Montreal, Canada, in 1977. His research interests include probability theory as applied to the analysis of algorithms, mathematical statistics, machine learning, pattern recognition, and random number generation.

G. Neu received his M.Sc. degree in Electrical Engineering and his Ph.D. degree in Technical Informatics from the Budapest University of Technology and Economics (Hungary) in 2008 and 2013, respectively. Since 2013, he has been a postdoctoral fellow at the SequeL team of INRIA Lille - Nord Europe. His research interests include reinforcement learning, online learning, and bandit problems.