A. Anand, A. Grover, P. Singla, et al., ASAP-UCT: Abstraction of state-action pairs in UCT, Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

A. Anand, A. Grover, P. Singla, et al., A novel abstraction framework for online planning, Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.1901-1902, 2015.

S. Arora, E. Hazan, and S. Kale, The multiplicative weights update method: a meta-algorithm and applications, Theory of Computing, vol.8, pp.121-164, 2012.

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SIAM Journal on Computing, vol.32, pp.48-77, 2002.

M. Balandat, W. Krichene, C. Tomlin, and A. Bayen, Minimizing regret on reflexive Banach spaces and learning Nash equilibria in continuous zero-sum games, 2016.

D. Bertsimas and N. Kallus, From predictive to prescriptive analytics, 2014.

S. Bervoets, M. Bravo, and M. Faure, Learning with minimal information in continuous games, 2018.

D. Bloembergen, K. Tuyls, D. Hennes, and M. Kaisers, Evolutionary dynamics of multi-agent learning: a survey, Journal of Artificial Intelligence Research, vol.53, pp.659-697, 2015.

A. Blum, On-line algorithms in machine learning, Online algorithms, pp.306-325, 1998.

A. Blum and Y. Mansour, From external to internal regret, Journal of Machine Learning Research, vol.8, pp.1307-1324, 2007.
DOI : 10.1007/11503415_42

M. Bravo, D. S. Leslie, and P. Mertikopoulos, Bandit learning in concave N-person games, NIPS '18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01891523

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, pp.1-122, 2012.

L. Busoniu, R. Babuska, and B. De Schutter, Multi-agent reinforcement learning: An overview, Innovations in Multi-Agent Systems and Applications-1, pp.183-221, 2010.

N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, 2006.

J. Cohen, A. Héliou, and P. Mertikopoulos, Learning with bandit feedback in potential games, NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01643352

P. N. de Souza and J.-N. Silva, Berkeley Problems in Mathematics, 2012.

M. Dimakopoulou and B. Van Roy, Coordinated exploration in concurrent reinforcement learning, 2018.

D. Fudenberg and D. K. Levine, The Theory of Learning in Games, Economic Learning and Social Evolution, vol.2, 1998.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Adaptive Computation and Machine Learning, 2016.

A. Grover, M. Al-Shedivat, J. K. Gupta, Y. Burda, and H. Edwards, Evaluating generalization in multiagent systems using agent-interaction graphs, International Conference on Autonomous Agents and Multiagent Systems, 2018.

A. Grover, M. Al-Shedivat, J. K. Gupta, Y. Burda, and H. Edwards, Learning policy representations in multiagent systems, International Conference on Machine Learning, 2018.

A. Grover, T. Markov, P. Attia, N. Jin, N. Perkins et al., Best arm identification in multi-armed bandits with delayed feedback, 2018.

E. Hazan, Introduction to Online Convex Optimization, Foundations and Trends in Optimization, 2016.
DOI : 10.1561/2400000013

E. Hazan, A. Agarwal, and S. Kale, Logarithmic regret algorithms for online convex optimization, Machine Learning, vol.69, pp.169-192, 2007.
DOI : 10.1007/s10994-007-5016-8

P. Joulani, A. György, and C. Szepesvári, Online learning under delayed feedback, International Conference on Machine Learning, pp.1453-1461, 2013.

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, pp.291-307, 2005.

S. Krichene, W. Krichene, R. Dong, and A. Bayen, Convergence of heterogeneous distributed learning in stochastic routing games, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp.480-487, 2015.

K. Lam, W. Krichene, and A. Bayen, On learning how players learn: estimation of learning dynamics in the routing game, 2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS), pp.1-10, 2016.

I. Menache and A. Ozdaglar, Network games: Theory, models, and dynamics, Synthesis Lectures on Communication Networks, vol.4, pp.1-159, 2011.

P. Mertikopoulos, E. V. Belmega, R. Negrel, and L. Sanguinetti, Distributed stochastic optimization via matrix exponential learning, IEEE Trans. Signal Process., vol.65, pp.2277-2290, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01382285

P. Mertikopoulos, C. Papadimitriou, and G. Piliouras, Cycles in adversarial regularized learning, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp.2703-2717, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01643338

P. Mertikopoulos and Z. Zhou, Learning in games with continuous action sets and unknown payoff functions, Mathematical Programming, pp.1-43, 2018.

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou et al., Playing Atari with deep reinforcement learning, 2013.

B. Monnot and G. Piliouras, Limits and limitations of no-regret learning in games, The Knowledge Engineering Review, vol.32, 2017.

Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical programming, vol.120, pp.221-259, 2009.

G. Palaiopanos, I. Panageas, and G. Piliouras, Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos, Advances in Neural Information Processing Systems, vol.30, pp.5872-5882, 2017.

S. Perkins, P. Mertikopoulos, and D. S. Leslie, Mixed-strategy learning with continuous action sets, IEEE Trans. Autom. Control, vol.62, pp.379-384, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01382280

C. Pike-Burke, S. Agrawal, C. Szepesvári, and S. Grünewälder, Bandits with delayed, aggregated anonymous feedback, International Conference on Machine Learning, pp.4102-4110, 2018.

K. Quanrud and D. Khashabi, Online learning with adversarial delays, Advances in Neural Information Processing Systems, pp.1270-1278, 2015.

T. Roughgarden, Selfish routing and the price of anarchy, vol.174

F. Salehisadaghiani and L. Pavel, Distributed Nash equilibrium seeking via the alternating direction method of multipliers, IFAC-PapersOnLine, vol.50, pp.6166-6171, 2017.

S. Shalev-Shwartz, Online learning: Theory, algorithms, and applications, 2007.

S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning, vol.4, pp.107-194, 2012.

S. Shalev-Shwartz and Y. Singer, Convex repeated games and Fenchel duality, Advances in Neural Information Processing Systems, vol.19, pp.1265-1272, 2007.

Y. Shoham and K. Leyton-Brown, Multiagent systems: Algorithmic, game-theoretic, and logical foundations, 2008.

Y. Viossat and A. Zapechelnyuk, No-regret dynamics and fictitious play, Journal of Economic Theory, vol.148, pp.825-842, 2013.

Z. Yin, K. Chang, and R. Zhang, DeepProbe: Information directed sequence understanding and chatbot design via recurrent neural networks, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.2131-2139, 2017.

Z. Yin, V. Sachidananda, and B. Prabhakar, The global anchor method for quantifying linguistic shifts and domain adaptation, Advances in Neural Information Processing Systems, 2018.

Z. Yin and Y. Shen, On the dimensionality of word embedding, Advances in Neural Information Processing Systems, 2018.

Z. Zhou, S. Athey, and S. Wager, Offline multi-action policy learning: Generalization and optimization, 2018.

Z. Zhou, N. Bambos, and P. W. Glynn, Dynamics on linear influence network games under stochastic environments, International Conference on Decision and Game Theory for Security, pp.114-126, 2016.
DOI : 10.1007/978-3-319-47413-7_7

Z. Zhou, P. Mertikopoulos, N. Bambos, P. W. Glynn, and C. Tomlin, Countering feedback delays in multi-agent learning, NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01643350

Z. Zhou, P. Mertikopoulos, A. L. Moustakas, N. Bambos, and P. W. Glynn, Mirror descent learning in continuous games, 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp.5776-5783, 2017.
DOI : 10.1109/cdc.2017.8264532
URL : https://hal.archives-ouvertes.fr/hal-01643341

Z. Zhou, B. Yolken, R. A. Miura-Ko, and N. Bambos, A game-theoretical formulation of influence networks, American Control Conference (ACC), pp.3802-3807, 2016.
DOI : 10.1109/acc.2016.7525505

M. Zhu and E. Frazzoli, Distributed robust adaptive equilibrium computation for generalized convex games, Automatica, vol.63, pp.82-91, 2016.

M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, ICML '03: Proceedings of the 20th International Conference on Machine Learning, pp.928-936, 2003.