ASAP-UCT: Abstraction of state-action pairs in UCT, Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

A novel abstraction framework for online planning, Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.1901-1902, 2015.

The multiplicative weights update method: a meta-algorithm and applications, Theory of Computing, vol.8, pp.121-164, 2012.

The nonstochastic multiarmed bandit problem, SIAM Journal on Computing, vol.32, pp.48-77, 2002.

Minimizing regret on reflexive Banach spaces and learning Nash equilibria in continuous zero-sum games, 2016.

From predictive to prescriptive analytics, 2014.

Learning with minimal information in continuous games, 2018.

Evolutionary dynamics of multi-agent learning: a survey, Journal of Artificial Intelligence Research, vol.53, pp.659-697, 2015.

On-line algorithms in machine learning, Online Algorithms, pp.306-325, 1998.

From external to internal regret, Journal of Machine Learning Research, vol.8, pp.1307-1324, 2007.

DOI: 10.1007/11503415_42

Bandit learning in concave N-person games, NIPS '18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.

URL: https://hal.archives-ouvertes.fr/hal-01891523

Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends® in Machine Learning, vol.5, pp.1-122, 2012.

Multi-agent reinforcement learning: An overview, Innovations in Multi-Agent Systems and Applications-1, pp.183-221, 2010.

Prediction, learning, and games, 2006.

Learning with bandit feedback in potential games, NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.

URL: https://hal.archives-ouvertes.fr/hal-01643352

Berkeley Problems in Mathematics, 2012.

Coordinated exploration in concurrent reinforcement learning, 2018.

The Theory of Learning in Games, Economic Learning and Social Evolution, vol.2, 1998.

Adaptive Computation and Machine Learning, 2016.

Evaluating generalization in multiagent systems using agent-interaction graphs, International Conference on Autonomous Agents and Multiagent Systems, 2018.

Learning policy representations in multiagent systems, International Conference on Machine Learning, 2018.

Best arm identification in multi-armed bandits with delayed feedback, 2018.

Introduction to Online Convex Optimization, Foundations and Trends® in Optimization, 2016.

DOI: 10.1561/2400000013

Logarithmic regret algorithms for online convex optimization, Machine Learning, vol.69, pp.169-192, 2007.

DOI: 10.1007/s10994-007-5016-8

Online learning under delayed feedback, International Conference on Machine Learning, pp.1453-1461, 2013.

Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, pp.291-307, 2005.

Convergence of heterogeneous distributed learning in stochastic routing games, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp.480-487, 2015.

On learning how players learn: estimation of learning dynamics in the routing game, 2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS), pp.1-10, 2016.

Network games: Theory, models, and dynamics, Synthesis Lectures on Communication Networks, vol.4, pp.1-159, 2011.

Distributed stochastic optimization via matrix exponential learning, IEEE Trans. Signal Process., vol.65, pp.2277-2290, 2017.

URL: https://hal.archives-ouvertes.fr/hal-01382285

Cycles in adversarial regularized learning, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp.2703-2717, 2018.

URL: https://hal.archives-ouvertes.fr/hal-01643338

Learning in games with continuous action sets and unknown payoff functions, Mathematical Programming, pp.1-43, 2018.

Playing Atari with deep reinforcement learning, 2013.

Limits and limitations of no-regret learning in games, The Knowledge Engineering Review, p.32, 2017.

Primal-dual subgradient methods for convex problems, Mathematical Programming, vol.120, pp.221-259, 2009.

Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos, Advances in Neural Information Processing Systems, vol.30, pp.5872-5882, 2017.

Mixed-strategy learning with continuous action sets, IEEE Trans. Autom. Control, vol.62, pp.379-384, 2017.

URL: https://hal.archives-ouvertes.fr/hal-01382280

Bandits with delayed, aggregated anonymous feedback, International Conference on Machine Learning, pp.4102-4110, 2018.

Online learning with adversarial delays, Advances in Neural Information Processing Systems, pp.1270-1278, 2015.

Selfish routing and the price of anarchy, vol.174.

Distributed Nash equilibrium seeking via the alternating direction method of multipliers, IFAC-PapersOnLine, vol.50, pp.6166-6171, 2017.

Online learning: Theory, algorithms, and applications, 2007.

Online learning and online convex optimization, Foundations and Trends® in Machine Learning, vol.4, pp.107-194, 2012.

Advances in Neural Information Processing Systems, vol.19, pp.1265-1272, 2007.

Multiagent systems: Algorithmic, game-theoretic, and logical foundations, 2008.

No-regret dynamics and fictitious play, Journal of Economic Theory, vol.148, pp.825-842, 2013.

Information directed sequence understanding and chatbot design via recurrent neural networks, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.2131-2139, 2017.

The global anchor method for quantifying linguistic shifts and domain adaptation, Advances in Neural Information Processing Systems, 2018.

On the dimensionality of word embedding, Advances in Neural Information Processing Systems, 2018.

Offline multi-action policy learning: Generalization and optimization, 2018.

Dynamics on linear influence network games under stochastic environments, International Conference on Decision and Game Theory for Security, pp.114-126, 2016.

DOI: 10.1007/978-3-319-47413-7_7

Countering feedback delays in multi-agent learning, NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.

URL: https://hal.archives-ouvertes.fr/hal-01643350

Mirror descent learning in continuous games, 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp.5776-5783, 2017.

DOI: 10.1109/cdc.2017.8264532

URL: https://hal.archives-ouvertes.fr/hal-01643341

A game-theoretical formulation of influence networks, American Control Conference (ACC), pp.3802-3807, 2016.

DOI: 10.1109/acc.2016.7525505

Distributed robust adaptive equilibrium computation for generalized convex games, Automatica, vol.63, pp.82-91, 2016.

Online convex programming and generalized infinitesimal gradient ascent, ICML '03: Proceedings of the 20th International Conference on Machine Learning, pp.928-936, 2003.