Asap-uct: Abstraction of state-action pairs in uct, Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015. ,
, A novel abstraction framework for online planning, Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.1901-1902, 2015.
The multiplicative weights update method: a meta-algorithm and applications, Theory of Computing, vol.8, pp.121-164, 2012. ,
The nonstochastic multiarmed bandit problem, SIAM journal on computing, vol.32, pp.48-77, 2002. ,
, Minimizing regret on reflexive banach spaces and learning nash equilibria in continuous zero-sum games, 2016.
, From predictive to prescriptive analytics, 2014.
Learning with minimal information in continuous games, 2018. ,
Evolutionary dynamics of multi-agent learning: a survey, Journal of Artificial Intelligence Research, vol.53, pp.659-697, 2015. ,
On-line algorithms in machine learning, Online algorithms, pp.306-325, 1998. ,
From external to internal regret, Journal of Machine Learning Research, vol.8, pp.1307-1324, 2007. ,
DOI : 10.1007/11503415_42
Bandit learning in concave N-person games, NIPS '18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01891523
Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends R in Machine Learning, vol.5, pp.1-122, 2012. ,
Multi-agent reinforcement learning: An overview, in Innovations in multi-agent systems and applications-1, pp.183-221, 2010. ,
, Prediction, learning, and games, 2006.
Learning with bandit feedback in potential games, NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01643352
, Berkeley Problems in Mathematics, 2012.
, Coordinated exploration in concurrent reinforcement learning, 2018.
of Economic learning and social evolution, The Theory of Learning in Games, vol.2, 1998. ,
Adaptive computation and machine learning, 2016. ,
Evaluating generalization in multiagent systems using agent-interaction graphs, International Conference on Autonomous Agents and Multiagent Systems, 2018. ,
Learning policy representations in multiagent systems, International Conference on Machine Learning, 2018. ,
, Best arm identification in multi-armed bandits with delayed feedback, 2018.
Introduction to Online Convex Optimization, Foundations and Trends(r) in Optimization Series, 2016. ,
DOI : 10.1561/2400000013
Logarithmic regret algorithms for online convex optimization, Machine Learning, vol.69, pp.169-192, 2007. ,
DOI : 10.1007/s10994-007-5016-8
Online learning under delayed feedback, International Conference on Machine Learning, pp.1453-1461, 2013. ,
Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, pp.291-307, 2005. ,
Convergence of heterogeneous distributed learning in stochastic routing games, Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on, pp.480-487, 2015. ,
On learning how players learn: estimation of learning dynamics in the routing game, Cyber-Physical Systems (ICCPS), 2016 ACM/IEEE 7th International Conference on, pp.1-10, 2016. ,
Network games: Theory, models, and dynamics, Synthesis Lectures on Communication Networks, vol.4, pp.1-159, 2011. ,
Distributed stochastic optimization via matrix exponential learning, IEEE Trans. Signal Process, vol.65, pp.2277-2290, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01382285
Cycles in adversarial regularized learning, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp.2703-2717, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01643338
Learning in games with continuous action sets and unknown payoff functions, Mathematical Programming, pp.1-43, 2018. ,
, Playing atari with deep reinforcement learning, 2013.
Limits and limitations of no-regret learning in games, The Knowledge Engineering Review, p.32, 2017. ,
Primal-dual subgradient methods for convex problems, Mathematical programming, vol.120, pp.221-259, 2009. ,
Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos, Advances in Neural Information Processing Systems, vol.30, pp.5872-5882, 2017. ,
Mixed-strategy learning with continuous action sets, IEEE Trans. Autom. Control, vol.62, pp.379-384, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01382280
Bandits with delayed, aggregated anonymous feedback, International Conference on Machine Learning, pp.4102-4110, 2018. ,
Online learning with adversarial delays, Advances in Neural Information Processing Systems, pp.1270-1278, 2015. ,
Selfish routing and the price of anarchy, vol.174 ,
Distributed nash equilibrium seeking via the alternating direction method of multipliers, IFAC-PapersOnLine, vol.50, pp.6166-6171, 2017. ,
, Online learning: Theory, algorithms, and applications, 2007.
Online learning and online convex optimization, Foundations and Trends R in Machine Learning, vol.4, pp.107-194, 2012. ,
, Advances in Neural Information Processing Systems, vol.19, pp.1265-1272, 2007.
, Multiagent systems: Algorithmic, game-theoretic, and logical foundations, 2008.
No-regret dynamics and fictitious play, Journal of Economic Theory, vol.148, pp.825-842, 2013. ,
No-regret dynamics and fictitious play, Journal of Economic Theory, vol.148, pp.825-842, 2013. ,
Information directed sequence understanding and chatbot design via recurrent neural networks, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.2131-2139, 2017. ,
The global anchor method for quantifying linguistic shifts and domain adaptation, Advances in Neural Information Processing Systems, 2018. ,
On the dimensionality of word embedding, Advances in Neural Information Processing Systems, 2018. ,
Offline multi-action policy learning: Generalization and optimization, 2018. ,
Dynamics on linear influence network games under stochastic environments, International Conference on Decision and Game Theory for Security, pp.114-126, 2016. ,
DOI : 10.1007/978-3-319-47413-7_7
Countering feedback delays in multi-agent learning, NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01643350
Mirror descent learning in continuous games, Decision and Control (CDC), 2017 IEEE 56th Annual Conference on, pp.5776-5783, 2017. ,
DOI : 10.1109/cdc.2017.8264532
URL : https://hal.archives-ouvertes.fr/hal-01643341
A game-theoretical formulation of influence networks, American Control Conference (ACC), pp.3802-3807, 2016. ,
DOI : 10.1109/acc.2016.7525505
Distributed robust adaptive equilibrium computation for generalized convex games, Automatica, vol.63, pp.82-91, 2016. ,
Online convex programming and generalized infinitesimal gradient ascent, ICML '03: Proceedings of the 20th International Conference on Machine Learning, pp.928-936, 2003. ,