N. Akchurina, Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games, Proc. of AAMAS, 2009.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.

G. Bourguin, A. Derycke, and J. Tarby, Beyond the Interface: Co-evolution Inside Interactive Systems - A Proposal Founded on Activity Theory, People and Computers XV: Interaction without Frontiers, pp.297-310, 2001.
DOI : 10.1007/978-1-4471-0353-0_18

M. Bowling and M. Veloso, Multiagent learning using a variable learning rate, Artificial Intelligence, vol.136, issue.2, pp.215-250, 2002.
DOI : 10.1016/S0004-3702(02)00121-2

L. Buşoniu, R. Babuška, and B. De Schutter, A comprehensive survey of multiagent reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.38, issue.2, pp.156-172, 2008.

J. Caelen and A. Xuereb, Dialogue et théorie des jeux, Congrès international SPeD, 2011.

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, User Simulation in Dialogue Systems using Inverse Reinforcement Learning, Proc. of Interspeech, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652446

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, Behavior Specific User Simulation in Spoken Dialogue Systems, Proc. of ITG Conference on Speech Communication, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749421

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, Coadaptation in Spoken Dialogue Systems, Proc. of IWSDS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778752

L. Daubigney, M. Geist, S. Chandramohan, and O. Pietquin, A Comprehensive Reinforcement Learning Framework for Dialogue Management Optimization, IEEE Journal of Selected Topics in Signal Processing, vol.6, issue.8, pp.891-902, 2012.
DOI : 10.1109/JSTSP.2012.2229257

I. Efstathiou and O. Lemon, Learning non-cooperative dialogue behaviours, Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014.
DOI : 10.3115/v1/W14-4308

L. El Asri, R. Laroche, and O. Pietquin, DINASTI: Dialogues with a Negotiating Appointment Setting Interface, Proc. of LREC, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01107496

M. S. English and P. A. Heeman, Learning mixed initiative dialog strategies by using reinforcement learning on both conversants, Proc. of HLT/EMNLP, 2005.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

J. Filar and K. Vrieze, Competitive Markov decision processes, 1996.
DOI : 10.1007/978-1-4612-4054-9

M. Geist, O. Pietquin, and G. Fricout, Tracking in Reinforcement Learning, Proc. of ICONIP, 2009.
DOI : 10.1007/978-3-642-10677-4_57
URL : https://hal.archives-ouvertes.fr/hal-00439316

K. Georgila, C. Nelson, and D. Traum, Single-Agent vs. Multi-Agent Techniques for Concurrent Reinforcement Learning of Negotiation Dialogue Policies, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014.
DOI : 10.3115/v1/P14-1047

G. J. Gordon, Approximate Solutions to Markov Decision Processes, 1999.

P. J.-J. Herings and R. Peeters, Stationary equilibria in stochastic games: structure, selection and computation, 2000.

J. Hu and M. P. Wellman, Nash Q-learning for general-sum stochastic games, Journal of Machine Learning Research, vol.4, pp.1039-1069, 2003.

R. Laroche, G. Putois, and P. Bretier, Optimising a handcrafted dialogue system design, Proc. of Interspeech, 2010.

O. Lemon and O. Pietquin, Machine learning for spoken dialogue systems, Proc. of Interspeech, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00216035

E. Levin and R. Pieraccini, A stochastic model of computer-human interaction for learning dialogue strategies, Proc. of Eurospeech, 1997.

M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, Proc. of ICML, 1994.

M. L. Littman, Friend-or-foe Q-learning in general-sum games, Proc. of ICML, 2001.

J. Milnor, Games against nature, 1951.

A. Neyman and S. Sorin, Stochastic games and applications, 2003.
DOI : 10.1007/978-94-010-0189-2

M. J. Osborne and A. Rubinstein, A course in game theory, 1994.

S. D. Patek and D. P. Bertsekas, Stochastic shortest path games, SIAM Journal on Control and Optimization, vol.37, issue.3, 1999.

J. Perolat, B. Piot, B. Scherrer, and O. Pietquin, Approximate dynamic programming for two-player zero-sum Markov games, Proc. of ICML, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01153270

O. Pietquin and H. Hastie, A survey on metrics for the evaluation of user simulations, The Knowledge Engineering Review, pp.59-73, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00771654

O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00617517

O. Pietquin, Consistent goal-directed user model for realistic man-machine task-oriented spoken dialogue simulation, Proc. of ICME, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00215968

H. L. Prasad, L. A. Prashanth, and S. Bhatnagar, Algorithms for Nash equilibria in general-sum stochastic games, Proc. of AAMAS, 2015.

M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, 1994.

J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young, Agenda-based user simulation for bootstrapping a POMDP dialogue system, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, NAACL '07, 2007.
DOI : 10.3115/1614108.1614146

J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, Effects of the user model on simulation-based learning of dialogue strategies, Proc. of ASRU, 2005.

L. Shapley, Stochastic games, Proc. of the National Academy of Sciences of the United States of America, pp.1095-1100, 1953.

S. P. Singh, M. J. Kearns, D. J. Litman, and M. A. Walker, Reinforcement learning for spoken dialogue systems, Proc. of NIPS, 1999.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 1998.

S. Young, M. Gasic, B. Thomson, and J. D. Williams, POMDP-Based Statistical Spoken Dialog Systems: A Review, Proceedings of the IEEE, pp.1160-1179, 2013.
DOI : 10.1109/JPROC.2012.2225812

M. Zinkevich, A. Greenwald, and M. L. Littman, Cyclic equilibria in Markov games, Proc. of NIPS, 2006.