P. Auer, N. Cesa-bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

A. Auger, J. Bader, D. Brockhoff, and E. Zitzler, Theory of the hypervolume indicator, Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, FOGA '09, pp.87-102, 2009.
DOI : 10.1145/1527125.1527138

URL : https://hal.archives-ouvertes.fr/inria-00430540

L. Barrett and S. Narayanan, Learning all optimal policies with multiple criteria, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.41-47, 2008.
DOI : 10.1145/1390156.1390162

V. Berthier, H. Doghmen, and O. Teytaud, Consistency Modifications for Automatically Tuned Monte-Carlo Tree Search, LION4, pp.111-124, 2010.
DOI : 10.1007/978-3-642-13800-3_9

URL : https://hal.archives-ouvertes.fr/inria-00437146

N. Beume, B. Naujoks, and M. Emmerich, SMS-EMOA: Multiobjective selection based on dominated hypervolume, European Journal of Operational Research, vol.181, issue.3, pp.1653-1669, 2007.
DOI : 10.1016/j.ejor.2006.08.008

N. Beume, C. M. Fonseca, M. Lopez-ibanez, L. Paquete, and J. Vahrenhold, On the Complexity of Computing the Hypervolume Indicator, IEEE Transactions on Evolutionary Computation, vol.13, issue.5, pp.1075-1082, 2009.
DOI : 10.1109/TEVC.2009.2015575

K. Chatterjee, Markov Decision Processes with Multiple Long-Run Average Objectives, FSTTCS Foundations of Software Technology and Theoretical Computer Science, vol.4855, pp.473-484, 2007.
DOI : 10.1007/978-3-540-77050-3_39

P. Ciancarini and G. P. Favini, Monte-Carlo Tree Search techniques in the game of kriegspiel, IJCAI'09, pp.474-479, 2009.

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proc. Computers and Games, pp.72-83, 2006.
DOI : 10.1007/978-3-540-75538-8_7

URL : https://hal.archives-ouvertes.fr/inria-00116992

K. Deb, Multi-objective optimization using evolutionary algorithms, pp.55-58, 2001.

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multiobjective optimization: NSGA-II, PPSN VI, pp.849-858, 1917.

M. Fleischer, The Measure of Pareto Optima Applications to Multi-objective Metaheuristics, EMO'03, pp.519-533, 2003.
DOI : 10.1007/3-540-36970-8_37

T. Friedrich, K. Bringmann, T. Voß, and C. Igel, The logarithmic hypervolume indicator, Proceedings of the 11th workshop proceedings on Foundations of genetic algorithms, FOGA '11, pp.81-92, 2011.
DOI : 10.1145/1967654.1967662

Z. Gábor, Z. Kalmár, and C. Szepesvári, Multi-criteria reinforcement learning, ICML'98, pp.197-205, 1998.

R. Gaudel and M. Sebag, Feature selection as a one-player game, ICML'10, pp.359-366, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00484049

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531

URL : https://hal.archives-ouvertes.fr/inria-00164003

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, pp.282-293, 2006.
DOI : 10.1007/11871842_29

S. Mannor and N. Shimkin, A geometric approach to multi-criterion reinforcement learning, Journal of Machine Learning Research, pp.325-360, 2004.

H. Nakhost and M. Müller, Monte-Carlo exploration for deterministic planning, IJCAI'09, pp.1766-1771, 2009.

S. Natarajan and P. Tadepalli, Dynamic preferences in multi-criteria reinforcement learning, Proceedings of the 22nd international conference on Machine learning , ICML '05, 2005.
DOI : 10.1145/1102351.1102427

C. H. Papadimitriou and M. Yannakakis, On the approximability of trade-offs and optimal access of Web sources, Proceedings 41st Annual Symposium on Foundations of Computer Science, pp.86-92, 2000.
DOI : 10.1109/SFCS.2000.892068

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

G. Tesauro, R. Das, H. Chan, J. Kephart, D. Levine et al., Managing power consumption and performance of computing systems using reinforcement learning, NIPS'07, pp.1-8, 2007.

J. D. Ullman, NP-complete scheduling problems, Journal of Computer and System Sciences, vol.10, issue.3, pp.384-393, 1975.
DOI : 10.1016/S0022-0000(75)80008-0

P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, and E. Dekker, Empirical evaluation methods for multiobjective reinforcement learning algorithms, Machine Learning, vol.7, issue.2, pp.51-80, 2010.
DOI : 10.1007/s10994-010-5232-5

Y. Wang and S. Gelly, Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007 IEEE Symposium on Computational Intelligence and Games, pp.175-182, 2007.
DOI : 10.1109/CIG.2007.368095

Y. Wang, J. Audibert, and R. Munos, Algorithms for infinitely many-armed bandits, NIPS'08, pp.1-8, 2008.

J. Yu, R. Buyya, and K. Ramamohanarao, Workflow Scheduling Algorithms for Grid Computing, Studies in Computational Intelligence, vol.146, pp.173-214, 2008.
DOI : 10.1007/978-3-540-69277-5_7

E. Zitzler and L. Thiele, Multiobjective optimization using evolutionary algorithms ??? A comparative case study, PPSN V, pp.292-301, 1998.
DOI : 10.1007/BFb0056872