L. Adolphs and T. Hofmann, LeDeepChef: Deep reinforcement learning agent for families of text-based games, 2019.

M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum et al., Safe reinforcement learning via shielding, Proc. of AAAI, 2018.

Y. Chandak, G. Theocharous, J. Kostas, S. Jordan, and P. S. Thomas, Learning action representations for reinforcement learning, Proc. of ICML, 2019.

Y. Chen, Y. Chen, Y. Yang, Y. Li, J. Yin et al., Learning action-transferable policy with action embedding, 2019.

M. Chevalier-Boisvert, D. Bahdanau, S. Lahlou, L. Willems, C. Saharia et al., BabyAI: First steps towards grounded language learning with a human in the loop, Proc. of ICLR, 2019.

M. Côté, A. Kádár, X. Yuan, B. Kybartas, T. Barnes et al., TextWorld: A learning environment for text-based games, 2018.

G. Dulac-Arnold, L. Denoyer, P. Preux, and P. Gallinari, Fast reinforcement learning with large action sets using error-correcting output codes for MDP factorization, Proc. of ECML and PKDD, pp.180-194, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747729

G. Dulac-Arnold, R. Evans, H. van Hasselt, P. Sunehag, T. Lillicrap et al., Deep reinforcement learning in large discrete action spaces, 2015.

S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto, Proc. of TITS, 2013.

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, Proc. of AAAI, 2016.

J. He, J. Chen, X. He, J. Gao, L. Li et al., Deep reinforcement learning with a natural language action space, Proc. of ACL, pp.1621-1630, 2016.

J. He, M. Ostendorf, X. He, J. Chen, J. Gao et al., Deep reinforcement learning with a combinatorial action space for predicting popular Reddit threads, Proc. of EMNLP, pp.1838-1848, 2016.

T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul et al., Deep Q-learning from demonstrations, Proc. of AAAI, 2018.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.

T. Lattimore and C. Szepesvári, Bandit algorithms, 2018.

N. Lazic, C. Boutilier, T. Lu, E. Wong, B. Roy et al., Data center cooling using model-predictive control, Proc. of NeurIPS, 2018.

Y. LeCun and Y. Bengio, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, 1995.

H. Mao, M. Alizadeh, I. Menache, and S. Kandula, Resource management with deep reinforcement learning, Proc. of ACM Workshop on Hot Topics in Networks, 2016.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.

A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, Proc. of ICML, 1999.

L. Orseau and S. Armstrong, Safely interruptible agents, Proc. of UAI, 2016.

B. Piot, M. Geist, and O. Pietquin, Boosted Bellman residual minimization handling expert demonstrations, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), issue.2, pp.549-564, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01060953

T. Pohlen, B. Piot, T. Hester, M. G. Azar, D. Horgan et al., Observe and look further: Achieving consistent performance on Atari, 2018.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 2014.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.

D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai et al., A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, vol.362, issue.6419, pp.1140-1144, 2018.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 2018.

G. Tennenholtz and S. Mannor, The natural language of actions, Proc. of ICML, 2019.

T. Zahavy, M. Haroush, N. Merlis, D. J. Mankowitz, and S. Mannor, Learn what not to learn: Action elimination with deep reinforcement learning, Proc. of NeurIPS, 2018.

Z. Zhou, X. Li, and R. N. Zare, Optimizing chemical reactions with deep reinforcement learning, ACS Central Science, vol.3, issue.12, pp.1337-1344, 2017.