R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2018.

C. H. Papadimitriou and J. N. Tsitsiklis, The complexity of Markov decision processes, Mathematics of Operations Research, vol.12, issue.3, pp.441-450, 1987.

M. Abadi, TensorFlow: A System for Large-Scale Machine Learning, Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16). USENIX Association, 2016.

F. Gulyássy and B. Vithayathil, Kapazitätsplanung mit SAP®, 2014.

J. T. Dickersbach, Supply Chain Management with APO: Structures, Modelling Approaches and Implementation Peculiarities, 3rd edn, 2009.

A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media, 2017.

B. Leukert, J. Müller, and M. Noga, Das intelligente Unternehmen: Maschinelles Lernen mit SAP zielgerichtet einsetzen, pp.51-62, 2019.

A. Kuhnle, M. Schaarschmidt, and K. Fricke, Tensorforce: a TensorFlow library for applied reinforcement learning, 2017.

M. Schaarschmidt, A. Kuhnle, B. Ellis, K. Fricke, F. Gessert et al., LIFT: Reinforcement Learning in Computer Systems by Learning from Demonstrations, 2018.

, TensorForce Documentation Release 0.3.3. media.readthedocs, 2018.

V. Mnih, Human-level control through deep reinforcement learning, Nature, vol.518, pp.529-533, 2015.

S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, Continuous Deep Q-Learning with Model-based Acceleration, 2016.

H. van Hasselt, A. Guez, and D. Silver, Deep Reinforcement Learning with Double Q-learning, 2015.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, Trust region policy optimization, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.1889-1897, 2015.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal Policy Optimization Algorithms, 2017.

V. Mnih, Asynchronous Methods for Deep Reinforcement Learning, 2016.

T. Lillicrap, Continuous control with deep reinforcement learning, 2016.

K. Wongsuphasawat, Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow, IEEE Transactions on Visualization and Computer Graphics, vol.24, issue.1, pp.1-12, 2018.

, The TensorBoard repository on GitHub.