T. Bartz-Beielstein, C. W. G. Lasarczyk, and M. Preuss, Sequential parameter optimization, IEEE Congress on Evolutionary Computation, 2005.

F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic, Probabilistic integration: A role for statisticians in numerical analysis?, arXiv e-prints, 2015.

E. Brochu, V. M. Cora, and N. de Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, arXiv preprint, 2010.

R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, Bayesian optimization for learning gaits under uncertainty, Annals of Mathematics and Artificial Intelligence, 2015.

K. Chatzilygeroudis and J. Mouret, Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics, International Conference on Robotics and Automation (ICRA), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01768285

K. Chatzilygeroudis, R. Rama, R. Kaushik, D. Goepp, V. Vassiliades et al., Black-Box Data-efficient Policy Search for Robotics, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01576683

K. Chatzilygeroudis, V. Vassiliades, F. Stulp, S. Calinon, and J. Mouret, A survey on policy search algorithms for learning robot controllers in a handful of trials, IEEE Transactions on Robotics, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02393432

Y. Chebotar, M. Kalakrishnan, A. Yahya, A. Li, S. Schaal et al., Path integral guided policy search, IEEE International Conference on Robotics and Automation (ICRA), 2017.

K. Ciosek and S. Whiteson, OFFER: Off-environment reinforcement learning, AAAI Conference on Artificial Intelligence, 2017.

D. D. Cox and S. John, A statistical method for global optimization, IEEE International Conference on Systems, Man and Cybernetics, 1992.

D. D. Cox and S. John, SDO: A statistical method for global optimization, Multidisciplinary Design Optimization: State-of-the-Art, 1997.

A. Cully and J. Mouret, Evolving a behavioral repertoire for a walking robot, Evolutionary Computation, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01095543

A. Cully, J. Clune, D. Tarapore, and J. Mouret, Robots that can adapt like animals, Nature, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01158243

M. P. Deisenroth and C. E. Rasmussen, PILCO: A model-based and data-efficient approach to policy search, International Conference on Machine Learning (ICML), 2011.

J. Frank, S. Mannor, and D. Precup, Reinforcement learning in the presence of rare events, International Conference on Machine Learning (ICML), 2008.

P. W. Glynn, Likelihood ratio gradient estimation for stochastic systems, Communications of the ACM, 1990.

T. Gunter, M. A. Osborne, R. Garnett, P. Hennig, and S. Roberts, Sampling for inference in probabilistic models with fast Bayesian quadrature, Neural Information Processing Systems (NIPS), 2014.

P. Hennig, M. A. Osborne, and M. Girolami, Probabilistic numerics and uncertainty in computations, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 2015.

F. Hutter, H. H. Hoos, K. Leyton-Brown, and K. P. Murphy, An experimental investigation of model-based parameter optimisation: SPO and beyond, Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO), 2009.

N. Jakobi, Evolutionary robotics and the radical envelope-of-noise hypothesis, Adaptive Behavior, 1997.

N. Jakobi, P. Husbands, and I. Harvey, Noise and the reality gap: The use of simulation in evolutionary robotics, Advances in Artificial Life, 1995.

D. R. Jones, C. D. Perttunen, and B. E. Stuckman, Lipschitzian optimization without the Lipschitz constant, Journal of Optimization Theory and Applications, 1993.

D. R. Jones, M. Schonlau, and W. Welch, Efficient global optimization of expensive black-box functions, Journal of Global Optimization, 1998.

S. Kamthe and M. P. Deisenroth, Data-efficient reinforcement learning with probabilistic model predictive control, International Conference on Artificial Intelligence and Statistics, pp. 1701-1710, 2018.

M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu, Convergence guarantees for kernel-based quadrature rules in misspecified settings, Neural Information Processing Systems (NIPS), 2016.

S. Koos, J. Mouret, and S. Doncieux, The transferability approach: Crossing the reality gap in evolutionary robotics, IEEE Transactions on Evolutionary Computation, 2013.

A. Krause and C. S. Ong, Contextual Gaussian process bandit optimization, Neural Information Processing Systems (NIPS), 2011.

J. Lee et al., DART: Dynamic Animation and Robotics Toolkit, The Journal of Open Source Software, 2018.

S. Levine and V. Koltun, Guided policy search, International Conference on Machine Learning (ICML), 2013.

S. Levine, C. Finn, T. Darrell, and P. Abbeel, End-to-end training of deep visuomotor policies, Journal of Machine Learning Research, 2016.

H. Lipson and J. B. Pollack, Automatic design and manufacture of robotic lifeforms, Nature, 2000.

D. J. Lizotte, T. Wang, M. Bowling, and D. Schuurmans, Automatic gait optimization with Gaussian process regression, International Joint Conference on Artificial Intelligence (IJCAI), 2007.

A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause et al., Virtual vs. real: Trading off simulations and physical experiments in reinforcement learning with Bayesian optimization, International Conference on Robotics and Automation (ICRA), 2017.

R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos, Active policy learning for robot planning and exploration under uncertainty, Robotics: Science and Systems, 2007.

R. Martinez-Cantin, N. de Freitas, E. Brochu, J. A. Castellanos, and A. Doucet, A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot, Autonomous Robots, 2009.

J. Močkus, On Bayesian methods for seeking the extremum, Optimization Techniques IFIP Technical Conference, 1975.

J. Mouret and J. Clune, Illuminating search spaces by mapping elites, 2015.

R. Neal, Slice sampling, Annals of Statistics, 2000.

A. O'Hagan, Monte Carlo is fundamentally unsound, Journal of the Royal Statistical Society, Series D, 1987.

A. O'Hagan, Bayes-Hermite quadrature, Journal of Statistical Planning and Inference, 1991.

M. Osborne, R. Garnett, Z. Ghahramani, D. K. Duvenaud et al., Active learning of model evidence using Bayesian quadrature, Neural Information Processing Systems (NIPS), 2012.

S. Paul, K. Chatzilygeroudis, K. Ciosek, J. Mouret, M. Osborne et al., Alternating optimisation and quadrature for robust control, AAAI Conference on Artificial Intelligence, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01644063

S. Paul, M. A. Osborne, and S. Whiteson, Fingerprint policy optimisation for robust reinforcement learning, International Conference on Machine Learning (ICML), 2019.

R. Pautrat, K. Chatzilygeroudis, and J. Mouret, Bayesian optimization with automatic prior selection for data-efficient direct policy search, IEEE International Conference on Robotics and Automation (ICRA), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01768279

J. Peters and S. Schaal, Policy gradient methods for robotics, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.

L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, Robust adversarial reinforcement learning, International Conference on Machine Learning (ICML), 2017.

M. Poloczek, J. Wang, and P. Frazier, Multi-information source optimization, Neural Information Processing Systems (NIPS), 2017.

A. Rajeswaran, S. Ghotra, B. Ravindran, and S. Levine, EPOpt: Learning robust neural network policies using model ensembles, International Conference on Learning Representations (ICLR), 2017.

C. E. Rasmussen and Z. Ghahramani, Bayesian Monte Carlo, Neural Information Processing Systems (NIPS), 2003.

C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.

D. B. Rubin, Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 1984.

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, Trust region policy optimization, International Conference on Machine Learning (ICML), 2015.

B. Settles, Active learning literature survey, 2010.

J. Snoek, K. Swersky, R. Zemel, and R. Adams, Input warping for Bayesian optimization of non-stationary functions, International Conference on Machine Learning (ICML), 2014.

N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: No regret and experimental design, International Conference on Machine Learning (ICML), 2010.

S. Tavaré, D. J. Balding, R. C. Griffiths, and P. Donnelly, Inferring coalescence times from DNA sequence data, Genetics, 1997.

S. Toscano-Palmerin and P. I. Frazier, Bayesian Optimization with Expensive Integrands, 2018.

B. J. Williams, T. J. Santner, and W. I. Notz, Sequential design of computer experiments to minimize integrated response functions, Statistica Sinica, 2000.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992.

A. Yahya, A. Li, M. Kalakrishnan, Y. Chebotar, and S. Levine, Collective robot reinforcement learning with distributed asynchronous guided policy search, International Conference on Intelligent Robots and Systems (IROS), 2017.