R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 1998.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

V. Mnih et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.

V. Mnih et al., Asynchronous methods for deep reinforcement learning, ICML, 2016.

D. Silver et al., Mastering the game of Go without human knowledge, Nature, vol.550, issue.7676, p.354, 2017.

N. Heess et al., Emergence of locomotion behaviours in rich environments, 2017.

S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, IJRR, vol.37, issue.4-5, pp.421-436, 2018.

J. Mouret, Micro-data learning: The other end of the spectrum, ERCIM News, issue.107, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01374786

M. P. Deisenroth, D. Fox, and C. E. Rasmussen, Gaussian processes for data-efficient learning in robotics and control, IEEE Trans. Pattern Anal. Mach. Intell, vol.37, issue.2, pp.408-423, 2015.

M. P. Deisenroth, G. Neumann, and J. Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, vol.2, issue.1, pp.1-142, 2013.

C. E. Garcia, D. M. Prett, and M. Morari, Model predictive control: theory and practice - a survey, Automatica, vol.25, pp.335-348, 1989.

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, IJRR, vol.32, issue.11, pp.1238-1274, 2013.

A. Y. Ng et al., Autonomous inverted helicopter flight via reinforcement learning, Experimental Robotics IX, pp.363-372, 2006.

J. Kober and J. Peters, Learning motor primitives for robotics, ICRA, 2009.

A. Cully, J. Clune, D. Tarapore, and J. Mouret, Robots that can adapt like animals, Nature, vol.521, issue.7553, pp.503-507, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01158243

R. Pautrat, K. Chatzilygeroudis, and J. Mouret, Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01768279

K. Chatzilygeroudis and J. Mouret, Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics, ICRA, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01768285

A. J. Ijspeert, J. Nakanishi, and S. Schaal, Learning attractor landscapes for learning motor primitives, NIPS, 2003.

P. Abbeel, M. Quigley, and A. Y. Ng, Using inaccurate models in reinforcement learning, ICML, 2006.

E. Brochu, V. M. Cora, and N. de Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010.

B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, Taking the human out of the loop: A review of Bayesian optimization, Proceedings of the IEEE, vol.104, issue.1, pp.148-175, 2016.

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, NIPS, 2000.

N. Kohl and P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, ICRA, 2004.

D. Silver et al., Deterministic policy gradient algorithms, ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00938992

T. Degris, M. White, and R. S. Sutton, Linear off-policy actor-critic, ICML, 2012.

K. Ciosek and S. Whiteson, Expected Policy Gradients for Reinforcement Learning, 2018.

H. van Seijen, H. van Hasselt, S. Whiteson, and M. Wiering, A theoretical and empirical analysis of Expected Sarsa, ADPRL, 2009.

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, Trust region policy optimization, ICML, 2015.

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez et al., Continuous control with deep reinforcement learning, ICLR, 2016.

A. Abdolmaleki, R. Lioutikov, J. R. Peters, N. Lau, L. P. Reis et al., Model-based relative entropy stochastic search, NIPS, 2015.

P. Fidelman and P. Stone, Learning ball acquisition on a physical robot, ISRA, 2004.

F. Guenter, M. Hersch, S. Calinon, and A. Billard, Reinforcement learning for imitating constrained reaching movements, Advanced Robotics, vol.21, pp.1521-1544, 2007.

H. van Hoof, T. Hermans, G. Neumann, and J. Peters, Learning robot in-hand manipulation with tactile features, Humanoids, pp.121-127, 2015.

T. Matsubara, S. Hyon, and J. Morimoto, Learning parametric dynamic movement primitives from multiple demonstrations, Neural Networks, vol.24, issue.5, pp.493-500, 2011.

S. M. Khansari-Zadeh and A. Billard, Learning stable nonlinear dynamical systems with Gaussian mixture models, IEEE Transactions on Robotics, vol.27, issue.5, pp.943-957, 2011.

A. Ude, B. Nemec, and J. Morimoto, Trajectory representation by nonlinear scaling of dynamic movement primitives, IROS, 2016.

A. Ude, A. Gams, T. Asfour, and J. Morimoto, Task-specific generalization of discrete and periodic dynamic movement primitives, IEEE Transactions on Robotics, vol.26, issue.5, pp.800-815, 2010.

J. Spitz, K. Bouyarmane, S. Ivaldi, and J. Mouret, Trial-and-Error Learning of Repulsors for Humanoid QP-based Whole-Body Control, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01569948

F. Stulp and O. Sigaud, Robot skill learning: From reinforcement learning to evolution strategies, Paladyn, Journal of Behavioral Robotics, vol.4, issue.1, pp.49-61, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00922132

A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal, Dynamical Movement Primitives: Learning attractor models for motor behaviors, Neural Computation, vol.25, issue.2, pp.328-373, 2013.

A. J. Ijspeert, J. Nakanishi, and S. Schaal, Movement imitation with nonlinear dynamical systems in humanoid robots, 2002.

N. Roy and S. Thrun, Motion planning through policy search, IROS, 2002.

F. Stulp and O. Sigaud, Policy improvement: Between black-box optimization and episodic reinforcement learning, Journées Francophones Planification, Décision, et Apprentissage pour la conduite de systèmes, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00922133

F. Stulp, E. Theodorou, and S. Schaal, Reinforcement learning with sequences of motion primitives for robust manipulation, IEEE Transactions on Robotics, vol.28, issue.6, pp.1360-1370, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00766177

F. Stulp and G. Raiola, Learning Compact Parameterized Skills with a Single Regression, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00922135

A. Kupcsik, M. P. Deisenroth, J. Peters, A. P. Loh, P. Vadakkepat et al., Model-based contextual policy search for data-efficient generalization of robot skills, Artif. Intell., vol.247, pp.415-439, 2017.

A. Abdolmaleki, B. Price, N. Lau, L. P. Reis, and G. Neumann, Contextual covariance matrix adaptation evolutionary strategies, IJCAI, 2017.

S. Calinon, A tutorial on task-parameterized movement learning and retrieval, Intelligent Service Robotics, vol.9, issue.1, pp.1-29, 2016.

J. Buchli, F. Stulp, E. Theodorou, and S. Schaal, Learning Variable Impedance Control, IJRR, vol.30, issue.7, pp.820-833, 2011.

S. Calinon, D. Bruno, and D. G. Caldwell, A task-parameterized probabilistic model with minimal intervention control, 2014.

S. Calinon, P. Kormushev, and D. G. Caldwell, Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning, Robot. Auton. Syst, vol.61, pp.369-379, 2013.

M. Kalakrishnan, L. Righetti, P. Pastor, and S. Schaal, Learning force control policies for compliant manipulation, IROS, 2011.

K. Stanley and R. Miikkulainen, Evolving Neural Networks Through Augmenting Topologies, Evol. Comput, vol.10, pp.99-127, 2002.

K. Sims, Evolving Virtual Creatures, SIGGRAPH, 1994.

J. C. Bongard and R. Pfeifer, Evolving Complete Agents using Artificial Ontogeny, Proc. of Morpho-functional Machines: The New Species, 2003.

C. Daniel, G. Neumann, O. Kroemer, and J. Peters, Hierarchical relative entropy policy search, JMLR, pp.1-50, 2016.

M. R. Ryan and M. D. Pendrith, RL-TOPs: An architecture for modularity and re-use in reinforcement learning, ICML, pp.481-487, 1998.

T. Lang, M. Toussaint, and K. Kersting, Exploration in relational domains for model-based reinforcement learning, J. Mach. Learn. Res, pp.3725-3768, 2012.

F. Yang, D. Lyu, B. Liu, and S. Gustafson, PEORL: Integrating symbolic planning and hierarchical reinforcement learning for robust decision-making, IJCAI, pp.4860-4866, 2018.

T. Osa, J. Pajarinen, G. Neumann, J. A. Bagnell, P. Abbeel et al., An algorithmic perspective on imitation learning, Foundations and Trends in Robotics, vol.7, issue.1-2, pp.1-179, 2018.

B. D. Argall, S. Chernova, M. Veloso, and B. Browning, A survey of robot learning from demonstration, Robot. Auton. Syst, vol.57, pp.469-483, 2009.

A. Billard, S. Calinon, R. Dillmann, and S. Schaal, Robot programming by demonstration, Springer handbook of robotics, pp.1371-1394, 2008.

C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning, vol.1, 2006.

P. Hennig and C. J. Schuler, Entropy search for information-efficient global optimization, JMLR, vol.13, pp.1809-1837, 2012.

H. J. Kushner, A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise, J. Basic. Eng, vol.86, pp.97-106, 1964.

N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: No regret and experimental design, 2009.

R. Calandra, A. Seyfarth, J. Peters, and M. Deisenroth, Bayesian optimization for learning gaits under uncertainty, Annals of Mathematics and Artificial Intelligence, 2015.

R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos, Active Policy Learning for Robot Planning and Exploration under Uncertainty, 2007.

D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans, Automatic gait optimization with Gaussian process regression, IJCAI, 2007.

J. Rieffel and J. Mouret, Adaptive and resilient soft tensegrity robots, Soft Robotics, vol.5, pp.318-329, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01800749

R. E. Bellman, Dynamic Programming, 1957.

Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas, Bayesian optimization in a billion dimensions via random embeddings, JAIR, vol.55, pp.361-387, 2016.

K. Kandasamy, J. Schneider, and B. Póczos, High dimensional Bayesian optimisation and bandits via additive models, ICML, 2015.

P. Rolland, J. Scarlett, I. Bogunovic, and V. Cevher, High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups, 2018.

R. Akrour, D. Sorokin, J. Peters, and G. Neumann, Local Bayesian optimization of motor skills, ICML, 2017.

J. Mouret and J. Clune, Illuminating search spaces by mapping elites, 2015.

V. Vassiliades, K. Chatzilygeroudis, and J. Mouret, Using centroidal Voronoi tessellations to scale up the multi-dimensional archive of phenotypic elites algorithm, IEEE Trans. Evol. Comput, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01630627

J. K. Pugh, L. B. Soros, and K. O. Stanley, Quality diversity: A new frontier for evolutionary computation, Frontiers in Robotics and AI, vol.3, p.40, 2016.

G. Lee, S. S. Srinivasa, and M. T. Mason, GP-ILQG: Data-driven Robust Optimal Control for Uncertain Nonlinear Dynamical Systems, 2017.

K. Chatzilygeroudis, V. Vassiliades, and J. Mouret, Reset-free Trial-and-Error Learning for Robot Damage Recovery, Robot. Auton. Syst, vol.100, pp.236-250, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01654641

R. Antonova, A. Rai, and C. G. Atkeson, Sample efficient optimization for learning controllers for bipedal locomotion, 2016.

R. Antonova, A. Rai, and C. G. Atkeson, Deep Kernels for Optimizing Locomotion Controllers, CoRL, 2017.

V. T. Inman and H. D. Eberhart, The major determinants in normal and pathological gait, JBJS, vol.35, issue.3, pp.543-558, 1953.

N. Hansen and A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evol. Comput, vol.9, pp.159-195, 2001.

A. Wilson, A. Fern, and P. Tadepalli, Using trajectory data to improve Bayesian optimization for reinforcement learning, JMLR, vol.15, issue.1, pp.253-282, 2014.

R. Lober, V. Padois, and O. Sigaud, Efficient reinforcement learning for humanoid whole-body control, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01377831

J. Salini, V. Padois, and P. Bidaud, Synthesis of complex humanoid whole-body behavior: a focus on sequencing and tasks transitions, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00578073

R. Lober, J. Eljaik, G. Nava, S. Dafarra, F. Romano et al., Optimizing task feasibility using model-free policy search and model-based whole-body control, ICRA, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01620370

A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause et al., Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization, ICRA, 2017.

V. Papaspyros, K. Chatzilygeroudis, V. Vassiliades, and J. Mouret, Safety-Aware Robot Damage Recovery Using Constrained Bayesian Optimization and Simulated Priors, Proc. of the International Workshop "Bayesian Optimization: Black-box Optimization and Beyond" at NIPS, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01407757

J. R. Gardner, Bayesian Optimization with Inequality Constraints, ICML, 2014.

F. Berkenkamp, A. P. Schoellig, and A. Krause, Safe Controller Optimization for Quadrotors with Gaussian Processes, 2016.

A. S. Polydoros and L. Nalpantidis, Survey of Model-Based Reinforcement Learning: Applications on Robotics, Journal of Intelligent & Robotic Systems, pp.1-21, 2017.

R. S. Sutton, Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin, vol.2, issue.4, pp.160-163, 1991.

L. P. Kaelbling, M. L. Littman, and A. W. Moore, Reinforcement learning: A survey, JAIR, vol.4, pp.237-285, 1996.

V. Tangkaratt, S. Mori, T. Zhao, J. Morimoto, and M. Sugiyama, Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation, Neural Networks, vol.57, pp.128-140, 2014.

S. Levine and P. Abbeel, Learning neural network policies with guided policy search under unknown dynamics, NIPS, 2014.

P. Parmas, C. E. Rasmussen, J. Peters, and K. Doya, PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos, ICML, 2018.

K. Chatzilygeroudis, R. Rama, R. Kaushik, D. Goepp, V. Vassiliades et al., Black-Box Data-efficient Policy Search for Robotics, IROS, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01576683

M. P. Deisenroth and C. E. Rasmussen, PILCO: A model-based and data-efficient approach to policy search, ICML, 2011.

M. Sugiyama, I. Takeuchi, T. Suzuki, T. Kanamori, H. Hachiya et al., Least-squares conditional density estimation, IEICE Trans. on Information and Systems, vol.93, issue.3, pp.583-594, 2010.

S. Levine, C. Finn, T. Darrell, and P. Abbeel, End-to-end training of deep visuomotor policies, JMLR, vol.17, issue.39, pp.1-40, 2016.

V. Kumar, E. Todorov, and S. Levine, Optimal control with learned local models: Application to dexterous manipulation, 2016.

Y. Gal and Z. Ghahramani, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, ICML, 2016.

Y. Gal, R. T. Mcallister, and C. E. Rasmussen, Improving PILCO with Bayesian neural network dynamics models, Data-Efficient Machine Learning workshop, 2016.

J. C. Higuera, D. Meger, and G. Dudek, Synthesizing neural network controllers with probabilistic model based reinforcement learning, 2018.

K. Chua, R. Calandra, R. Mcallister, and S. Levine, Deep reinforcement learning in a handful of trials using probabilistic dynamics models, NIPS, 2018.

P. Wawrzynski, Learning to control a 6-degree-of-freedom walking robot, EUROCON, 2007.

S. Depeweg, J. M. Hernández-Lobato, F. Doshi-Velez, and S. Udluft, Learning and policy search in stochastic dynamical systems with Bayesian neural networks, 2017.

S. Depeweg, J. M. Hernández-Lobato, F. Doshi-Velez, and S. Udluft, Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning, ICML, 2018.

A. Doerr et al., Optimizing long-term predictions for model-based policy search, CoRL, 2017.

A. Y. Ng and M. Jordan, PEGASUS: a policy search method for large MDPs and POMDPs, 2000.

B. D. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.

S. J. Julier and J. K. Uhlmann, Unscented filtering and nonlinear estimation, Proceedings of the IEEE, vol.92, pp.401-422, 2004.

M. P. Deisenroth, C. E. Rasmussen, and D. Fox, Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning, 2011.

M. P. Deisenroth, R. Calandra, A. Seyfarth, and J. Peters, Toward fast policy search for learning legged locomotion, IROS, 2012.

Y. Jin and J. Branke, Evolutionary optimization in uncertain environments - a survey, IEEE Trans. Evol. Comput, vol.9, pp.303-317, 2005.

B. L. Miller and D. E. Goldberg, Genetic algorithms, selection schemes, and the varying effects of noise, Evol. Comput, vol.4, pp.113-131, 1996.

S. Tsutsui and A. Ghosh, Genetic algorithms with a robust solution searching scheme, IEEE Trans. Evol. Comput, vol.1, pp.201-208, 1997.

N. Hansen, The CMA Evolution Strategy: A Comparing Review, 2006.

V. Heidrich-Meisner and C. Igel, Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, ICML, 2009.

N. Hansen, A. S. Niederberger, L. Guzzella, and P. Koumoutsakos, A method for handling uncertainty in evolutionary optimization with an application to feedback control of combustion, IEEE Trans. Evol. Comput, vol.13, pp.180-197, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00276216

N. Hansen, Benchmarking a BI-population CMA-ES on the BBOB-2009 noisy testbed, GECCO, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00382101

A. Auger and N. Hansen, A restart CMA evolution strategy with increasing population size, CEC, pp.1769-1776, 2005.

B. Bischoff, D. Nguyen-Tuong, H. van Hoof, A. McHutchon, C. E. Rasmussen et al., Policy search for learning robot control using sparse data, ICRA, 2014.

M. P. Deisenroth, P. Englert, J. Peters, and D. Fox, Multi-task policy search for robotics, 2014.

M. Cutler and J. P. How, Efficient reinforcement learning for robots using informative simulated priors, 2015.

M. Saveriano, Y. Yin, P. Falco, and D. Lee, Data-efficient control policy search using residual dynamics learning, IROS, 2017.

T. Wu and J. Movellan, Semi-parametric Gaussian process for robot system identification, IROS, 2012.

J. Ko, D. J. Klein, D. Fox, and D. Haehnel, Gaussian processes and reinforcement learning for identification and control of an autonomous blimp, 2007.

M. Spong and D. Block, The Pendubot: a mechatronic system for control research and education, Proc. IEEE Conf. on Decision and Control, 1995.

S. Zhu, A. Kimmel, K. E. Bekris, and A. Boularias, Fast Model Identification via Physics Engines for Data-Efficient Policy Search, IJCAI, 2018.

J. Bongard, V. Zykov, and H. Lipson, Resilient machines through continuous self-modeling, Science, vol.314, pp.1118-1121, 2006.

B. Siciliano and O. Khatib, Springer handbook of robotics, 2016.

W. Montgomery, A. Ajay, C. Finn, P. Abbeel, and S. Levine, Reset-free guided policy search: efficient deep reinforcement learning with stochastic initial states, 2017.

S. Levine and V. Koltun, Guided policy search, ICML, 2013.

S. Koos, J. Mouret, and S. Doncieux, The transferability approach: Crossing the reality gap in evolutionary robotics, IEEE Trans. Evol. Comput, vol.17, pp.122-145, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00687617

S. Koos, A. Cully, and J. Mouret, Fast damage recovery in robotics with the t-resilience algorithm, IJRR, vol.32, pp.1700-1723, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00932862

S. Koos and J. Mouret, Online discovery of locomotion modes for wheel-legged hybrid robots: A transferability-based approach, CLAWAR, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00633930

F. Sadeghi and S. Levine, CAD2RL: Real single-image flight without a single real image, 2017.

S. James, A. J. Davison, and E. Johns, Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task, 2017.

S. James, M. Bloesch, and A. J. Davison, Task-Embedded Control Networks for Few-Shot Imitation Learning, 2018.

S. James, P. Wohlhart, M. Kalakrishnan, D. Kalashnikov, A. Irpan et al., Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks, CVPR, 2019.

Y. Chebotar, A. Handa, V. Makoviychuk, M. Macklin, J. Issac et al., Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience, 2018.

J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai et al., Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, 2018.

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, Sim-to-real transfer of robotic control with dynamics randomization, 2018.

M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong et al., Hindsight experience replay, NIPS, 2017.

M. Feurer, J. T. Springenberg, and F. Hutter, Initializing Bayesian hyperparameter optimization via meta-learning, 2015.

C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, ICML, 2017.

I. Clavera, A. Nagabandi, R. S. Fearing, P. Abbeel, S. Levine et al., Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning, ICLR, 2019.

S. Saemundsson, K. Hofmann, and M. P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, 2018.

J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis et al., Learning agile and dynamic motor skills for legged robots, Science Robotics, vol.4, issue.26, 2019.

N. G. Tsagarakis, G. Metta et al., iCub: the design and realization of an open humanoid platform for cognitive and neuroscience research, Advanced Robotics, vol.21, issue.10, pp.1151-1175, 2007.

P. Maiolino, M. Maggiali, G. Cannata, G. Metta, and L. Natale, A flexible and robust large scale capacitive tactile system for robots, IEEE Sensors Journal, vol.13, issue.10, pp.3910-3917, 2013.

T. Dean and K. Kanazawa, A model for reasoning about persistence and causation, Comput. Intell, vol.5, pp.142-150, 1989.

C. Boutilier, R. Dearden, and M. Goldszmidt, Stochastic dynamic programming with factored representations, Artif. Intell., vol.121, pp.49-107, 2000.

A. J. Ijspeert, Central pattern generators for locomotion control in animals and robots: a review, Neural Netw, vol.21, pp.642-653, 2008.

V. C. Kumar, S. Ha, and K. Yamane, Improving Model-Based Balance Controllers using Reinforcement Learning and Adaptive Sampling, ICRA, 2018.

R. Jonschkowski and O. Brock, Learning state representations with robotic priors, Autonomous Robots, vol.39, issue.3, pp.407-428, 2015.

T. Lesort, N. Díaz-rodríguez, J. Goudou, and D. Filliat, State representation learning for control: An overview, Neural Netw, vol.108, pp.379-392, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01858558

J. Oh, X. Guo, H. Lee, R. L. Lewis, and S. Singh, Action-conditional video prediction using deep networks in atari games, NIPS, 2015.

D. Ha and J. Schmidhuber, World models, 2018.

J. M. Assael, N. Wahlström, T. B. Schön, and M. P. Deisenroth, Data-efficient learning of feedback policies from image pixels using deep dynamical models, NIPS Deep RL Workshop, 2015.

S. A. Eslami, D. J. Rezende, F. Besse, F. Viola, A. S. Morcos et al., Neural scene representation and rendering, Science, vol.360, issue.6394, pp.1204-1210, 2018.

J. A. Musick and C. J. Limpus, Habitat utilization and migration in juvenile sea turtles, The biology of sea turtles, vol.1, pp.137-163, 1997.

T. Lesort, M. Seurin, X. Li, N. D. Rodríguez, and D. Filliat, Unsupervised state representation learning with robotic priors: a robustness benchmark, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01644423

T. Pham, G. D. Magistris, and R. Tachibana, OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World, 2018.

T. Haarnoja, V. Pong, A. Zhou, M. Dalal, P. Abbeel et al., Composable Deep Reinforcement Learning for Robotic Manipulation, ICRA, 2018.

T. Pinville, S. Koos, J. Mouret, and S. Doncieux, How to promote generalisation in evolutionary robotics: the ProGAb approach, GECCO, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00633928

B. C. da Silva, G. Konidaris, and A. G. Barto, Learning parameterized skills, ICML, 2012.

J. Kober, A. Wilhelm, E. Oztop, and J. Peters, Reinforcement learning to adjust parametrized motor primitives to new situations, Autonomous Robots, vol.33, issue.4, pp.361-379, 2012.

A. Fabisch and J. H. Metzen, Active contextual policy search, JMLR, vol.15, issue.1, pp.3371-3399, 2014.

T. Schaul, D. Horgan, K. Gregor, and D. Silver, Universal value function approximators, ICML, 2015.

P. Karkus, A. Kupcsik, D. Hsu, and W. S. Lee, Factored Contextual Policy Search with Bayesian Optimization, BayesOpt'16: Proceedings of the International Workshop "Bayesian Optimization: Black-box Optimization and Beyond" at NIPS, 2016.

S. Ha and C. K. Liu, Evolutionary optimization for parameterized whole-body dynamic motor skills, 2016.

Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta et al., Target-driven visual navigation in indoor scenes using deep reinforcement learning, 2017.

P. Rauber, A. Ummadisingu, F. Mutz, and J. Schmidhuber, Hindsight policy gradients, 2017.

D. Ghosh, A. Singh, A. Rajeswaran, V. Kumar, and S. Levine, Divide-and-conquer reinforcement learning, ICLR, 2018.

D. J. Mankowitz, A. Žídek, A. Barreto, D. Horgan, M. Hessel et al., Unicorn: Continual learning with a universal, off-policy agent, 2018.

S. Paul, K. Chatzilygeroudis, K. Ciosek, J. Mouret, M. A. Osborne et al., Alternating Optimisation and Quadrature for Robust Control, AAAI, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01644063

V. Vassiliades and C. Christodoulou, Toward nonlinear local reinforcement learning rules through neuroevolution, Neural Computation, vol.25, issue.11, pp.3020-3043, 2013.

J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo et al., Learning to reinforcement learn, 2016.

J. Harrison, A. Sharma, R. Calandra, and M. Pavone, Control Adaptation via Meta-Learning Dynamics, Workshop on Meta-Learning at NeurIPS, 2018.

W. Yu, C. K. Liu, and G. Turk, Preparing for the unknown: Learning a universal policy with online system identification, 2017.

A. Rajeswaran, S. Ghotra, B. Ravindran, and S. Levine, EPOpt: Learning robust neural network policies using model ensembles, 2016.

R. E. Kalman, A new approach to linear filtering and prediction problems, J. Basic. Eng, vol.82, pp.35-45, 1960.

D. Mayne, A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems, International Journal of Control, vol.3, issue.1, pp.85-95, 1966.

D. H. Jacobson and D. Q. Mayne, Differential dynamic programming, 1970.

E. Todorov and W. Li, A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, American Control Conference, 2005.

J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse et al., Whole-body model-predictive control applied to the HRP-2 humanoid, IROS, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01137021

K. L. Moore, M. Dahleh, and S. Bhattacharyya, Iterative learning control: A survey and new results, Journal of Robotic Systems, vol.9, issue.5, pp.563-594, 1992.

D. A. Bristow, M. Tharayil, and A. G. Alleyne, A survey of iterative learning control, IEEE Control Systems, vol.26, pp.96-114, 2006.

K. S. Lee, I. Chin, H. J. Lee, and J. H. Lee, Model predictive control technique combined with iterative learning for batch processes, AIChE Journal, vol.45, issue.10, pp.2175-2187, 1999.

J. H. Lee, K. S. Lee, and W. C. Kim, Model-based iterative learning control with a quadratic criterion for time-varying linear systems, Automatica, vol.36, issue.5, pp.641-657, 2000.

Y. Wang, D. Zhou, and F. Gao, Iterative learning model predictive control for multi-phase batch processes, Journal of Process Control, vol.18, issue.6, pp.543-557, 2008.

T. Zhang, G. Kahn, S. Levine, and P. Abbeel, Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search, 2016.

S. Karaman and E. Frazzoli, Sampling-based algorithms for optimal motion planning, IJRR, vol.30, issue.7, pp.846-894, 2011.

C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling et al., A survey of Monte Carlo tree search methods, IEEE Transactions on Computational Intelligence and AI in Games, vol.4, pp.1-43, 2012.

M. Duarte, J. Gomes, S. M. Oliveira, and A. L. Christensen, Evolution of repertoire-based control for robots with complex locomotor systems, IEEE Trans. Evol. Comput, 2017.

D. Clever, M. Harant, K. Mombaur, M. Naveau, O. Stasse et al., COCoMoPL: A novel approach for humanoid walking generation combining optimal control, movement primitives and learning and its transfer to the real robot HRP-2, IEEE Robotics and Automation Letters, vol.2, issue.2, pp.977-984, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01459840

Konstantinos Chatzilygeroudis is currently a postdoctoral fellow at the LASA team at EPFL. He obtained a B.Sc. and M.Sc. in Computer Science and Engineering from the University of Patras in 2014, and a Ph.D. in Robotics and Machine Learning from Inria Nancy-Grand Est (France) and the University of Lorraine. His research interests lie in the area of artificial intelligence and focus on reinforcement learning and fast robot adaptation.

Vassilis Vassiliades is currently a team leader at the Research Centre on Interactive Media, Smart Systems and Emerging Technologies (RISE) in Cyprus. He held post-doctoral and research engineer positions at Inria, and research associate positions at the University of Cyprus (2015-2019) and RISE.

Freek Stulp is currently the head of the department of Cognitive Robotics at the Institute of Robotics and Mechatronics at the German Aerospace Center (DLR). Previously, he was an assistant professor at the École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech). He currently serves as an Associate Editor of the IEEE Transactions on Robotics.

Sylvain Calinon is a Senior Researcher at the Idiap Research Institute and a Lecturer at EPFL. From 2009 to 2014, he was a Team Leader at the Department of Advanced Robotics, Italian Institute of Technology; before that, he was a Postdoc at EPFL. He currently serves as an Associate Editor of the IEEE Transactions on Robotics and IEEE Robotics and Automation Letters.

Jean-Baptiste Mouret is a senior researcher ("directeur de recherche") at Inria, the French research institute dedicated to computer science and mathematics; from 2009 to 2015, he was an assistant professor ("maître de conférences") at the Pierre and Marie Curie University, where he received the Ph.D. degree in 2008. His work was recently featured on the cover of Nature (Cully et al., 2015) and received several national and international scientific awards, including the "Prix La Recherche 2016" and the "Distinguished Young Investigator in Artificial Life 2017" award.