Apprenticeship learning via inverse reinforcement learning, Twenty-first International Conference on Machine Learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.92
Preference-Based Policy Learning, LNCS no. 6911, pp.12-27, 2011.
DOI : 10.1007/978-3-642-23780-5_11
URL : https://hal.archives-ouvertes.fr/inria-00625001
APRIL: Active Preference Learning-Based Reinforcement Learning, ECML/PKDD, pp.116-131, 2012.
DOI : 10.1007/978-3-642-33486-3_8
URL : https://hal.archives-ouvertes.fr/hal-00722744
Programming by Feedback, Int. Conf. on Machine Learning (ICML), ACM Int. Conf. Proc. Series, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00980839
András Antos, Rémi Munos and Csaba Szepesvári. Fitted Q-iteration in continuous action-space MDPs, 2007.
András Antos, Csaba Szepesvári and Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.
Learning tasks from a single demonstration, Proceedings of International Conference on Robotics and Automation, pp.1706-1712, 1997.
DOI : 10.1109/ROBOT.1997.614389
Robot Learning From Demonstration, Proceedings of the Fourteenth International Conference on Machine Learning, ICML '97, pp.12-20, 1997.
Peter Auer, Nicolo Cesa-Bianchi and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, vol.47, pp.235-256, 2002.
Convergence Results for the (1,λ)-SA-ES using the Theory of φ-irreducible Markov Chains, Theoretical Computer Science, vol.334, pp.35-69, 2005.
A framework for behavioural cloning, Machine Intelligence, pp.103-129, 1995.
Machine learning with structured outputs, 2006. ,
Applied dynamic programming, 1962. ,
DOI : 10.1515/9781400874651
Dynamic programming, 1957. ,
Multiple instance ranking, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.48-55, 2008. ,
DOI : 10.1145/1390156.1390163
Robot learning by demonstration, p.3824, 2013. ,
Automatically Mapped Transfer between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines, Lecture Notes in Computer Science, vol.8189, pp.449-464, 2013.
Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.
Eric Brochu, Nando de Freitas and Abhijeet Ghosh. Active Preference Learning with Discrete Choice Data, Proc. NIPS, pp.409-416, 2008.
A Bayesian Interactive Optimization Approach to Procedural Animation Design, Z. Popovic and M. A. Otaduy, editors, Symposium on Computer Animation, pp.103-112. Eurographics Association, 2010.
An Enquiry Into the Method of Paired Comparison: Reliability, Scaling, and Thurstone's Law of Comparative Judgment, Gen. Tech. Rep., United States Department of Agriculture, 2009.
PAC-inspired Option Discovery in Lifelong Reinforcement Learning, Proc. ICML 2014, JMLR Proceedings, JMLR.org, 2014.
Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, 2013.
Designing robot learners that ask good questions, Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction, HRI '12, pp.17-24, 2012.
DOI : 10.1145/2157689.2157693
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.650.7240
Extensions of Gaussian processes for ranking: semi-supervised and active learning, NIPS Workshop on Learning to Rank, 2005.
Preference learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.137-144, 2005.
DOI : 10.1145/1102351.1102369
Learning for control from multiple demonstrations, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.144-151, 2008.
DOI : 10.1145/1390156.1390175
Corinna Cortes and Vladimir Vapnik. Support-Vector Networks, Machine Learning, pp.273-297, 1995.
Ofer Dekel, Shai Shalev-Shwartz and Yoram Singer. The Forgetron: A Kernel-Based Perceptron on a Budget, SIAM J. Comput, vol.37, pp.1342-1372, 2008.
Pierre Delarboulas, Marc Schoenauer and Michèle Sebag. Open-Ended Evolutionary Robotics: An Information Theoretic Approach, Lecture Notes in Computer Science, vol.6238, pp.334-343, 2010.
Pattern classification and scene analysis, 1973.
Towards imitation-enhanced Reinforcement Learning in multi-agent systems, 2011 IEEE Symposium on Artificial Life (ALIFE), pp.6-13, 2011.
DOI : 10.1109/ALIFE.2011.5954652
Embodied imitation-enhanced reinforcement learning in multi-agent systems, Adaptive Behavior, vol.3, issue.4 ,
DOI : 10.1162/1064546053278955
Iteratively Extending Time Horizon Reinforcement Learning, Proceedings of the 14th European Conference on Machine Learning, pp.96-107, 2003.
DOI : 10.1007/978-3-540-39857-8_11
URL : http://orbi.ulg.ac.be/jspui/handle/2268/9361
Damien Ernst, Pierre Geurts and Louis Wehenkel. Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.
Regularized Policy Iteration, pp.441-448, 2008.
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, J. Artif. Int. Res, vol.25, issue.1, pp.75-118, 2006.
A Cautious Approach to Generalization in Reinforcement Learning, Proc., 2010.
Selective Sampling Using the Query by Committee Algorithm, Machine Learning, pp.133-168, 1997.
Combining Online and Offline Knowledge in UCT, International Conference of Machine Learning, 2007.
iLSTD: Eligibility Traces and Convergence Analysis, Advances in Neural Information Processing Systems 19, pp.441-448, 2007.
Odalric Maillard and Rémi Munos. LSTD with Random Projections, Advances in Neural Information Processing Systems 23, pp.721-729, 2010.
Policy Shaping: Integrating Human Feedback with Reinforcement Learning, Burges et al. [Burges et al. 2013], 2013.
Autonomous Self-Assembly in Swarm-Bots, IEEE Transactions on Robotics, vol.22, issue.6, pp.1115-1130, 2006.
DOI : 10.1109/TRO.2006.882919
Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.
DOI : 10.1016/0004-3702(95)00124-7
Ralf Herbrich, Thore Graepel and Colin Campbell. Bayes Point Machines, Journal of Machine Learning Research, vol.1, pp.245-279, 2001. ,
Evolution strategies with subjective selection, 1996. ,
DOI : 10.1007/3-540-61723-X_966
RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control, 2012 IEEE International Conference on Robotics and Automation, pp.85-90, 2012.
DOI : 10.1109/ICRA.2012.6225072
Analysis of response time distributions in the study of cognitive processes, Journal of Experimental Psychology: Learning, Memory, and Cognition, vol.10, issue.4, pp.598-615, 1984.
DOI : 10.1037/0278-7393.10.4.598
Programming by optimization, Communications of the ACM, vol.55, issue.2, pp.70-80, 2012.
DOI : 10.1145/2076450.2076469
Label Ranking by Learning Pairwise Preferences, Artif. Intell, vol.172, issue.16-17, pp.1897-1916, 2008.
Learning Trajectory Preferences for Manipulators via Iterative Improvement, Burges et al. [Burges et al. 2013], pp.575-583, 2013.
Near-optimal Regret Bounds for Reinforcement Learning, J. Mach. Learn. Res, vol.11, pp.1563-1600, 2010.
Efficient Global Optimization of Expensive Black-Box Functions, J. of Global Optimization, vol.13, pp.455-492, 1998.
Lower Bounds for Reductions, Atomic Learning Workshop, 2006. ,
Reinforcement Learning, Anesthesia & Analgesia, vol.112, issue.2, pp.237-285, 1996.
DOI : 10.1213/ANE.0b013e31820334a7
Sham M. Kakade and Ambuj Tewari. On the Generalization Ability of Online Strongly Convex Programming Algorithms, pp.801-808, 2008.
Complex systems: Chaos and beyond, 2000. ,
DOI : 10.1007/978-3-642-56861-9
Actor-Critic models of reinforcement learning in the basal ganglia: from natural to artificial rats, Adaptive Behavior, vol.13, issue.2, pp.131-148, 2005.
Learning from Limited Demonstrations, Burges et al. [Burges et al. 2013], pp.2859-2867, 2013.
Interactively shaping agents via human reinforcement, Proceedings of the fifth international conference on Knowledge capture, K-CAP '09, pp.9-16, 2009.
DOI : 10.1145/1597735.1597738
Combining manual feedback with subsequent MDP reward signals for reinforcement learning, Wiebe van der Hoek, Gal A. Kaminka, Yves Lespérance, editors, pp.5-12, 2010.
Reinforcement learning from simultaneous human and MDP reward, IFAAMAS, pp.475-482, 2012.
DOI : 10.1109/roman.2012.6343862
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.1705
Training a Robot via Human Feedback: A Case Study, Int. Conf. on Social Robotics, pp.460-470, 2013.
DOI : 10.1007/978-3-319-02675-6_46
Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning, ECML, 2006.
Policy Iteration for Factored MDPs, Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-00), pp.326-334, 2000.
Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion, NIPS, pp.56-61, 2007. ,
Least-Squares Policy Iteration, Journal of Machine Learning Research (JMLR), vol.4, pp.1107-1149, 2003.
Reinforcement Learning as Classification: Leveraging Modern Classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.
Autonomous reinforcement learning on raw visual input data in a real world application, The 2012 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2012.
DOI : 10.1109/IJCNN.2012.6252823
Off-Road Obstacle Avoidance through End-to-End Learning, NIPS - Advances in Neural Information Processing Systems 18, 2006.
Exploiting Open-Endedness to solve problems through the search for novelty, Proc. of the Eleventh International Conference on Artificial Life (ALIFE-08), pp.329-336, 2008.
Feature Construction for Inverse Reinforcement Learning, NIPS 23, pp.1342-1350, 2010.
Autonomous Exploration For Navigating In MDPs, Shie Mannor et al., editors, COLT, 2012.
Evolutionary Robotics for Legged Machines: From Simulation to Physical Reality, IAS, pp.11-18, 2006.
Predictive Representations of State, Neural Information Processing Systems, pp.1555-1561, 2002.
Modeling and Optimization of Adaptive Foraging in Swarm Robotic Systems, The International Journal of Robotics Research, vol.29, issue.14, pp.1743-1760, 2010.
DOI : 10.1177/0278364910375139
Locomotion control of quadruped robots based on CPG-inspired workspace trajectory generation, 2011. ,
Practical Bayesian Optimization, 2008. ,
Mind Model Seems Necessary for the Emergence of Communication, Neural Information Processing - Letters and Reviews, vol.11, issue.4-6, pp.109-121, 2007.
Individual choice behavior, 1959. ,
Aude Billard and Auke Ijspeert. Evolutionary Robotics: A Children's Game, Proceedings of IEEE 5th International Conference on Evolutionary Computation, pp.154-158, 1998.
Toward Off-Policy Learning Control with Function Approximation, Omnipress, pp.719-726, 2010.
Structured prediction with reinforcement learning, Machine Learning, pp.271-301, 2009.
DOI : 10.1007/s10994-009-5140-8
URL : https://hal.archives-ouvertes.fr/hal-01172474
Bellman Error Based Feature Generation using Random Projections on Sparse Spaces, Advances in Neural Information Processing Systems 26, pp.3030-3038, 2013.
Ioannis Antonoglou, Daan Wierstra and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning, arXiv preprint arXiv:1312.5602, 2013.
Error Bounds for Approximate Policy Iteration, ICML, pp.560-567, 2003.
Algorithms for Inverse Reinforcement Learning, Proc. 17th ICML, pp.663-670, 2000.
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, Ivan Bratko and Saso Dzeroski, editors, pp.278-287, 1999.
The distributed co-evolution of an embodied simulator and controller for swarm robot behaviours, Proc. IROS, pp.4995-5000, 2011.
A sensorimotor account of vision and visual consciousness, Behavioral and Brain Sciences, vol.24, pp.939-973, 2001.
Intrinsically Motivated Exploration for Developmental and Active Sensorimotor Learning, From Motor Learning to Interaction Learning in Robots, Studies in Computational Intelligence no. 264, p.107, 2010.
DOI : 10.1007/978-3-642-05181-4_6
Reinforcement learning of motor skills with policy gradients, Neural Networks, vol.21, issue.4, pp.682-697, 2008.
DOI : 10.1016/j.neunet.2008.02.003
Treating Epilepsy via Adaptive Neurostimulation: A Reinforcement Learning Approach, International Journal of Neural Systems, vol.19, issue.4, pp.227-240, 2009.
DOI : 10.1142/S0129065709001987
Networks for approximation and learning, Proceedings of the IEEE, vol.78, issue.9, pp.1481-1497, 1990.
DOI : 10.1109/5.58326
ALVINN: An Autonomous Land Vehicle In a Neural Network, Advances in Neural Information Processing Systems 1, pp.25-26, 1989. ,
Markov decision processes: Discrete stochastic dynamic programming, 1994. ,
Jette Randløv and Preben Alstrøm. Learning to drive a bicycle using reinforcement learning and shaping, Proc. 15th Intl Conf. on Machine Learning, pp.463-471, 1998.
Efficient Learning of Sparse Representations with an Energy-Based Model, NIPS, pp.1137-1144, 2006.
Maximum margin planning, ICML, pp.729-736, 2006.
Learning to Search: Structured Prediction Techniques for Imitation Learning, 2009. ,
How people treat computers, television, and new media like real people and places, 1996. ,
Robotic Grasping of Novel Objects using Vision, The International Journal of Robotics Research, vol.13, issue.3, 2008. ,
DOI : 10.1177/0278364907087172
Picbreeder: A Case Study in Collaborative Evolutionary Exploration of Design Space, Evolutionary Computation, vol.9, issue.4, pp.373-403, 2011.
DOI : 10.1109/MCG.1996.481558
Gaussian Process Regression: Active Data Selection and Test Point Rejection, IJCNN (3), pp.241-246, 2000.
DOI : 10.1007/978-3-642-59802-9_4
Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space, Psychometrika, vol.3, issue.4, pp.325-345, 1957.
DOI : 10.1007/BF02288967
Nonparametric statistics for the behavioral sciences, McGraw-Hill, Inc., second edition, 1988.
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, Machine Learning, pp.287-308, 1998.
How to Teach Animals, Scientific American, vol.185, pp.26-29, 1951.
Practical Bayesian Optimization of Machine Learning Algorithms, NIPS, pp.2960-2968, 2012.
The Optimal Control of Partially Observable Markov Decision Processes, 1971. ,
A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks, Artificial Life, vol.21, issue.2, 2009. ,
DOI : 10.1109/5.784219
Energy-efficient indoor search by swarms of simulated flying robots without global information, Swarm Intelligence, vol.4, issue.2, pp.117-143, 2010.
PAC model-free reinforcement learning, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.881-888, 2006.
DOI : 10.1145/1143844.1143955
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.326
Freek Stulp and Olivier Sigaud. Robot Skill Learning: From Reinforcement Learning to Evolution Strategies, Paladyn, Journal of Behavioral Robotics, vol.4, issue.1, pp.49-61, 2013.
Daisuke Nagao, Shigeki Sugano and Tetsuya Ogata. Interactive evolution of human-robot communication in real world, Proc. IEEE/RSJ IROS'05, p.1438, 2005.
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999.
DOI : 10.1016/S0004-3702(99)00052-1
Fast gradient-descent methods for temporal-difference learning with linear function approximation, Andrea Pohoreckyj Danyluk, Léon Bottou and Michael L. Littman, editors, 2009.
A Game-Theoretic Approach to Apprenticeship Learning, NIPS, 2007. ,
Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation, Proceedings of the IEEE, vol.89, issue.9, pp.1275-1296, 2001.
DOI : 10.1109/5.949485
Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, Proceedings of the 21st National Conference on Artificial Intelligence, pp.1000-1005, 2006.
Animal Intelligence, Science, vol.8, issue.198, 1911. ,
DOI : 10.1126/science.8.198.520
Cooperative hole avoidance in a swarm-bot, Robotics and Autonomous Systems, vol.54, issue.2, pp.97-103, 2006.
DOI : 10.1016/j.robot.2005.09.018
Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann and Yasemin Altun. Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.
Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks, Australasian Conference on Artificial Intelligence, pp.340-349, 2009.
DOI : 10.1007/978-3-642-10439-8_35
Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets, NIPS, pp.2352-2360, 2010.
Numerical Solutions by the Continuation Method, SIAM Review, vol.15, issue.1, pp.89-119, 1973.
DOI : 10.1137/1015003
Evolutionary Function Approximation for Reinforcement Learning, Journal of Machine Learning Research, vol.7, pp.877-917, 2006.
Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning, Journal of Autonomous Agents and Multi-Agent Systems, vol.21, issue.1, 2010.
Reinforcement learning: State-of-the-art. Adaptation, Learning, and Optimization, 2012.
DOI : 10.1007/978-3-642-27645-3
A Bayesian Approach for Policy Learning from Trajectory Preference Queries, NIPS, pp.1142-1150, 2012.
LeCun et al. Handwritten Digit Recognition with a Back-Propagation Network, pp.396-404, 1989.
Interactively optimizing information retrieval systems as a dueling bandits problem, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553527
Reinforcement learning design for cancer clinical trials, pp.48-83, 2009. ,