Conference papers

Deep Reinforcement Learning Guided by a Library of Possibly Unreliable Advice

Nizam Makdoud 1, 2 Jérôme Kodjabachian 1 Marc Schoenauer 2
2 TAU - TAckling the Underspecified
Inria Saclay - Ile de France, LRI - Laboratoire de Recherche en Informatique
Abstract : Humans' impressive learning abilities are partly due to their capacity to reuse information from diverse sources. This ability is valuable for quickly mastering new tasks, and it is key to overcoming the sample inefficiency of Reinforcement Learning. However, without safeguards, following advice blindly may be detrimental to the learning process. Standard guidance schemes are poorly designed to assess the value of advice, which leads to weak guidance because valuable advice becomes indistinguishable from detrimental advice. We propose a novel transfer learning algorithm in which a library of policies, potentially trained in different contexts, advises a student learner. We provide evidence that the standard guidance algorithm, which directly manipulates the student's policy, is prone to following sub-optimal advice. Instead, we propose to guide the student by maximizing the value function taken over a particular mixture of policies. This mixture of policies incorporates the knowledge of the experts and speeds up learning. Our approach is sample efficient, even with sub-optimal advisors, and it improves overall performance compared with learning the task from scratch. We evaluate our approach on several control benchmarks and provide empirical evidence that it performs well in multiple contexts.
Document type :
Conference papers
Contributor : Marc Schoenauer
Submitted on : Thursday, February 18, 2021 - 6:40:19 PM
Last modification on : Friday, January 21, 2022 - 3:11:56 AM
Long-term archiving on : Wednesday, May 19, 2021 - 7:40:15 PM


Files produced by the author(s)


  • HAL Id : hal-03146143, version 1


Nizam Makdoud, Jérôme Kodjabachian, Marc Schoenauer. Deep Reinforcement Learning Guided by a Library of Possibly Unreliable Advice. CAP'2020 - Conférence d'Apprentissage, AFIA, Jun 2020, Vannes, France. ⟨hal-03146143⟩


