Regret Minimization in MDPs with Options without Prior Knowledge

Ronan Fruit; Matteo Pirotta; Alessandro Lazaric; Emma Brunskill

Conference paper, Year: 2017


Ronan Fruit

Role: Author

Sequential Learning

Matteo Pirotta

Role: Author
PersonId : 1023840

Sequential Learning

Alessandro Lazaric

Role: Author
PersonId : 851
IdHAL : alessandro-lazaric
ORCID : 0000-0002-8970-413X
IdRef : 188701486

Sequential Learning

Emma Brunskill

Role: Author

Computer Science Department - Carnegie Mellon University

Abstract

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options). Recent works leveraged the mapping of Markov decision processes (MDPs) with options to semi-MDPs (SMDPs) and introduced SMDP versions of exploration-exploitation algorithms (e.g., RMAX-SMDP and UCRL-SMDP) to analyze the impact of options on learning performance. Nonetheless, the PAC-SMDP sample complexity of RMAX-SMDP can hardly be translated into equivalent PAC-MDP theoretical guarantees, while the regret analysis of UCRL-SMDP requires prior knowledge of the distributions of the cumulative reward and duration of each option, which are rarely available in practice. In this paper, we remove this limitation by combining the SMDP view with the inner Markov structure of options into a novel algorithm whose regret matches that of UCRL-SMDP up to an additive regret term. We show scenarios where this term is negligible and the advantage of temporal abstraction is preserved. We also report preliminary empirical results supporting the theoretical findings.
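To give a concrete feel for what "the inner Markov structure of options" can buy, the sketch below uses a standard absorbing-Markov-chain computation. It is an illustration under stated assumptions, not the algorithm proposed in the paper. It assumes the option's dynamics are known and given as a sub-stochastic matrix Q over the option's non-terminal states (Q[i][j] is the probability of taking one step from i to j and continuing; the missing row mass is the termination probability) and a vector r of expected one-step rewards. The helper names option_moments and solve_linear are hypothetical. Through the fundamental matrix (I - Q)^-1, it recovers the expected cumulative reward and expected duration of the option from each start state, i.e., the two SMDP quantities mentioned in the abstract. In a learning setting Q and r would be unknown and would have to be estimated from samples.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b], so the caller's inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            # I - Q is singular when some states never terminate the option.
            raise ValueError("singular system: option may never terminate")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def option_moments(q: Matrix, r: List[float]) -> Tuple[List[float], List[float]]:
    """Expected cumulative reward and expected duration of an option.

    Hypothetical helper: q is the sub-stochastic "continue" matrix of the
    option's inner (absorbing) Markov chain, r the expected one-step rewards.
    Undiscounted, as in the average-reward regret setting.
    """
    n = len(q)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    # R = (I - Q)^-1 r : rewards collected before termination.
    expected_reward = solve_linear(i_minus_q, r)
    # tau = (I - Q)^-1 1 : number of primitive steps before termination.
    expected_duration = solve_linear(i_minus_q, [1.0] * n)
    return expected_reward, expected_duration


if __name__ == "__main__":
    # Toy option: from state 0 it moves to state 1 or terminates (prob 0.5 each);
    # from state 1 it stays in 1 or terminates (prob 0.5 each).
    reward, duration = option_moments([[0.0, 0.5], [0.0, 0.5]], [1.0, 2.0])
    print(reward, duration)  # [3.0, 4.0] [2.0, 2.0]
```

Solving the two linear systems directly, rather than inverting I - Q, is the usual numerically safer choice; with NumPy available, numpy.linalg.solve would replace the hand-written solve_linear.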

Subjects

Machine Learning [stat.ML]

Main file

supplementary.pdf (1.67 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01649082

Submitted on: Monday, 27 November 2017, 11:05:38

Last modified on: Wednesday, 24 January 2024, 09:54:23

Dates and versions

hal-01649082, version 1 (27-11-2017)

Identifiers

HAL Id: hal-01649082, version 1

Cite

Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill. Regret Minimization in MDPs with Options without Prior Knowledge. NIPS 2017 - Neural Information Processing Systems, Dec 2017, Long Beach, United States. pp. 1-36. ⟨hal-01649082⟩


Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE ANR

370 views

175 downloads
