Regret Minimization in MDPs with Options without Prior Knowledge

Ronan Fruit (1), Matteo Pirotta (1), Alessandro Lazaric (1), Emma Brunskill (2)
(1) SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: The option framework integrates temporal abstraction into the reinforcement learning model through macro-actions (i.e., options). Recent works leveraged the mapping of Markov decision processes (MDPs) with options to semi-MDPs (SMDPs) and introduced SMDP versions of exploration-exploitation algorithms (e.g., RMAX-SMDP and UCRL-SMDP) to analyze the impact of options on learning performance. Nonetheless, the PAC-SMDP sample complexity of RMAX-SMDP can hardly be translated into equivalent PAC-MDP theoretical guarantees, while the regret analysis of UCRL-SMDP requires prior knowledge of the distributions of the cumulative reward and duration of each option, which is rarely available in practice. In this paper, we remove this limitation by combining the SMDP view with the inner Markov structure of options into a novel algorithm whose regret performance matches that of UCRL-SMDP up to an additive regret term. We show scenarios where this term is negligible and the advantage of temporal abstraction is preserved. We also report preliminary empirical results supporting the theoretical findings.
Document type:
Conference paper
NIPS 2017 - Neural Information Processing Systems, Dec 2017, Long Beach, United States. pp.1-36, 2017
Cited literature: 30 references

https://hal.inria.fr/hal-01649082
Contributor: Alessandro Lazaric
Submitted on: Monday, 27 November 2017 - 11:05:38
Last modified on: Tuesday, 3 July 2018 - 11:34:58

File

supplementary.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01649082, version 1

Citation

Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill. Regret Minimization in MDPs with Options without Prior Knowledge. NIPS 2017 - Neural Information Processing Systems, Dec 2017, Long Beach, United States. pp.1-36, 2017. 〈hal-01649082〉