Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill; Omar D Domingues; Pierre Ménard; Rémi Munos; Michal Valko

Conference paper, Year: 2019

Jean-Bastien Grill

Role: Author
PersonId : 972490

Sequential Learning

DeepMind [Paris]

Omar D Domingues

Role: Author

Sequential Learning

Pierre Ménard

Role: Author
PersonId : 1022182

Sequential Learning

Rémi Munos

Role: Author
PersonId : 836863

DeepMind [Paris]

Sequential Learning

Michal Valko

Role: Author
PersonId : 284
IdHAL : michal
IdRef : 22360934X

DeepMind [Paris]

Sequential Learning

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O(1/ε^4) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
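
The smoothness mentioned in the abstract can be illustrated with the standard entropy-regularized (log-sum-exp) backup, which replaces the hard maximum over actions with a differentiable one. The sketch below is not the SmoothCruiser algorithm; it only shows the regularized maximum for a single state. The function names, the temperature parameter `lam`, and the action values are assumptions introduced here for illustration.

```python
import math


def soft_max_value(q_values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(q / lam))).

    `lam` is a hypothetical name for the regularization strength.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_max_policy(q_values, lam):
    """Gradient of soft_max_value with respect to q_values (a softmax)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 0.5, -0.2]  # made-up action values for one state
    for lam in (1.0, 0.1, 0.01):
        value = soft_max_value(q, lam)
        policy = [round(p, 3) for p in soft_max_policy(q, lam)]
        print(f"lam={lam}: soft value={value:.4f}, policy={policy}")
```

As `lam` shrinks, the regularized value approaches the ordinary maximum and the policy concentrates on the best action. For any positive `lam`, the value lies between `max(q)` and `max(q) + lam * log(len(q))`, and it is differentiable in the action values, which a hard maximum is not.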

Domains

Machine Learning [stat.ML]

Main file

smoothcruiser2019.pdf (555.21 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02387515

Submitted on: Friday, November 29, 2019, 18:46:38

Last modified on: Thursday, February 1, 2024, 10:06:35

Dates and versions

hal-02387515 , version 1 (29-11-2019)

Identifiers

HAL Id: hal-02387515, version 1

Cite

Jean-Bastien Grill, Omar D Domingues, Pierre Ménard, Rémi Munos, Michal Valko. Planning in entropy-regularized Markov decision processes and games. Neural Information Processing Systems, 2019, Vancouver, Canada. ⟨hal-02387515⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA CRISTAL INRIA2 CRISTAL-SEQUEL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UNIV-LILLE UR1-MATH-NUM

77 views

422 downloads
