Long-Term Values in Markov Decision Processes, (Co)Algebraically

Frank Feys; Helle Hvid Hansen; Lawrence S. Moss

doi:10.1007/978-3-030-00389-0_6

Conference paper, Year: 2018


Frank Feys

Role: Author
PersonId: 1043059

Department of Engineering Systems and Services [Delft]

Helle Hvid Hansen

Role: Author
PersonId: 1043060

Department of Engineering Systems and Services [Delft]

Lawrence S. Moss

Role: Author
PersonId: 1043061

Department of Mathematics [Bloomington]

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
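As a rough illustration of the ideas named in the abstract, the following minimal sketch computes the long-term value of a fixed policy under discounted sums by iterating a contractive one-step operator (the setting of Banach's Fixpoint Theorem), and then performs one greedy policy improvement step. It is not taken from the paper: the two-state MDP, the state-based rewards, the discount factor 0.5, and all names (`transitions`, `rewards`, `bellman_step`, `long_term_value`, `improve`) are assumptions made for this example.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the paper).
# transitions[s][a] maps each successor state to its probability.
transitions = {
    0: {"stay": {0: 1.0}, "move": {1: 1.0}},
    1: {"stay": {1: 1.0}, "move": {0: 1.0}},
}
rewards = {0: 0.0, 1: 1.0}  # assumed: rewards attached to states
gamma = 0.5                 # assumed discount factor, 0 <= gamma < 1


def expected(value, dist):
    """Expected value of `value` under the distribution `dist`."""
    return sum(prob * value[t] for t, prob in dist.items())


def bellman_step(policy, value):
    """One application of the policy's one-step operator to `value`."""
    return {
        s: rewards[s] + gamma * expected(value, transitions[s][policy[s]])
        for s in transitions
    }


def long_term_value(policy, tol=1e-12, max_iter=10_000):
    """Approximate the unique fixpoint of bellman_step by iteration.

    Since gamma < 1 the operator is a contraction in the sup norm,
    so the iterates converge to the long-term value of the policy.
    """
    value = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        new_value = bellman_step(policy, value)
        if max(abs(new_value[s] - value[s]) for s in transitions) < tol:
            return new_value
        value = new_value
    return value


def improve(policy):
    """Greedy improvement: in each state pick the action with the best
    expected long-term value of the current policy."""
    value = long_term_value(policy)
    return {
        s: max(transitions[s], key=lambda a: expected(value, transitions[s][a]))
        for s in transitions
    }


old_policy = {0: "stay", 1: "stay"}
new_policy = improve(old_policy)
old_value = long_term_value(old_policy)
new_value = long_term_value(new_policy)
```

In this toy instance the improved policy moves from state 0 to state 1 and stays there, and its long-term value is at least that of the original policy in every state, which is the property that policy improvement is meant to guarantee.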

Keywords

Markov decision process; Long-term value; Discounted sum; Coalgebra; Algebra; Corecursive algebra; Fixpoint; Metric space

Domains

Computer Science [cs]

Main file

473364_1_En_6_Chapter.pdf (475.04 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-02044650

Submitted on: Thursday, February 21, 2019, 15:41:27

Last modified on: Tuesday, March 26, 2024, 17:44:13

Long-term archiving on: Wednesday, May 22, 2019, 16:14:16

Dates and versions

hal-02044650, version 1 (21-02-2019)

License

Attribution

Identifiers

HAL Id: hal-02044650, version 1
DOI: 10.1007/978-3-030-00389-0_6

Cite

Frank Feys, Helle Hvid Hansen, Lawrence S. Moss. Long-Term Values in Markov Decision Processes, (Co)Algebraically. 14th International Workshop on Coalgebraic Methods in Computer Science (CMCS), Apr 2018, Thessaloniki, Greece. pp.78-99, ⟨10.1007/978-3-030-00389-0_6⟩. ⟨hal-02044650⟩


Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC1 IFIP-WG1-3 IFIP-CMCS IFIP-LNCS-11202

