
Long-Term Values in Markov Decision Processes, (Co)Algebraically

Abstract: This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems similar to MDPs but without rewards have been studied extensively from the perspective of program semantics, including coalgebraically. Here we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions are (i) a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) a proof that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
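As a concrete, non-categorical illustration of two ingredients the abstract names (the long-term discounted value of a policy as the unique fixpoint of a contraction, and policy improvement), here is a minimal Python sketch of classical policy iteration on a finite MDP. It is not the paper's coalgebraic construction. The dictionary encoding, the function names `q_value`, `evaluate`, `improve`, and `policy_iteration`, and the two-state example are hypothetical choices made for illustration. It assumes a discount factor 0 ≤ γ < 1, so that the policy-evaluation operator is a γ-contraction in the sup norm and Banach's Fixpoint Theorem guarantees that iterating it converges to a unique fixpoint.

```python
from typing import Dict

# Hypothetical encoding of a finite MDP (illustration only):
# P[s][a] maps successor states to probabilities, R[s][a] is the reward.
State = str
Action = str
Transitions = Dict[State, Dict[Action, Dict[State, float]]]
Rewards = Dict[State, Dict[Action, float]]
Policy = Dict[State, Action]
Values = Dict[State, float]


def q_value(s: State, a: Action, V: Values, P: Transitions,
            R: Rewards, gamma: float) -> float:
    """Immediate reward plus discounted expected value of the successor."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())


def evaluate(policy: Policy, P: Transitions, R: Rewards,
             gamma: float, tol: float = 1e-10) -> Values:
    """Approximate the long-term discounted value of a fixed policy.

    The operator V -> R_pi + gamma * P_pi V is a gamma-contraction in the
    sup norm, so by Banach's Fixpoint Theorem the iteration below converges
    to its unique fixpoint from any starting point (here: all zeros).
    """
    if not 0 <= gamma < 1:
        raise ValueError("discount factor must satisfy 0 <= gamma < 1")
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: q_value(s, policy[s], V, P, R, gamma) for s in P}
        change = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if change < tol:
            return V


def improve(policy: Policy, V: Values, P: Transitions, R: Rewards,
            gamma: float, eps: float = 1e-9) -> Policy:
    """Switch to a greedy action only where it is strictly better.

    Keeping the current action on (near-)ties prevents the loop in
    policy_iteration from cycling between equally good actions.
    """
    new_policy = {}
    for s in P:
        best = max(P[s], key=lambda a: q_value(s, a, V, P, R, gamma))
        current = q_value(s, policy[s], V, P, R, gamma)
        better = q_value(s, best, V, P, R, gamma) > current + eps
        new_policy[s] = best if better else policy[s]
    return new_policy


def policy_iteration(P: Transitions, R: Rewards, gamma: float,
                     policy: Policy):
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        V = evaluate(policy, P, R, gamma)
        new_policy = improve(policy, V, P, R, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Two-state example: "stay" in s1 earns reward 1 per step, everything else 0.
P_example: Transitions = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}
R_example: Rewards = {
    "s0": {"stay": 0.0, "go": 0.0},
    "s1": {"stay": 1.0, "go": 0.0},
}

if __name__ == "__main__":
    pi, V = policy_iteration(P_example, R_example, 0.9,
                             {"s0": "stay", "s1": "go"})
    print(pi)  # {'s0': 'go', 's1': 'stay'}
    print(V)
```

Starting from a policy that never collects reward, the loop settles on moving from s0 to s1 and then staying, with values of roughly 1/(1-0.9) = 10 at s1 and 0.9 · 10 = 9 at s0. Each improvement step does not decrease the value in any state, which is the property the paper explains coinductively.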
Document type: Conference papers

Cited literature: 28 references

https://hal.inria.fr/hal-02044650
Contributor: Hal Ifip
Submitted on: Thursday, February 21, 2019 - 3:41:27 PM
Last modification on: Thursday, August 1, 2019 - 3:18:19 PM
Long-term archiving on: Wednesday, May 22, 2019 - 4:14:16 PM

File

473364_1_En_6_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Frank Feys, Helle Hansen, Lawrence Moss. Long-Term Values in Markov Decision Processes, (Co)Algebraically. 14th International Workshop on Coalgebraic Methods in Computer Science (CMCS), Apr 2018, Thessaloniki, Greece. pp.78-99, ⟨10.1007/978-3-030-00389-0_6⟩. ⟨hal-02044650⟩
