A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion

François Dufour; Alexandre Genadot

doi:10.1137/19M1255811

Journal article, SIAM Journal on Control and Optimization, Year: 2020

François Dufour

Role: Author
PersonId : 12044
IdHAL : francois-dufour
ORCID : 0000-0001-6653-2024
IdRef : 127261680

Institut Polytechnique de Bordeaux

Quality control and dynamic reliability

Institut de Mathématiques de Bordeaux

Alexandre Genadot

Role: Author
PersonId : 9101
IdHAL : alexandre-genadot
IdRef : 178361852

Quality control and dynamic reliability

Institut de Mathématiques de Bordeaux

Abstract

In this work, we study discrete-time Markov decision processes (MDPs) under constraints, with Borel state and action spaces, where all the performance functions have the same form of the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDP. It will be shown that the values of the constrained control problem and the associated convex program coincide, and that if there exists an optimal solution to the convex program, then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that, in the framework of constrained control problems, the supremum of the expected total rewards over the set of randomized policies is equal to the supremum of the expected total rewards over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to cover cases that have not yet been addressed in the literature. An example is presented to illustrate our results in relation to those of the literature.
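For orientation, the objects named in the abstract can be written out as follows. This is only a sketch in assumed notation: the symbols X, A, Q, \nu, r_i, \theta_i and the direction of the constraints are illustrative choices, not taken from the article. The last display is the familiar occupation-measure program for the special case where occupation measures are finite (for example, absorbing models). Under the general ETR criterion occupation measures need not be finite, so the convex program studied in the article may take a different form and is not reproduced here.

```latex
% Assumed, illustrative notation (not from the article):
%   X, A          Borel state and action spaces
%   Q(dy | x, a)  transition kernel,  \nu  initial distribution
%   r_0           reward to be maximized
%   r_1,...,r_q   constraint rewards with thresholds \theta_1,...,\theta_q

% Constrained control problem under the expected total reward criterion:
\[
  \sup_{\pi}\; \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} r_0(x_t,a_t)\Big]
  \quad\text{subject to}\quad
  \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} r_i(x_t,a_t)\Big] \ge \theta_i,
  \qquad i=1,\dots,q .
\]

% Occupation measure of a policy \pi on X \times A:
\[
  \mu^{\pi}(\Gamma) \;=\; \sum_{t=0}^{\infty}
  \mathbb{P}^{\pi}_{\nu}\big((x_t,a_t)\in\Gamma\big),
  \qquad\text{so that}\qquad
  \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} r_i(x_t,a_t)\Big]
  \;=\; \int_{X\times A} r_i \, d\mu^{\pi}
\]
% whenever the integral is well defined.

% Occupation-measure program in the finite-measure (e.g. absorbing) special case:
\[
  \sup_{\mu \ge 0}\; \int r_0 \, d\mu
  \quad\text{subject to}\quad
  \int r_i \, d\mu \ge \theta_i \ (i=1,\dots,q),
  \qquad
  \mu(B\times A) \;=\; \nu(B) + \int_{X\times A} Q(B\mid x,a)\,\mu(dx,da)
  \ \ \text{for all Borel } B\subseteq X .
\]
```

In this special case a stationary randomized policy can be read off a feasible \mu by disintegrating it with respect to its marginal on X, which is the kind of correspondence between program solutions and stationary policies that the abstract refers to.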

Keywords

Markov decision process; Expected total reward criterion; Occupation measure; Constraints; Convex program

Domains

Optimization and Control [math.OC]

https://inria.hal.science/hal-03033727

Submitted on: Tuesday, December 1, 2020, 14:55:08

Last modified on: Thursday, April 4, 2024, 03:07:32

Dates and versions

hal-03033727 , version 1 (01-12-2020)

Identifiers

HAL Id : hal-03033727 , version 1
DOI : 10.1137/19M1255811

Cite

François Dufour, Alexandre Genadot. A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion. SIAM Journal on Control and Optimization, 2020, 58 (4), pp.2535-2566. ⟨10.1137/19M1255811⟩. ⟨hal-03033727⟩

Collections

CNRS INRIA IMB INRIA2 TDS-MACS
