Actor-Critic Algorithms for Risk-Sensitive MDPs

Prashanth L.A.¹, Mohammad Ghavamzadeh¹
¹ SEQUEL (Sequential Learning), Inria Lille - Nord Europe; LIFL - Laboratoire d'Informatique Fondamentale de Lille; LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in rewards, in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
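The core idea in the abstract is to ascend the gradient of a variance-penalized objective such as J(θ) = E[R] − λ·Var(R). As a minimal sketch of that idea only (not the report's actor-critic algorithms, which handle full discounted and average-reward MDPs), the following Python snippet runs a likelihood-ratio policy-gradient update on a hypothetical two-armed bandit; the bandit's reward distributions, the risk weight lam, and the step sizes alpha and beta are all illustrative assumptions, not values from the report.

```python
import numpy as np

# Illustrative sketch only: a REINFORCE-style update for the mean-variance
# objective J(theta) = E[R] - lam * Var(R) on a toy two-armed bandit.
# This is NOT the report's actor-critic algorithm; the bandit, lam, and the
# step sizes below are assumptions made purely for illustration.

rng = np.random.default_rng(0)
lam = 0.5           # risk-aversion weight on the variance term (assumed)
alpha = 0.05        # "actor" step size for the policy parameter (assumed)
beta = 0.05         # step size for the running moment estimates (assumed)
theta = 0.0         # single logit: P(choose risky arm) = sigmoid(theta)
m1, m2 = 0.0, 0.0   # running estimates of E[R] and E[R^2]

def pull(arm: int) -> float:
    """Arm 0: low mean, low variance. Arm 1: higher mean, high variance."""
    return rng.normal(0.4, 0.1) if arm == 0 else rng.normal(0.6, 0.8)

for t in range(20000):
    p1 = 1.0 / (1.0 + np.exp(-theta))   # probability of the risky arm
    arm = int(rng.random() < p1)
    r = pull(arm)
    # Track the first two moments of the return (critic-like averages).
    m1 += beta * (r - m1)
    m2 += beta * (r * r - m2)
    # Likelihood-ratio gradient: since Var(R) = E[R^2] - (E[R])^2,
    # grad J is estimated by (r - lam * (r^2 - 2 * m1 * r)) * grad log pi,
    # where grad log pi(arm | theta) = arm - p1 for this Bernoulli policy.
    g = (r - lam * (r * r - 2.0 * m1 * r)) * (arm - p1)
    theta += alpha * g                   # ascend the risk-sensitive objective

print(f"P(risky arm) = {1.0 / (1.0 + np.exp(-theta)):.3f}, "
      f"mean = {m1:.3f}, variance = {m2 - m1 * m1:.3f}")
```

With these assumed settings, the learned policy should shift toward the low-variance arm even though the risky arm has the higher mean reward, which is exactly the trade-off a variance-related risk measure encodes.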
Document type: Technical Report, 2013

https://hal.inria.fr/hal-00794721
Contributor: Mohammad Ghavamzadeh
Submitted on: Wednesday, October 16, 2013 - 04:44:35
Last modified on: Thursday, January 11, 2018 - 06:22:13
Document(s) archived on: Friday, April 7, 2017 - 11:33:44

File

rs-rl-techreport.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00794721, version 2

Citation

Prashanth L.A., Mohammad Ghavamzadeh. Actor-Critic Algorithms for Risk-Sensitive MDPs. [Technical Report] 2013. 〈hal-00794721v2〉

Metrics

Record views: 394
File downloads: 392