Actor-Critic Algorithms for Risk-Sensitive MDPs - Inria (French National Institute for Research in Digital Science and Technology)
Report (Technical Report) Year: 2013

Actor-Critic Algorithms for Risk-Sensitive MDPs

Prashanth L.A.
  • Role: Author
  • PersonId : 937450
Mohammad Ghavamzadeh
  • Role: Author
  • PersonId : 868946

Abstract

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
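To make the idea of trading expected reward against variability concrete, here is a minimal sketch. It is an illustration only, not the report's algorithm: it assumes one particular criterion (sample mean of discounted returns minus a weighted sample variance), and the names `discounted_return`, `mean_variance_score`, `gamma`, and `risk_weight` are hypothetical. The report's own criteria, gradient formulas, and actor-critic updates are given in the PDF.

```python
from statistics import fmean, pvariance


def discounted_return(rewards, gamma):
    """Discounted sum of one trajectory's rewards: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def mean_variance_score(trajectories, gamma=0.95, risk_weight=0.5):
    """Score sampled trajectories of one policy (assumed criterion, for illustration).

    Returns (score, mean, variance), where
    score = mean(returns) - risk_weight * variance(returns).
    """
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    mean = fmean(returns)
    variance = pvariance(returns)  # population variance of the sampled returns
    return mean - risk_weight * variance, mean, variance


# Two hypothetical policies with the same average return but different spread.
steady = [[1.0], [1.0]]
erratic = [[0.0], [2.0]]
steady_score, _, _ = mean_variance_score(steady, gamma=0.5)
erratic_score, _, _ = mean_variance_score(erratic, gamma=0.5)
```

Under this assumed criterion the steady policy scores higher than the erratic one although both have the same mean return, which is the kind of preference a risk-sensitive objective is meant to express.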

Domains

Computer Science
Main file
rs-rl-techreport.pdf (866.14 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00794721, version 1 (27-02-2013)
hal-00794721, version 2 (16-10-2013)

Identifiers

  • HAL Id: hal-00794721, version 2

Cite

Prashanth L.A., Mohammad Ghavamzadeh. Actor-Critic Algorithms for Risk-Sensitive MDPs. [Technical Report] 2013. ⟨hal-00794721v2⟩
