, C(?, w(?)) ? C(?), D(?, w(?)) ? D(?)

Then the stationary policy w ? * given in eq. (4) is optimal for CMDP.

Proof. (i) From Lemma 1 (i) it follows that either C(?, u) is infinite, or f(?, u) satisfies (1) and hence is in Q(?). In both cases (3) holds. (ii) follows from Lemma 1 (ii). (iii) We may assume that there exists a, The optimal value C(?) of CMDP is equal to the optimal value C * of LP(?) ,

Constrained Markov Decision Processes, 1999.

URL : https://hal.archives-ouvertes.fr/inria-00074109