Andras Antos, Csaba Szepesvari, Rémi Munos. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory.
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007, Hawai, United States. pp.2007, 2007.
〈inria-00124833〉