Conference Papers Year : 2017

Boosted Fitted Q-Iteration

Samuele Tosatto, Matteo Pirotta, Carlo d'Eramo, Marcello Restelli

Abstract

This paper studies B-FQI, an Approximated Value Iteration (AVI) algorithm that uses a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals across all iterations. The advantage of this approach with respect to other AVI methods is twofold: (1) while keeping the same function space at each iteration, B-FQI can represent more complex functions by considering an additive model; (2) since the Bellman residual decreases as the optimal value function is approached, regression problems become easier as iterations proceed. We study B-FQI both theoretically, providing a finite-sample error upper bound for it, and empirically, by comparing its performance to that of FQI in different domains and using different regression techniques.
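
The residual-summing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the four-state chain environment, the discount factor, and every name (`make_dataset`, `fit_mean`, `boosted_fqi`) are assumptions made up for illustration. The regressor is a deliberately trivial per-pair mean, standing in for the regression techniques the paper evaluates. With such a tabular regressor the procedure reduces to value iteration on the dataset, so the sketch shows only the mechanics (fit the Bellman residual, add the fit to the running sum), not the representational benefit of boosting.

```python
from collections import defaultdict

GAMMA = 0.9        # assumed discount factor
ACTIONS = (0, 1)   # hypothetical actions: 0 = left, 1 = right
GOAL = 3           # reaching state 3 gives reward 1 and ends the episode


def make_dataset():
    """Collect one transition (s, a, r, s_next, done) per state-action pair."""
    data = []
    for s in range(GOAL):
        for a in ACTIONS:
            s_next = s + 1 if a == 1 else max(s - 1, 0)
            done = s_next == GOAL
            data.append((s, a, 1.0 if done else 0.0, s_next, done))
    return data


def fit_mean(inputs, targets):
    """Stand-in regressor: mean target per (s, a) pair, 0 for unseen pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    table = {x: sums[x] / counts[x] for x in sums}
    return lambda s, a: table.get((s, a), 0.0)


def boosted_fqi(data, n_iterations):
    """Build Q as a sum of models, each fitted to the current Bellman residual."""
    models = []  # additive model: the current estimate is the sum of all entries

    def q(s, a):
        return sum(m(s, a) for m in models)

    residual_norms = []
    for _ in range(n_iterations):
        inputs, residuals = [], []
        for s, a, r, s_next, done in data:
            bootstrap = 0.0 if done else max(q(s_next, b) for b in ACTIONS)
            # Empirical Bellman residual of the current estimate at (s, a).
            residuals.append(r + GAMMA * bootstrap - q(s, a))
            inputs.append((s, a))
        residual_norms.append(max(abs(x) for x in residuals))
        # Fit the residual and add the fitted model to the running sum.
        models.append(fit_mean(inputs, residuals))
    return q, residual_norms


q, norms = boosted_fqi(make_dataset(), n_iterations=10)
```

In this toy setting, `norms` records the largest residual seen at each iteration; it shrinks as the estimate approaches the optimal action-value function, which mirrors point (2) of the abstract.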
Main file
tosatto17a.pdf (371.64 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-01653332, version 1 (01-12-2017)

Identifiers

  • HAL Id : hal-01653332, version 1

Cite

Samuele Tosatto, Matteo Pirotta, Carlo d'Eramo, Marcello Restelli. Boosted Fitted Q-Iteration. 34th International Conference on Machine Learning (ICML), Aug 2017, Sydney, Australia. ⟨hal-01653332⟩