Variance reduction in purely random forests

Robin Genuer 1
1 SISTM - Statistics In System biology and Translational Medicine
Epidémiologie et Biostatistique [Bordeaux], Inria Bordeaux - Sud-Ouest
Abstract : Random forests, introduced by Leo Breiman in 2001, are a very effective statistical method. The complex mechanism of the method makes theoretical analysis difficult. Therefore, simplified versions of random forests, called purely random forests, which are easier to handle theoretically, have been considered. In this paper, we study the variance of such forests. First, we show a general upper bound which emphasizes the fact that a forest reduces the variance. We then introduce a simple variant of purely random forests, which we call purely uniformly random forests. For this variant, and in the context of regression problems with a one-dimensional predictor space, we show that both random trees and random forests reach the minimax rate of convergence. In addition, we prove that, compared to random trees, random forests improve accuracy by reducing the estimator variance by a factor of three fourths.
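The variance reduction described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction or proof: it assumes a purely uniformly random tree on [0, 1] is built by drawing k split points uniformly at random, independently of the data, and predicting with the mean response in the cell that contains the query point. The regression function sin(2πx), the noise level, the convention for empty cells, and all function and parameter names are hypothetical choices made for illustration. It does not attempt to reproduce the three-fourths constant; it only compares the empirical variance of a single tree with that of a forest of q trees at one query point.

```python
import bisect
import math
import random
import statistics


def tree_predict(xs, ys, x_query, k, rng):
    """Predict at x_query with one random tree: k uniform split points on [0, 1]."""
    splits = sorted(rng.random() for _ in range(k))
    # A point's cell index is the number of split points strictly to its left.
    cell = bisect.bisect_left(splits, x_query)
    in_cell = [y for x, y in zip(xs, ys) if bisect.bisect_left(splits, x) == cell]
    # Assumed convention: an empty cell predicts 0.
    return statistics.fmean(in_cell) if in_cell else 0.0


def forest_predict(xs, ys, x_query, k, q, rng):
    """Average the predictions of q independently drawn random trees."""
    return statistics.fmean([tree_predict(xs, ys, x_query, k, rng) for _ in range(q)])


def simulate(n=200, k=10, q=20, reps=200, seed=0):
    """Empirical variance at x = 0.5 of a single tree and of a forest."""
    rng = random.Random(seed)
    tree_preds, forest_preds = [], []
    for _ in range(reps):
        # Fresh sample from a hypothetical model: Y = sin(2*pi*X) + noise.
        xs = [rng.random() for _ in range(n)]
        ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.5) for x in xs]
        tree_preds.append(tree_predict(xs, ys, 0.5, k, rng))
        forest_preds.append(forest_predict(xs, ys, 0.5, k, q, rng))
    return statistics.pvariance(tree_preds), statistics.pvariance(forest_preds)


if __name__ == "__main__":
    tree_var, forest_var = simulate()
    print(f"tree variance:   {tree_var:.4f}")
    print(f"forest variance: {forest_var:.4f}")
```

The variance measured here mixes the randomness of the sample with the randomness of the partition, so the ratio it reports should not be read as the constant proved in the paper.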
Document type :
Journal articles
Cited literature [10 references]

https://hal.inria.fr/hal-01590513
Contributor : Robin Genuer
Submitted on : Tuesday, September 19, 2017 - 4:34:24 PM
Last modification on : Tuesday, September 18, 2018 - 4:24:02 PM

File

genuer.var-reduc-prf-preprint....
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Robin Genuer. Variance reduction in purely random forests. Journal of Nonparametric Statistics, American Statistical Association, 2012, 2, pp.18 - 562. ⟨10.1007/978-1-4899-0027-2⟩. ⟨hal-01590513⟩
