Variance reduction in purely random forests

Robin Genuer, ISPED - Institut de Santé Publique, d'Epidémiologie et de Développement, Université Bordeaux Segalen - Bordeaux 2

Journal article, 2012. HAL: https://hal.inria.fr/hal-01590513 (full text: https://hal.inria.fr/hal-01590513/document). DOI: 10.1007/978-1-4899-0027-2. Deposited in HAL on 2017-09-19; last modified 2021-03-24.

Keywords: Random Forests; Non-parametric regression; Rates of convergence; Randomization; Ensemble methods

Subject: [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]

Random forests, introduced by Leo Breiman in 2001, are a very effective statistical method. However, the complexity of their mechanism makes theoretical analysis difficult. Simplified versions, called purely random forests, which are easier to analyze theoretically, have therefore been considered. In this paper we study the variance of such forests. First, we prove a general upper bound on the variance which highlights the fact that a forest reduces variance. We then introduce a simple variant of purely random forests, which we call purely uniformly random forests. For this variant, in the context of regression with a one-dimensional predictor space, we show that both random trees and random forests reach the minimax rate of convergence. In addition, we prove that random forests improve on random trees by reducing the estimator variance to three fourths of that of a single random tree.
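The abstract does not spell out how a purely uniformly random tree is built, so the following is only a minimal sketch under a stated assumption: in one dimension, a tree is taken to cut [0, 1] at k - 1 independent Uniform(0, 1) points drawn without looking at the data, and it predicts at x by averaging the responses that fall in the same cell as x; a forest averages q such trees fitted on the same sample. The function names (`uniform_partition`, `tree_predict`, `forest_predict`, `simulate_variances`) and the empty-cell convention are hypothetical. The toy Monte Carlo run illustrates only the mechanism by which averaging trees lowers variance; it is not intended to reproduce the asymptotic three-fourths factor, which depends on the paper's own assumptions.

```python
import bisect
import random
import statistics


def uniform_partition(k, rng):
    """Split [0, 1] into k cells with k - 1 i.i.d. Uniform(0, 1) cut points.

    Assumed construction: cuts are drawn independently of the data,
    which is what makes the tree "purely random".
    """
    return sorted(rng.random() for _ in range(k - 1))


def tree_predict(x, xs, ys, cuts):
    """Average the responses whose inputs fall in the same cell as x.

    An empty cell predicts 0.0 (a convention assumed for this sketch).
    """
    cell = bisect.bisect_right(cuts, x)
    in_cell = [y for xi, y in zip(xs, ys) if bisect.bisect_right(cuts, xi) == cell]
    return statistics.fmean(in_cell) if in_cell else 0.0


def forest_predict(x, xs, ys, k, q, rng):
    """Average q independent purely random trees fitted on the same sample."""
    return statistics.fmean(
        tree_predict(x, xs, ys, uniform_partition(k, rng)) for _ in range(q)
    )


def simulate_variances(n=100, k=10, q=30, reps=300, x=0.5, sigma=1.0, seed=0):
    """Monte Carlo variance at x of one tree versus a q-tree forest.

    The regression function is taken to be 0 (an assumption of this toy
    setup), so each prediction is pure estimation noise.
    """
    rng = random.Random(seed)
    tree_preds, forest_preds = [], []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.gauss(0.0, sigma) for _ in range(n)]
        tree_preds.append(tree_predict(x, xs, ys, uniform_partition(k, rng)))
        forest_preds.append(forest_predict(x, xs, ys, k, q, rng))
    return statistics.pvariance(tree_preds), statistics.pvariance(forest_preds)


if __name__ == "__main__":
    var_tree, var_forest = simulate_variances()
    print(f"tree variance   ~ {var_tree:.4f}")
    print(f"forest variance ~ {var_forest:.4f}")
    print(f"ratio           ~ {var_forest / var_tree:.3f}")
```

Because every tree in the forest sees the same sample, averaging removes only the part of the variance that comes from the random partition, not the part that comes from the noise in the data. This is why, in this sketch, the forest variance stays above zero even for large q.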