Conference papers

VSURF: an R package for variable selection using random forests

Abstract: Variable selection is a crucial issue in many applied classification and regression problems. For statistical analysis as well as for modeling or prediction, it is of interest to remove irrelevant variables, to select all important ones, or to determine a sufficient subset for prediction. From a statistical learning perspective, these different objectives call for variable selection to simplify statistical problems, to help diagnosis and interpretation, and to speed up data processing. The authors have proposed a variable selection method based on random forests, and the aim of this presentation is to describe the associated R package, VSURF (recently made available on CRAN), and to illustrate its use on real datasets. Introduced by Breiman, random forests (abbreviated RF in the sequel) are an attractive non-parametric statistical method for such problems, since they require only mild conditions on the model assumed to have generated the observed data. Indeed, because RF are based on decision trees and use aggregation ideas, they handle different models and problems, namely regression and two-class or multiclass classification, in an elegant and versatile framework. In Genuer 2010 we distinguished two variable selection objectives: interpretation and prediction. The first is to find the important variables highly related to the response variable, selecting all of them even when they are highly redundant. The second is to find a small number of variables sufficient for a good parsimonious prediction of the response variable. We have proposed a two-step procedure: the first step is the same for both objectives, while the second depends on the objective.
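The two-step procedure described in the abstract can be tried out with a short R session. The following is a minimal sketch, assuming the VSURF package is installed from CRAN and using the simulated `toys` dataset that ships with it. The seed and the reduced `ntree` and `nfor.*` values are illustrative assumptions chosen to keep the run short, not settings recommended by the authors.

```r
# Assumes VSURF is installed: install.packages("VSURF")
library(VSURF)

# 'toys' is a simulated dataset shipped with the package:
# toys$x holds the predictors, toys$y the two-class response.
data(toys)

# Random forests are stochastic; fixing the seed (an arbitrary value)
# makes the run reproducible.
set.seed(2014)

# Run the whole procedure in one call. The small ntree / nfor.* values
# are illustrative only and trade stability for speed.
toys.vsurf <- VSURF(x = toys$x, y = toys$y,
                    ntree = 100,
                    nfor.thres = 20, nfor.interp = 10, nfor.pred = 10)

# Step 1 (common to both objectives): variables kept after thresholding
toys.vsurf$varselect.thres

# Step 2, interpretation objective: selected variable indices
toys.vsurf$varselect.interp

# Step 2, prediction objective: selected variable indices
toys.vsurf$varselect.pred
```

Each `varselect.*` element holds column indices of `toys$x`. Calling `summary(toys.vsurf)` prints the number of variables retained at each stage, which makes it easy to compare the interpretation and prediction sets.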
Document type :
Conference papers
Contributor : Robin Genuer
Submitted on : Wednesday, December 17, 2014 - 9:45:46 AM
Last modification on : Tuesday, October 25, 2022 - 4:21:13 PM
Long-term archiving on : Monday, March 23, 2015 - 2:40:58 PM


Files produced by the author(s)


  • HAL Id : hal-01096237, version 1


R Genuer, J.-M Poggi, C Tuleau-Malot. VSURF : un package R pour la sélection de variables à l'aide de forêts aléatoires. 3èmes Rencontres R, 2014, Montpellier, France. ⟨hal-01096237⟩


