VSURF: an R package for variable selection using random forests

Robin Genuer 1, 2, *, Jean-Michel Poggi 3, 4, Christine Tuleau-Malot 5
* Corresponding author
2 SISTM - Statistics In System biology and Translational Medicine
Epidémiologie et Biostatistique [Bordeaux], Inria Bordeaux - Sud-Ouest
3 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay, CNRS - Centre National de la Recherche Scientifique : UMR
Abstract : This paper describes the R package VSURF. Based on random forests, it delivers two subsets of variables according to two types of variable selection for classification or regression problems. The first is a subset of important variables which are relevant for interpretation, while the second one is a subset corresponding to a parsimonious prediction model. The strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance and proceeds using a stepwise ascending variable introduction strategy. The two proposals can be obtained automatically using data-driven default values, good enough to provide interesting results, but can also be fine-tuned by the user. The algorithm is illustrated on a simulated example and its applications to real datasets are presented.
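
The two-stage strategy described in the abstract (rank the variables by permutation importance, then introduce them one at a time in that order) can be illustrated with a minimal sketch. This is not the VSURF implementation, and it is written in Python with scikit-learn rather than R. The helper names (oob_error, rank_variables, stepwise_ascending), the min_gain threshold, the forest sizes and the synthetic data are all assumptions made for illustration. scikit-learn's permutation_importance, computed here on the training data, stands in for the random forests importance score used by the package.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance


def oob_error(X, y, seed=0):
    """Out-of-bag misclassification rate of a forest fitted on X, y."""
    forest = RandomForestClassifier(
        n_estimators=100, oob_score=True, random_state=seed
    )
    forest.fit(X, y)
    return 1.0 - forest.oob_score_


def rank_variables(X, y, seed=0):
    """Column indices sorted by decreasing permutation importance."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    result = permutation_importance(
        forest, X, y, n_repeats=5, random_state=seed
    )
    order = np.argsort(result.importances_mean)[::-1]
    return [int(j) for j in order]


def stepwise_ascending(X, y, ranking, min_gain=0.01):
    """Introduce variables in ranked order; keep one only if it lowers
    the OOB error by more than min_gain (a hypothetical threshold)."""
    selected = [ranking[0]]
    best_error = oob_error(X[:, selected], y)
    for j in ranking[1:]:
        candidate = selected + [j]
        error = oob_error(X[:, candidate], y)
        if best_error - error > min_gain:
            selected, best_error = candidate, error
    return selected, best_error


# Synthetic data (an assumption, not the paper's example):
# with shuffle=False, columns 0-2 are informative and the rest are noise.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3,
    n_redundant=0, shuffle=False, random_state=0,
)
ranking = rank_variables(X, y)
selected, error = stepwise_ascending(X, y, ranking)
print("ranking:", ranking)
print("selected:", selected, "OOB error:", round(error, 3))
```

The sketch produces a single subset. The package described above delivers two (one for interpretation, one for prediction) and chooses its thresholds from the data rather than using a fixed value.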
Document type : Conference papers


Contributor : Robin Genuer
Submitted on : Wednesday, December 17, 2014 - 9:39:03 AM
Last modification on : Thursday, February 7, 2019 - 2:38:25 PM
Long-term archiving on : Monday, March 23, 2015 - 2:40:49 PM




  • HAL Id : hal-01096233, version 1


Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot. VSURF : un package R pour la sélection de variables à l'aide de forêts aléatoires. 46èmes Journées de Statistique, 2014, Rennes, France. ⟨hal-01096233⟩


