Random Forests: some methodological insights

Robin Genuer; Jean-Michel Poggi; Christine Tuleau

Reports (Research Report) Year : 2008

Robin Genuer

Laboratoire de Mathématiques d'Orsay

Model selection in statistical learning

Jean-Michel Poggi

Laboratoire de Mathématiques d'Orsay

Model selection in statistical learning

Christine Tuleau

Laboratoire Jean Alexandre Dieudonné

Abstract

This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming known but scattered advice for using random forests and at proposing some complementary remarks, both for standard problems and for high-dimensional ones in which the number of variables greatly exceeds the sample size. The main contribution of the paper is twofold: to provide some insights into the behavior of the variable importance index based on random forests, and to investigate two classical issues of variable selection. The first is to find important variables for interpretation; the second is more restrictive and aims to design a good prediction model. The proposed strategy ranks the explanatory variables using the random forests importance score and then introduces them through a stepwise ascending procedure.
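
The two-step strategy summarized above (rank by importance, then introduce variables one at a time) can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the report: the abstract does not give the acceptance rule, so the `min_gain` threshold, the function names, and the toy error function are all hypothetical. In practice the importance scores and the error of a variable subset would come from fitted random forests (for example an out-of-bag error), which this sketch deliberately leaves abstract.

```python
from typing import Callable, Sequence


def rank_by_importance(importances: Sequence[float]) -> list[int]:
    """Return variable indices sorted from most to least important."""
    return sorted(range(len(importances)),
                  key=lambda j: importances[j], reverse=True)


def stepwise_ascending_selection(
    importances: Sequence[float],
    error_of: Callable[[list[int]], float],
    min_gain: float = 0.0,
) -> list[int]:
    """Introduce variables in ranked order.

    A variable is kept only if adding it lowers the error by more than
    `min_gain` (an assumed acceptance rule, not taken from the report).
    """
    selected: list[int] = []
    best_error = float("inf")
    for j in rank_by_importance(importances):
        candidate = selected + [j]
        err = error_of(candidate)
        if best_error - err > min_gain:
            selected, best_error = candidate, err
    return selected


# Hypothetical importance scores for four variables.
toy_importances = [0.05, 0.40, 0.01, 0.30]


def toy_error(subset: list[int]) -> float:
    """Toy stand-in for a model error: only variables 1 and 3 help."""
    return 1.0 - 0.4 * (1 in subset) - 0.3 * (3 in subset)


ranking = rank_by_importance(toy_importances)
selected = stepwise_ascending_selection(toy_importances, toy_error)
print(ranking, selected)
```

Passing the error as a callable keeps the sketch independent of any particular random forest library; replacing `toy_error` with a function that refits a forest on the candidate subset would give a usable, if slow, version of the same loop.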

Keywords

Random Forests; Regression; Classification; Variable Importance; Variable Selection

Domains

Machine Learning [stat.ML]; Statistics Theory [stat.TH]; Statistics [math.ST]

Main file

RR-6729.pdf (452.14 KB)

Origin : Files produced by the author(s)

https://inria.hal.science/inria-00340725

Submitted on: Friday, November 21, 2008, 5:12:30 PM

Last modification on: Thursday, March 14, 2024, 3:08:43 AM

Long-term archiving on: Monday, June 7, 2010, 11:13:36 PM

Dates and versions

inria-00340725 , version 1 (21-11-2008)

Identifiers

HAL Id : inria-00340725 , version 1
ARXIV : 0811.3619

Cite

Robin Genuer, Jean-Michel Poggi, Christine Tuleau. Random Forests: some methodological insights. [Research Report] RR-6729, INRIA. 2008. ⟨inria-00340725⟩
