Random Forests: some methodological insights

Robin Genuer; Jean-Michel Poggi; Christine Tuleau

Report (Research Report) Year: 2008


Robin Genuer

Role: Author
PersonId : 1787
IdHAL : robin-genuer
IdRef : 15657490X

Laboratoire de Mathématiques d'Orsay

Model selection in statistical learning

Jean-Michel Poggi

Role: Author

Laboratoire de Mathématiques d'Orsay

Model selection in statistical learning

Christine Tuleau

Role: Author

Laboratoire Jean Alexandre Dieudonné

Abstract

This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims to confirm known but scattered advice for using random forests and to offer complementary remarks for both standard problems and high-dimensional ones, in which the number of variables greatly exceeds the sample size. The main contribution of the paper is twofold: to provide insights into the behavior of the variable importance index based on random forests and to investigate two classical issues of variable selection. The first is to find the important variables for interpretation; the second, more restrictive, is to design a good prediction model. The strategy involves ranking the explanatory variables by the random forests importance score, followed by a stepwise ascending variable introduction procedure.
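The two-stage strategy named at the end of the abstract (rank by importance, then introduce variables stepwise) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure of the report: the importance scores are taken as given, `error_of` is a hypothetical stand-in for an error estimate such as the out-of-bag error of a forest fitted on a subset of variables, and the `min_gain` threshold, the function names, and the toy data are all invented for the example.

```python
from typing import Callable, Dict, List, Sequence


def rank_variables(importance: Dict[str, float]) -> List[str]:
    """Return variable names sorted by decreasing importance score."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def stepwise_ascending(
    ranked: Sequence[str],
    error_of: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Introduce variables in ranked order.

    A variable is kept only if adding it lowers the error estimate
    by more than `min_gain` (an assumed, illustrative threshold).
    """
    selected: List[str] = []
    best_error = float("inf")
    for name in ranked:
        candidate = selected + [name]
        err = error_of(candidate)
        if best_error - err > min_gain:
            selected = candidate
            best_error = err
    return selected


# Toy data (hypothetical): importance scores as a forest might report them.
importance = {"x1": 0.40, "x2": 0.05, "x3": 0.30, "noise": 0.01}

# Hypothetical error estimate: only x1 and x3 actually reduce the error.
_useful = {"x1": 0.5, "x3": 0.3}


def toy_error(variables: List[str]) -> float:
    return 1.0 - sum(_useful.get(v, 0.0) for v in variables)


ranked = rank_variables(importance)
selected = stepwise_ascending(ranked, toy_error, min_gain=0.01)
print(ranked)
print(selected)
```

In practice `toy_error` would be replaced by an error estimate computed from a model refitted on each candidate subset; the greedy loop itself is independent of how that estimate is obtained.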

Keywords

Random Forests, Regression, Classification, Variable Importance, Variable Selection

Domains

Machine Learning [stat.ML], Theory [stat.TH], Statistics [math.ST]

Main file

RR-6729.pdf (452.14 KB)

Origin: Files produced by the author(s)

Contributor: Robin Genuer

https://inria.hal.science/inria-00340725

Submitted on: Friday, November 21, 2008, 17:12:30

Last modified on: Thursday, March 14, 2024, 03:08:43

Long-term archiving on: Monday, June 7, 2010, 23:13:36

Dates and versions

inria-00340725 , version 1 (21-11-2008)

Identifiers

HAL Id: inria-00340725, version 1
arXiv: 0811.3619

Cite

Robin Genuer, Jean-Michel Poggi, Christine Tuleau. Random Forests: some methodological insights. [Research Report] RR-6729, INRIA. 2008. ⟨inria-00340725⟩

Collections

CNRS INRIA INRIA-RRRT DIEUDONNE LM-ORSAY INRIA2 LARA UNIV-PARIS-SACLAY UNIV-COTEDAZUR GS-MATHEMATIQUES

797 views

791 downloads
