Conference paper, Year: 2021

Handling Correlations in Random Forests: which Impacts on Variable Importance and Model Interpretability?

Abstract

This manuscript addresses model interpretability and variable importance in random forests when the input variables are correlated. Variable importance criteria based on random permutations are known to be sensitive to correlation among input variables, which can, for instance, make the importance ranking unreliable. To overcome some of the problems raised by correlation, an original variable importance measure is introduced. The proposed measure builds upon an algorithm that clusters the input variables based on their correlations and summarises each cluster by a synthetic variable. The effectiveness of the proposed criterion is illustrated through simulations in a regression context and compared with several existing variable importance measures.
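The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's clustering algorithm or how the synthetic variable is built, so everything below is an assumption made for illustration: variables are grouped when their absolute Pearson correlation reaches a hypothetical `threshold`, and each group is summarised by the sign-aligned mean of its standardized members. The function names (`cluster_variables`, `synthetic_variable`) are hypothetical and do not come from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def standardize(x):
    """Centre a sequence and scale it to unit standard deviation."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / s for a in x]


def cluster_variables(columns, threshold=0.7):
    """Group column indices linked by |correlation| >= threshold.

    Assumption: a simple single-linkage grouping via union-find,
    not the clustering algorithm used in the paper.
    """
    p = len(columns)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def synthetic_variable(columns, cluster):
    """Summarise a cluster by the mean of its standardized members.

    Assumption: members negatively correlated with the first member
    are sign-flipped so that they do not cancel out.
    """
    ref = columns[cluster[0]]
    total = [0.0] * len(ref)
    for j in cluster:
        sign = 1.0 if pearson(ref, columns[j]) >= 0 else -1.0
        z = standardize(columns[j])
        total = [t + sign * v for t, v in zip(total, z)]
    return [t / len(cluster) for t in total]


# Toy data: x0 and x1 are noisy copies of one latent signal (x1 negated),
# x2 is independent noise.
rng = random.Random(0)
n = 200
latent = [rng.gauss(0, 1) for _ in range(n)]
x0 = [v + 0.1 * rng.gauss(0, 1) for v in latent]
x1 = [-v + 0.1 * rng.gauss(0, 1) for v in latent]
x2 = [rng.gauss(0, 1) for _ in range(n)]
columns = [x0, x1, x2]

clusters = cluster_variables(columns)
synthetic = [synthetic_variable(columns, c) for c in clusters]
```

In a full pipeline, the synthetic variables would replace the original correlated inputs when fitting the random forest, and an importance measure would then be computed per cluster rather than per raw variable. That final step is omitted here because the abstract does not describe it in enough detail to sketch faithfully.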
Main file
SMDA_ESANN2021_Corrected (2).pdf (274.19 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03483385, version 1 (16-12-2021)

Identifiers

  • HAL Id: hal-03483385, version 1

Cite

Marie Chavent, Jerome Lacaille, Alex Mourer, Madalina Olteanu. Handling Correlations in Random Forests: which Impacts on Variable Importance and Model Interpretability?. ESANN 2021 - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Oct 2021, Bruges, Belgium. ⟨hal-03483385⟩
