Grouped variable importance with random forests and application to multiple functional data analysis

Abstract : The selection of grouped variables using the random forest algorithm is considered. First, a new importance measure adapted to groups of variables is proposed. Theoretical insights into this criterion are given for additive regression models. Second, an original method for selecting functional variables based on the grouped variable importance measure is developed. Using a wavelet basis, it is proposed to group all of the wavelet coefficients of a given functional variable and to use a wrapper selection algorithm with these groups. Various other groupings that take advantage of the frequency and time localization of the wavelet basis are proposed. An extensive simulation study is performed to illustrate the use of the grouped importance measure in this context. The method is applied to a real-life problem from aviation safety.
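
The grouped importance idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a permutation-based measure computed on a single held-out sample rather than on the out-of-bag samples of each tree, and it uses a hypothetical `predict` function as a stand-in for a fitted random forest. The names `mse`, `grouped_permutation_importance`, `group`, and `n_repeats` are illustrative only. The point it shows is that all columns of a group are shuffled with the same row permutation, so the dependence inside the group is preserved while its link to the response is broken.

```python
import random


def mse(predict, X, y):
    """Mean squared error of `predict` on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def grouped_permutation_importance(predict, X, y, group, n_repeats=20, seed=0):
    """Average increase in MSE when the columns in `group` are permuted jointly.

    Assumption: a single held-out sample is used instead of per-tree
    out-of-bag samples.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        perm = list(range(len(X)))
        rng.shuffle(perm)
        # One shared row permutation for the whole group: row i receives the
        # group columns of row perm[i]; all other columns are left untouched.
        X_perm = [
            [X[perm[i]][j] if j in group else X[i][j] for j in range(len(X[i]))]
            for i in range(len(X))
        ]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats


# Toy additive model: only the first two columns carry signal.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [row[0] + row[1] for row in X]


def predict(row):
    # Hypothetical stand-in for a fitted forest.
    return row[0] + row[1]


imp_signal = grouped_permutation_importance(predict, X, y, group={0, 1})
imp_noise = grouped_permutation_importance(predict, X, y, group={2, 3})
```

In this toy setting the group `{0, 1}` receives a positive importance, while the group `{2, 3}`, which the predictor never uses, receives none. For functional data, a group would collect the indices of all wavelet coefficients of one functional variable.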
Document type :
Preprints, Working Papers, ...

Cited literature [45 references]

https://hal.inria.fr/hal-01084301
Contributor : Baptiste Gregorutti
Submitted on : Thursday, April 9, 2015 - 10:49:38 AM
Last modification on : Thursday, March 21, 2019 - 1:00:06 PM
Long-term archiving on : Tuesday, April 18, 2017 - 3:29:09 PM

File

Grouped_Variable_Importance_re...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01084301, version 2

Citation

Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre. Grouped variable importance with random forests and application to multiple functional data analysis. 2015. ⟨hal-01084301v2⟩

Metrics

Record views : 441
File downloads : 791