Journal articles

MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso

Quentin Grimonprez (1), Samuel Blanck (2), Alain Celisse (1), Guillemette Marot (2, 1)
1 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract : The MLGL R package, standing for Multi-Layer Group-Lasso, implements a new procedure of variable selection in the context of redundancy between explanatory variables, which holds true with high-dimensional data. A sparsity assumption is made, that is, only a few variables are assumed to be relevant for predicting the response variable. In this context, the performance of classical Lasso-based approaches strongly deteriorates as the redundancy strengthens. The proposed approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides at each level a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter. The versatility offered by MLGL to choose groups at different levels of the hierarchy induces, a priori, a high computational complexity. MLGL, however, exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final time cost. The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.
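The first two steps of the abstract (a hierarchy that yields one partition per level, then the pooled set of groups from all levels) can be illustrated with a small sketch. This is not the MLGL package or its API. It is a standard-library Python illustration under stated assumptions: the distance 1 - |correlation|, the single-linkage rule, the function names, and the sqrt(group size) weight are all hypothetical choices made for brevity, and the group-Lasso fit and the hierarchical testing step are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def hierarchy_levels(columns):
    """Agglomerative clustering of variables (single linkage, assumed here).

    Returns one partition per level, from singletons up to a single group.
    """
    p = len(columns)
    # Assumed dissimilarity: strongly correlated variables are close.
    dist = {
        (i, j): 1.0 - abs(pearson(columns[i], columns[j]))
        for i, j in combinations(range(p), 2)
    }

    def linkage(a, b):
        # Single linkage: smallest pairwise distance between two groups.
        return min(dist[(min(i, j), max(i, j))] for i in a for j in b)

    partition = [frozenset([j]) for j in range(p)]
    levels = [list(partition)]
    while len(partition) > 1:
        a, b = min(combinations(partition, 2), key=lambda pair: linkage(*pair))
        partition = [g for g in partition if g not in (a, b)] + [a | b]
        levels.append(list(partition))
    return levels


def candidate_groups(levels):
    """Pool the distinct groups of every level, with an assumed weight."""
    groups = {}
    for level in levels:
        for g in level:
            # Placeholder weight sqrt(|g|); the paper's weights also depend
            # on the hierarchy and are not reproduced here.
            groups.setdefault(g, sqrt(len(g)))
    return groups


# Toy data: variables 0 and 1 are nearly collinear, as are variables 2 and 3.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 2.9, 4.2, 5.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [5.2, 0.9, 4.1, 2.2, 2.8],
]
levels = hierarchy_levels(columns)
groups = candidate_groups(levels)
for g, w in sorted(groups.items(), key=lambda item: (len(item[0]), sorted(item[0]))):
    print(sorted(g), round(w, 3))
```

Because a variable belongs to one group at every level, the pooled groups overlap. In this sketch the pooled dictionary is what would be handed to an overlapping group-Lasso solver, which would then return candidate groups for each value of the regularization parameter.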
Document type :
Journal articles

https://hal.inria.fr/hal-01857242
Contributor : Guillemette Marot
Submitted on : Monday, February 28, 2022 - 6:47:29 PM
Last modification on : Tuesday, November 22, 2022 - 2:26:15 PM

File

MLGL2022.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01857242, version 2

Citation

Quentin Grimonprez, Samuel Blanck, Alain Celisse, Guillemette Marot. MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso. Journal of Statistical Software, In press. ⟨hal-01857242v2⟩
