MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso

Quentin Grimonprez 1 Samuel Blanck 2 Alain Celisse 3, 1 Guillemette Marot 2, 1
1 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract : The MLGL R-package, standing for Multi-Layer Group-Lasso, implements a new procedure of variable selection in the context of redundancy between explanatory variables, which holds true with high dimensional data. A sparsity assumption is made that is, only a few variables are assumed to be relevant for predicting the response variable. In this context, the performance of classical Lasso-based approaches strongly deteriorates as the redundancy strengthens. The proposed approach combines variables aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides at each level a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of regularization parameter. The versatility offered by MLGL to choose groups at different levels of the hierarchy a priori induces a high computational complexity. MLGL however exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final time cost. The final choice of the regularization parameter – and therefore the final choice of groups – is made by a multiple hierarchical testing procedure.
Document type :
Preprints, Working Papers, ...
Complete list of metadatas

Cited literature [31 references]  Display  Hide  Download

https://hal.inria.fr/hal-01857242
Contributor : Quentin Grimonprez <>
Submitted on : Tuesday, August 14, 2018 - 3:19:48 PM
Last modification on : Friday, April 19, 2019 - 4:54:53 PM
Long-term archiving on : Thursday, November 15, 2018 - 3:39:14 PM

File

hcgglasso.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01857242, version 1

Collections

Citation

Quentin Grimonprez, Samuel Blanck, Alain Celisse, Guillemette Marot. MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso. 2018. ⟨hal-01857242⟩

Share

Metrics

Record views

787

Files downloads

356