
MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso

Quentin Grimonprez (1), Samuel Blanck (2), Alain Celisse (3, 1), Guillemette Marot (2, 1)
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : The MLGL R package (Multi-Layer Group-Lasso) implements a new variable selection procedure for settings where the explanatory variables are redundant, which is the case with high-dimensional data. A sparsity assumption is made, that is, only a few variables are assumed to be relevant for predicting the response variable. In this context, the performance of classical Lasso-based approaches deteriorates strongly as the redundancy strengthens. The proposed approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides, at each level, a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs a set of candidate groups of variables for each value of the regularization parameter. The versatility offered by MLGL to choose groups at different levels of the hierarchy a priori induces a high computational complexity. However, MLGL exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final computation time. The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.
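The first two steps described in the abstract (building a hierarchy, then pooling the groups from all of its levels) can be illustrated with a minimal sketch. It is written in plain Python and is not the MLGL package's API. The function names, the toy distance matrix, the choice of average linkage, and the square-root-of-group-size weights are all assumptions made for illustration; the abstract only states that the weights are adapted to the structure of the hierarchy.

```python
from itertools import combinations
from math import sqrt


def average_linkage(a, b, dist):
    """Mean pairwise distance between two clusters of variable indices."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def candidate_groups(dist):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns every distinct group met across all levels of the hierarchy,
    as (group, merge_height) pairs: p singletons plus p - 1 merged groups.
    """
    p = len(dist)
    clusters = [frozenset([j]) for j in range(p)]
    groups = [(c, 0.0) for c in clusters]  # singletons sit at height 0
    while len(clusters) > 1:
        # Merge the two closest clusters under average linkage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], dist))
        height = average_linkage(a, b, dist)
        merged = a | b
        clusters = [c for c in clusters if c != a and c != b] + [merged]
        groups.append((merged, height))
    return groups


# Hypothetical distances (e.g. 1 - |correlation|) between 4 variables:
# variables 0 and 1 are highly redundant, as are variables 2 and 3.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]

groups = candidate_groups(dist)

# Assumed placeholder weights: the usual group-Lasso sqrt(group size).
weights = {g: sqrt(len(g)) for g, _ in groups}

for g, h in groups:
    print(sorted(g), round(h, 2), round(weights[g], 3))
```

The resulting groups overlap by construction, since each variable appears once per level of the hierarchy, so a full pipeline would need a group-Lasso solver that accepts overlapping groups.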
Document type :
Preprints, Working Papers, ...

Cited literature : 31 references
Contributor : Quentin Grimonprez
Submitted on : Tuesday, August 14, 2018 - 3:19:48 PM
Last modification on : Thursday, October 1, 2020 - 12:48:08 PM
Long-term archiving on : Thursday, November 15, 2018 - 3:39:14 PM


Files produced by the author(s)


  • HAL Id : hal-01857242, version 1



Quentin Grimonprez, Samuel Blanck, Alain Celisse, Guillemette Marot. MLGL: An R package implementing correlated variable selection by hierarchical clustering and group-Lasso. 2018. ⟨hal-01857242⟩


