Group and sparse group partial least square approaches applied in genomics context
Abstract
Motivation: The association between two blocks of ‘omics’ data raises challenging issues in computational
biology because of the size and complexity of the data. Here, we focus on a class of multivariate statistical
methods called partial least squares (PLS). The sparse version of PLS (sPLS) integrates
two datasets while simultaneously selecting the contributing variables. However, these methods
do not take into account the important structural or group effects arising from the relationships between
markers within biological pathways. Hence, taking predefined groups of markers
(e.g. gene sets) into account could improve the relevance and the efficacy of the PLS approach.
Results: We propose two PLS extensions called group PLS (gPLS) and sparse gPLS (sgPLS). Our algorithms
make it possible to study the relationship between two different types of omics data (e.g. SNP and
gene expression) or between an omics dataset and multivariate phenotypes (e.g. cytokine secretion).
We demonstrate the good performance of gPLS and sgPLS compared with sPLS in the
context of grouped data. These methods are then compared on data from an HIV therapeutic vaccine
trial. Our approaches provide parsimonious models that reveal the relationship between gene abundance
and the immunological response to the vaccine.