High-dimensional compositional microbiota data: state-of-the-art of methods and software implementations
Conference paper. Year: 2017

Abstract

Compositional data (CoDa) consist of nonnegative measurements that sum to a constant, typically proportions that sum to 1. Because the total is known, any one component can be determined from the sum of the others, so the parts of a composition are mathematically and statistically dependent. This structure complicates analysis and rules out standard statistical methods. Aitchison (JRSS-B, 1982) and Egozcue and colleagues (Math. Geol., 2003), among others, provided a framework for analyzing CoDa by mapping data from the constrained simplex space to Euclidean space using nonlinear transforms such as the log-odds or isometric log-ratio transforms.

The improving quality and falling cost of high-throughput sequencing technology, in particular 16S rRNA gene sequencing of the bacterial component of the human microbial community (microbiota), have enabled researchers to investigate human diseases. Microbiota have since been associated with numerous diseases, including inflammatory bowel disease, diabetes, cancer and cystic fibrosis. Because microbiota sequencing generates high-dimensional data with a compositional structure, specific statistical analysis methods and computational tools have been developed in parallel. Microbiota are usually measured as relative abundances of species and analyzed as CoDa.

The objectives of this work are the following:
- First, to review the theory and usage of CoDa analysis in the microbiota setting, with particular emphasis on recent proposals adapted to high-dimensional problems (e.g. supervised: constrained Lasso, hierarchical Lasso, kernel methods, sPLS; or unsupervised: PCoA, PCA, sparse inverse covariance estimation).
- Second, to investigate the current state-of-the-art software implementations (essentially R packages: compositions, vegan, ALDEx2, PERMANOVA, MiRKAT, MixMC ...).
- Third, using toy examples and publicly available data (the 16S data from the study by Koren and colleagues published in PNAS in March 2011, available in the MixMC R package), to implement and evaluate those methods with publicly available code. Evaluation criteria are mainly based on computational and practical aspects.
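To make the simplex-to-Euclidean mapping mentioned in the abstract concrete, here is a minimal sketch of three log-ratio transforms in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy counts, and the choice of a pivot (Helmert-type) orthonormal basis for the isometric log-ratio are all hypothetical choices made for this example. All parts must be strictly positive; zero counts, which are common in microbiota tables, would need to be replaced beforehand (not shown).

```python
import math


def closure(parts):
    """Rescale nonnegative parts so that they sum to 1."""
    total = sum(parts)
    return [v / total for v in parts]


def alr(parts):
    """Additive log-ratio (log-odds against the last part)."""
    ref = math.log(parts[-1])
    return [math.log(v) - ref for v in parts[:-1]]


def clr(parts):
    """Centered log-ratio: log of each part over the geometric mean."""
    logs = [math.log(v) for v in parts]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def ilr(parts):
    """Isometric log-ratio in one particular orthonormal basis (assumption):
    coordinate i contrasts the geometric mean of the first i parts
    with part i + 1."""
    logs = [math.log(v) for v in parts]
    coords = []
    for i in range(1, len(parts)):
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords


# Hypothetical toy read counts for three taxa in one sample.
counts = [10, 30, 60]
composition = closure(counts)

alr_coords = alr(composition)  # 2 coordinates, depends on the reference part
clr_coords = clr(composition)  # 3 coordinates that sum to zero
ilr_coords = ilr(composition)  # 2 unconstrained Euclidean coordinates
```

A composition with D parts yields D - 1 ilr coordinates, and the squared norm of the ilr vector equals that of the clr vector, which is what "isometric" refers to. Because only ratios enter the transforms, applying them to the raw counts or to the closed proportions gives the same coordinates.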
Main file
SORET_GdR_Stat&Santé_2017.pdf (1.76 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01667295, version 1 (19-12-2017)

Identifiers

  • HAL Id: hal-01667295, version 1

Cite

Perrine Soret, Marta Fernandez Avalos, Soon Cheng, Rodolphe Thiebaut. High-dimensional compositional microbiota data: state-of-the-art of methods and software implementations. 2017 - GDR « Statistiques et santé », Oct 2017, Bordeaux, France. ⟨hal-01667295⟩
