Multiple comparative metagenomics using multiset k-mer counting

Abstract:

Background. Large-scale metagenomic projects aim to extract biodiversity knowledge across different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomic or functional assignment rely on the small subset of sequences that can be associated with known organisms. De novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results.

Methods. These limitations motivated the development of a new de novo comparative metagenomic method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today's metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets.

Results. Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute, in a few hours, both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques that rely on an all-versus-all sequence alignment strategy or that are based on taxonomic profiling.
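To make the idea of replacing species counts with k-mer counts concrete, here is a minimal sketch. It is not Simka's implementation, and it ignores the parallel multi-dataset counting strategy the abstract mentions. It counts k-mers per sample with a plain in-memory dictionary and derives one quantitative distance (Bray-Curtis, on abundances) and one qualitative distance (Jaccard, on presence/absence). The sample names, reads, and the value of `K` are hypothetical toy data, and reverse complements are not merged, which is a simplification.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(reads, k):
    """Count every k-length substring across all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Quantitative (abundance-based) Bray-Curtis dissimilarity."""
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def jaccard_distance(a, b):
    """Qualitative (presence/absence) Jaccard distance."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return 1.0 - len(a.keys() & b.keys()) / len(union)


# Hypothetical toy samples; real metagenomes hold millions of reads.
samples = {
    "A": ["ACGTACGT", "TTGACA"],
    "B": ["ACGTACGA", "TTGACA"],
    "C": ["GGGGCCCC"],
}
K = 4  # assumed toy value, chosen only to fit the short reads above

profiles = {name: kmer_counts(reads, K) for name, reads in samples.items()}

# Pairwise (Bray-Curtis, Jaccard) distances for every pair of samples.
distances = {
    (x, y): (
        bray_curtis(profiles[x], profiles[y]),
        jaccard_distance(profiles[x], profiles[y]),
    )
    for x, y in combinations(sorted(profiles), 2)
}

for pair, (bc, jac) in distances.items():
    print(pair, f"bray_curtis={bc:.3f}", f"jaccard={jac:.3f}")
```

Samples A and B differ by a single base and end up close under both measures, while C shares no k-mer with either and sits at the maximum distance of 1. Holding every profile in memory like this would not scale to hundreds of samples, which is the problem the abstract says Simka addresses.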
Document type: Journal article
PeerJ Computer Science, PeerJ, 2016, 2, 〈10.7717/peerj-cs.94〉

Cited literature: 37 references
Contributor: Claire Lemaitre
Submitted on: Tuesday, 15 November 2016 - 15:05:54
Last modified on: Thursday, 11 January 2018 - 06:23:52
Document(s) archived on: Thursday, 16 March 2017 - 13:14:35


Files produced by the author(s)



Gaëtan Benoit, Pierre Peterlongo, Mahendra Mariadassou, Erwan Drezen, Sophie Schbath, et al. Multiple comparative metagenomics using multiset k-mer counting. PeerJ Computer Science, PeerJ, 2016, 2, 〈10.7717/peerj-cs.94〉. 〈hal-01397150〉


