SimkaMin: fast and resource frugal de novo comparative metagenomics

Motivation: De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. The latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, these methods, while extremely efficient, still have computational requirements that limit their practical use outside of large computing facilities.

Results: We present SimkaMin, a fast comparative metagenomics tool with low disk and memory footprints, thanks to an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in less than 3 min, with small memory (1.09 GB) and disk (approximately 0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making SimkaMin a tool well suited to very large-scale metagenomic projects.
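To make the idea of estimating dissimilarities from a k-mer subsample concrete, here is a minimal Python sketch. It is not SimkaMin's implementation: the hash-threshold subsampling rule, the function names, and the `fraction` parameter are all assumptions chosen for illustration. It only shows how Jaccard (presence/absence) and Bray-Curtis (abundance-based) dissimilarities can be computed on a subset of k-mers that is selected consistently across read sets.

```python
import hashlib
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def keep(kmer, fraction):
    """Illustrative subsampling rule (an assumption, not SimkaMin's scheme):
    keep a k-mer when its 64-bit hash falls in the lowest `fraction` of the
    hash space. The same k-mer is kept or dropped in every read set."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") < fraction * 2**64


def subsample(counts, fraction):
    """Restrict a k-mer count table to the subsampled k-mers."""
    return Counter({km: c for km, c in counts.items() if keep(km, fraction)})


def jaccard_dissimilarity(a, b):
    """1 - |shared distinct k-mers| / |all distinct k-mers|."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return 1.0 - len(a.keys() & b.keys()) / len(union)


def bray_curtis_dissimilarity(a, b):
    """1 - 2 * sum(min(count_a, count_b)) / (total_a + total_b)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[km], b[km]) for km in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


# Toy usage with two tiny "read sets" (hypothetical data).
counts_a = kmer_counts(["ACGTACGT"], k=4)
counts_b = kmer_counts(["ACGTTTTT"], k=4)
sub_a = subsample(counts_a, fraction=1.0)  # fraction=1.0 keeps every k-mer
sub_b = subsample(counts_b, fraction=1.0)
print(jaccard_dissimilarity(sub_a, sub_b))
print(bray_curtis_dissimilarity(sub_a, sub_b))
```

Lowering `fraction` shrinks the count tables, and therefore memory and disk use, at the cost of a noisier estimate. Because selection depends only on the k-mer itself, the subsamples of different read sets remain directly comparable.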

Domains

Bioinformatics [q-bio.QM]

Main file

main.pdf (318.25 KB)

btz685_supplementary_data.pdf (288.67 KB)

Origin: Files produced by the author(s)

Contributor: Claire Lemaitre

https://inria.hal.science/hal-02308101

Submitted on: Tuesday, October 8, 2019, 12:17:09

Last modified on: Monday, April 15, 2024, 16:32:30

Dates and versions

hal-02308101, version 1 (08-10-2019)

Identifiers

HAL Id: hal-02308101, version 1
DOI: 10.1093/bioinformatics/btz685
PRODINRA: 496374
PUBMED: 31504187
WOS: 000518528800044

Cite

Gaëtan Benoit, Mahendra Mariadassou, Stéphane Robin, Sophie Schbath, Pierre Peterlongo, et al. SimkaMin: fast and resource frugal de novo comparative metagenomics. Bioinformatics, 2020, 36 (4), pp. 1-2. ⟨10.1093/bioinformatics/btz685⟩. ⟨hal-02308101⟩

Collections

AGROPARISTECH UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC MIA-PARIS INRIA2 UR1-MATH-STIC UNIV-PARIS-SACLAY UR1-UFR-ISTIC UNIV-RENNES INRAE ANR UR1-MATH-NUM GS-MATHEMATIQUES GS-COMPUTER-SCIENCE GS-BIOSPHERA GS-LIFE-SCIENCES-HEALTH MAIAGE MICA-UNITES MATHNUM
