S. Ehrlich, Metagenomics of the intestinal microbiota: potential applications, Gastroentérologie Clinique et Biologique, vol.34, pp.23-28, 2010.
DOI : 10.1016/S0399-8320(10)70017-8

E. Karsenti, S. G. Acinas, P. Bork, C. Bowler, C. De Vargas et al., A Holistic Approach to Marine Eco-Systems Biology, PLoS Biology, vol.9, issue.10, p.1001177, 2011.
DOI : 10.1371/journal.pbio.1001177

URL : https://hal.archives-ouvertes.fr/hal-00691580

F. Sanger, S. Nicklen, and A. R. Coulson, DNA sequencing with chain-terminating inhibitors, Proceedings of the National Academy of Sciences, vol.74, issue.12, pp.5463-5467, 1977.

A. M. Maxam and W. Gilbert, A new method for sequencing DNA, Proceedings of the National Academy of Sciences, vol.74, issue.2, pp.560-564, 1977.

C. R. Woese and G. E. Fox, Phylogenetic structure of the prokaryotic domain: the primary kingdoms, Proceedings of the National Academy of Sciences, vol.74, issue.11, pp.5088-5090, 1977.

S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, vol.48, issue.3, pp.443-453, 1970.

S. F. Altschul, W. Gish, W. Miller, E. W. Myers et al., Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990.

F. R. Blattner, G. Plunkett, C. A. Bloch et al., The complete genome sequence of Escherichia coli K-12, Science, vol.277, issue.5331, pp.1453-1462, 1997.

J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li et al., The sequence of the human genome, Science, vol.291, issue.5507, pp.1304-1351, 2001.

J. S. Reis-Filho, Next-generation sequencing, Breast Cancer Research, vol.11, issue.3, p.12, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01268721

J. T. Simpson and R. Durbin, Efficient de novo assembly of large genomes using compressed data structures, Genome Research, vol.22, issue.3, pp.549-556, 2012.

D. R. Zerbino and E. Birney, Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Research, vol.18, issue.5, pp.821-829, 2008.

J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein et al., ABySS: a parallel assembler for short read sequence data, Genome Research, vol.19, issue.6, pp.1117-1123, 2009.

S. D. Jackman, B. P. Vandervalk, H. Mohamadi et al., ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter, Genome Research, vol.27, issue.5, pp.768-777, 2017.

C. Bleidorn, Third generation sequencing: technology and its potential impact on evolutionary biodiversity research, Systematics and Biodiversity, vol.14, issue.1, pp.1-8, 2016.
DOI : 10.1080/14772000.2015.1099575

J. T. Staley and A. Konopka, Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats, Annual Reviews in Microbiology, vol.39, issue.1, pp.321-346, 1985.

R. I. Amann, W. Ludwig, and K. H. Schleifer, Phylogenetic identification and in situ detection of individual microbial cells without cultivation, Microbiological Reviews, vol.59, issue.1, pp.143-169, 1995.

J. R. Leadbetter, Cultivation of recalcitrant microbes: cells are alive, well and revealing their secrets in the 21st century laboratory, Current Opinion in Microbiology, vol.6, issue.3, pp.274-281, 2003.

M. Baker, Method offers DNA blueprint of a single human cell, Nature, 2012.
DOI : 10.1038/nature.2012.12088

C. Gawad, W. Koh, and S. R. Quake, Single-cell genome sequencing: current state of the science, Nature Reviews Genetics, vol.17, issue.3, p.175, 2016.
DOI : 10.1038/nrg.2015.16

J. Handelsman, M. R. Rondon, S. F. Brady, J. Clardy et al., Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products, Chemistry & Biology, vol.5, issue.10, pp.245-249, 1998.
DOI : 10.1016/S1074-5521(98)90108-9

J. A. Gilbert and C. L. Dupont, Microbial metagenomics: beyond the genome, Annual Review of Marine Science, vol.3, pp.347-371, 2011.

J. Qin, R. Li, J. Raes, M. Arumugam, K. S. Burgdorf et al., A human gut microbial gene catalogue established by metagenomic sequencing, Nature, vol.464, issue.7285, pp.59-65, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01500651

J. A. Frank and S. J. Sørensen, Quantitative metagenomic analyses based on average genome size normalization, Applied and Environmental Microbiology, vol.77, issue.7, pp.2513-2521, 2011.

N. R. Pace, D. A. Stahl, D. J. Lane et al., The analysis of natural microbial populations by ribosomal RNA sequences, Advances in Microbial Ecology, pp.1-55, 1986.

N. Desai, D. Antonopoulos, J. A. Gilbert, E. M. Glass et al., From genomics to metagenomics, Current Opinion in Biotechnology, vol.23, issue.1, pp.72-76, 2012.
DOI : 10.1016/j.copbio.2011.12.017

K. D. Pruitt, T. Tatusova, and D. R. Maglott, NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins, Nucleic Acids Research, vol.35, issue.suppl_1, pp.61-65, 2006.

J. R. Cole, Q. Wang, E. Cardenas, J. Fish et al., The Ribosomal Database Project: improved alignments and new tools for rRNA analysis, Nucleic Acids Research, vol.37, issue.suppl_1, pp.141-145, 2008.

D. McDonald, M. N. Price, J. Goodrich, E. P. Nawrocki et al., An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea, The ISME Journal, vol.6, issue.3, pp.610-618, 2012.
DOI : 10.1038/ismej.2011.139

J. G. Caporaso, J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman et al., QIIME allows analysis of high-throughput community sequencing data, Nature Methods, vol.7, issue.5, pp.335-336, 2010.
DOI : 10.1038/nmeth.f.303

P. D. Schloss, S. L. Westcott, T. Ryabin et al., Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities, Applied and Environmental Microbiology, vol.75, issue.23, pp.7537-7541, 2009.

D. H. Huson, S. Beier, I. Flade, A. Górska et al., MEGAN Community Edition - interactive exploration and analysis of large-scale microbiome sequencing data, PLoS Computational Biology, vol.12, issue.6, p.1004957, 2016.

H. Li and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, vol.25, issue.14, pp.1754-1760, 2009.
DOI : 10.1093/bioinformatics/btp324

URL : https://academic.oup.com/bioinformatics/article-pdf/25/14/1754/605544/btp324.pdf

B. Langmead and S. L. Salzberg, Fast gapped-read alignment with Bowtie 2, Nature Methods, vol.9, issue.4, pp.357-359, 2012.
DOI : 10.1038/nmeth.1923

URL : http://europepmc.org/articles/pmc3322381?pdf=render

S. Vinga and J. Almeida, Alignment-free sequence comparison--a review, Bioinformatics, vol.19, issue.4, pp.513-523, 2003.
DOI : 10.1093/bioinformatics/btg005

D. E. Wood and S. L. Salzberg, Kraken: ultrafast metagenomic sequence classification using exact alignments, Genome Biology, vol.15, issue.3, p.R46, 2014.

Z. Zhang, S. Schwartz, L. Wagner, and W. Miller, A Greedy Algorithm for Aligning DNA Sequences, Journal of Computational Biology, vol.7, issue.1-2, pp.203-214, 2000.
DOI : 10.1089/10665270050081478

D. Kim, L. Song, F. P. Breitwieser, and S. L. Salzberg, Centrifuge: rapid and sensitive classification of metagenomic sequences, Genome Research, vol.26, issue.12, pp.1721-1729, 2016.
DOI : 10.1101/gr.210641.116

P. Menzel, K. L. Ng, and A. Krogh, Fast and sensitive taxonomic classification for metagenomics with Kaiju, Nature Communications, vol.7, p.11257, 2016.

A. Mitchell, F. Bucchini, G. Cochrane, H. Denise, P. ten Hoopen et al., EBI metagenomics in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data, Nucleic Acids Research, vol.44, issue.D1, pp.595-603, 2015.
DOI : 10.1093/nar/gkv1195

R. D. Finn, J. Clements, and S. R. Eddy, HMMER web server: interactive sequence similarity searching, Nucleic Acids Research, vol.39, issue.suppl_2, pp.29-37, 2011.

N. Segata, L. Waldron, A. Ballarini, V. Narasimhan, O. Jousson et al., Metagenomic microbial community profiling using unique clade-specific marker genes, Nature Methods, vol.9, issue.8, pp.811-814, 2012.
DOI : 10.1038/nmeth.2066

URL : http://europepmc.org/articles/pmc3443552?pdf=render

D. T. Truong, E. A. Franzosa, T. L. Tickle et al., MetaPhlAn2 for enhanced metagenomic taxonomic profiling, Nature Methods, vol.12, issue.10, p.902, 2015.

J. A. Eisen, Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes, PLoS Biology, vol.5, issue.3, p.82, 2007.

A. F. Koeppel and M. Wu, Surprisingly extensive mixed phylogenetic and ecological signals among bacterial operational taxonomic units, Nucleic Acids Research, vol.41, issue.10, pp.5175-5188, 2013.

Y. Cai and Y. Sun, ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time, Nucleic Acids Research, vol.39, issue.14, p.95, 2011.
DOI : 10.1093/nar/gkr349

W. Li and A. Godzik, Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, vol.22, issue.13, pp.1658-1659, 2006.
DOI : 10.1093/bioinformatics/btl158

R. C. Edgar, Search and clustering orders of magnitude faster than BLAST, Bioinformatics, vol.26, issue.19, pp.2460-2461, 2010.

C. Mercier, F. Boyer, A. Bonin, and E. Coissac, Sumatra and sumaclust : fast and exact comparison and clustering of sequences, Programs and Abstracts of the SeqBio 2013 workshop. Abstract, pp.27-29, 2013.

F. Mahé, T. Rognes, C. Quince, C. de Vargas, and M. Dunthorn, Swarm: robust and fast clustering method for amplicon-based studies, PeerJ, vol.2, p.593, 2014.
DOI : 10.7717/peerj.593

J. F. Petrosino, S. Highlander, R. A. Luna, R. A. Gibbs et al., Metagenomic pyrosequencing and microbial identification, Clinical Chemistry, vol.55, issue.5, pp.856-866, 2009.

G. Piganeau, A. Eyre-Walker, N. Grimsley, and H. Moreau, How and why DNA barcodes underestimate the diversity of microbial eukaryotes, PLoS ONE, vol.6, issue.2, 2011.

U. Nalbantoglu, A. Cakar, H. Dogan, N. Abaci, D. Ustek et al., Metagenomic analysis of the microbial community in kefir grains, Food Microbiology, vol.41, pp.42-51, 2014.
DOI : 10.1016/j.fm.2014.01.014

T. Vannier, J. Leconte, Y. Seeleuthner, S. Mondy, E. Pelletier et al., Survey of the green picoalga Bathycoccus genomes in the global ocean, Scientific Reports, 2016.

M. T. Suzuki and S. J. Giovannoni, Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR, Applied and Environmental Microbiology, vol.62, issue.2, pp.625-630, 1996.

S. G. Acinas, L. A. Marcelino, V. Klepac-Ceraj et al., Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons, Journal of Bacteriology, vol.186, issue.9, pp.2629-2635, 2004.

L. Cai, L. Ye, A. H. Y. Tong, S. Lok et al., Biased Diversity Metrics Revealed by Bacterial 16S Pyrotags Derived from Different Primer Sets, PLoS ONE, vol.8, issue.1, p.53649, 2013.
DOI : 10.1371/journal.pone.0053649

J. C. Wooley, A. Godzik, and I. Friedberg, A primer on metagenomics, PLoS Computational Biology, vol.6, issue.2, p.1000667, 2010.

Y. Peng, H. C. M. Leung, S. M. Yiu, and F. Y. L. Chin, Meta-IDBA: a de Novo assembler for metagenomic data, Bioinformatics, vol.27, issue.13, pp.94-101, 2011.
DOI : 10.1093/bioinformatics/btr216

N. Segata, D. Boernigen, T. L. Tickle, X. C. Morgan et al., Computational meta'omics for microbial community studies, Molecular Systems Biology, vol.9, issue.1, p.666, 2013.
DOI : 10.1038/msb.2013.22

T. Namiki, T. Hachiya, H. Tanaka, and Y. Sakakibara, MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads, Nucleic Acids Research, vol.40, issue.20, p.155, 2012.
DOI : 10.1093/nar/gks678

Y. Fofanov, Y. Luo, C. Katili, J. Wang, Y. Belosludtsev et al., How independent are the appearances of n-mers in different genomes?, Bioinformatics, vol.20, issue.15, pp.2421-2428, 2004.
DOI : 10.1093/bioinformatics/bth266

Y. Wu and Y. Ye, A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples, Journal of Computational Biology, vol.18, issue.3, pp.523-534, 2011.
DOI : 10.1007/978-3-642-12683-3_35

C. De Filippo, M. Ramazzotti, P. Fontana, and D. Cavalieri, Bioinformatic approaches for functional annotation and pathway inference in metagenomics data, Briefings in Bioinformatics, vol.13, issue.6, pp.696-710, 2012.
DOI : 10.1093/bib/bbs070

A. Sczyrba, P. Hofmann, P. Belmann, D. Koslicki, S. Janssen et al., Critical assessment of metagenome interpretation - a benchmark of computational metagenomics software, bioRxiv, p.99127, 2017.

S. Karlin and C. Burge, Dinucleotide relative abundance extremes: a genomic signature, Trends in Genetics, vol.11, issue.7, pp.283-290, 1995.
DOI : 10.1016/S0168-9525(00)89076-9

H. Teeling, J. Waldmann, T. Lombardot, M. Bauer, and F. O. Glöckner, TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences, BMC Bioinformatics, vol.5, issue.1, p.163, 2004.
DOI : 10.1186/1471-2105-5-163

H. Teeling and F. O. Glöckner, Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective, Briefings in Bioinformatics, vol.13, issue.6, pp.728-742, 2012.
DOI : 10.1093/bib/bbs039

Y. Wang, H. C. M. Leung, S. M. Yiu, and F. Y. L. Chin, MetaCluster 4.0: A Novel Binning Algorithm for NGS Reads and Huge Number of Species, Journal of Computational Biology, vol.19, issue.2, pp.241-249, 2012.
DOI : 10.1089/cmb.2011.0276

J. Macqueen, Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp.281-297, 1967.

J. Alneberg, B. S. Bjarnason, I. de Bruijn, M. Schirmer, J. Quick et al., Binning metagenomic contigs by coverage and composition, Nature Methods, vol.11, issue.11, pp.1144-1146, 2014.
DOI : 10.1038/nmeth.3103

D. D. Kang, J. Froula, R. Egan, and Z. Wang, MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities, PeerJ, vol.3, p.1165, 2015.

S. F. Altschul, T. L. Madden, A. A. Schäffer et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.

K. J. Hoff, M. Tech, T. Lingner, R. Daniel, B. Morgenstern et al., Gene prediction in metagenomic fragments: a large scale machine learning approach, BMC Bioinformatics, vol.9, issue.1, p.217, 2008.

K. J. Hoff, T. Lingner, P. Meinicke, and M. Tech, Orphelia: predicting genes in metagenomic sequencing reads, Nucleic Acids Research, vol.37, issue.suppl_2, pp.101-105, 2009.

L. Rabiner and B. Juang, An introduction to hidden Markov models, IEEE ASSP Magazine, vol.3, issue.1, pp.4-16, 1986.
DOI : 10.1109/MASSP.1986.1165342

A. Bateman, L. Coin, R. Durbin, R. D. Finn, V. Hollich et al., The Pfam protein families database, Nucleic Acids Research, vol.32, issue.90001, pp.138-141, 2004.
DOI : 10.1093/nar/gkh121

URL : https://hal.archives-ouvertes.fr/hal-01294685

UniProt Consortium, Reorganizing the protein space at the Universal Protein Resource (UniProt), Nucleic Acids Research, p.981, 2011.

R. D. Finn, T. K. Attwood, P. C. Babbitt et al., InterPro in 2017 - beyond protein family and domain annotations, Nucleic Acids Research, vol.45, issue.D1, p.190, 2016.

M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, and M. Hattori, The KEGG resource for deciphering the genome, Nucleic Acids Research, vol.32, issue.90001, pp.277-280, 2004.
DOI : 10.1093/nar/gkh063

R. S. Poretsky, N. Bano, A. Buchan, G. LeCleir et al., Analysis of microbial gene transcripts in environmental samples, Applied and Environmental Microbiology, vol.71, issue.7, pp.4121-4126, 2005.

J. A. Gilbert, D. Field, Y. Huang, R. Edwards, W. Li et al., Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities, PLoS ONE, vol.3, issue.8, p.3042, 2008.

P. Legendre and M. De Cáceres, Beta diversity as the variance of community data: dissimilarity coefficients and partitioning, Ecology Letters, vol.16, issue.8, pp.951-963, 2013.
DOI : 10.1111/ele.12141

J. R. Bray and J. T. Curtis, An Ordination of the Upland Forest Communities of Southern Wisconsin, Ecological Monographs, vol.27, issue.4, pp.325-349, 1957.
DOI : 10.2307/1942268

C. Lozupone and R. Knight, UniFrac: a New Phylogenetic Method for Comparing Microbial Communities, Applied and Environmental Microbiology, vol.71, issue.12, pp.8228-8235, 2005.
DOI : 10.1128/AEM.71.12.8228-8235.2005

M. Arumugam, J. Raes, E. Pelletier, D. Le Paslier, T. Yamada et al., Enterotypes of the human gut microbiome, Nature, vol.473, issue.7346, pp.174-180, 2011.
URL : https://hal.archives-ouvertes.fr/cea-00903625

G. D. Wu, J. Chen, C. Hoffmann, K. Bittinger, Y. Chen et al., Linking long-term dietary patterns with gut microbial enterotypes, Science, vol.334, issue.6052, pp.105-108, 2011.

S. M. Huse, Y. Ye, Y. Zhou, and A. A. Fodor, A core human microbiome as viewed through 16S rRNA sequence clusters, PLoS ONE, vol.7, issue.6, p.34242, 2012.

S. Abubucker, N. Segata, J. Goll, A. M. Schubert et al., Metabolic Reconstruction for Metagenomic Data and Its Application to the Human Microbiome, PLoS Computational Biology, vol.8, issue.6, p.1002358, 2012.
DOI : 10.1371/journal.pcbi.1002358

E. A. Grice and J. A. Segre, The skin microbiome, Nature Reviews Microbiology, vol.9, issue.4, p.244, 2011.

S. G. Tringe, T. Zhang, X. Liu, Y. Yu, W. H. Lee et al., The airborne metagenome in an indoor urban environment, PLoS ONE, vol.3, issue.4, p.1862, 2008.

S. Lax, D. P. Smith, J. Hampton-Marcell, S. M. Owens et al., Longitudinal analysis of microbial interaction between humans and the indoor environment, Science, vol.345, issue.6200, pp.1048-1052, 2014.
DOI : 10.1126/science.1254529

F. Meyer, D. Paarmann, M. D'Souza, R. Olson et al., The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes, BMC Bioinformatics, vol.9, issue.1, p.386, 2008.
DOI : 10.1186/1471-2105-9-386

T. O. Delmont, P. Robe, I. Clark, P. Simonet, and T. M. Vogel, Metagenomic comparison of direct and indirect soil DNA extraction approaches, Journal of Microbiological Methods, vol.86, issue.3, pp.397-400, 2011.

T. O. Delmont, E. Prestat, K. P. Keegan, M. Faubladier et al., Structure, fluctuation and magnitude of a natural grassland soil metagenome, The ISME Journal, vol.6, issue.9, p.1677, 2012.

T. O. Delmont, P. Simonet, and T. M. Vogel, Describing microbial communities and performing global comparisons in the 'omic era, The ISME Journal, vol.6, issue.9, p.1625, 2012.

D. B. Rusch, A. L. Halpern, G. Sutton, K. B. Heidelberg et al., The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific, PLoS Biology, vol.5, issue.3, p.77, 2007.

The MetaSUB International Consortium, The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report, Microbiome, pp.1-14, 2016.

A. Kopf, M. Bicak, R. Kottmann, J. Schnetzer, I. Kostadinov et al., The ocean sampling day consortium, GigaScience, vol.4, issue.1, p.27, 2015.
DOI : 10.1186/s13742-015-0066-5

URL : https://hal.archives-ouvertes.fr/hal-01174095

W. J. Kent, BLAT - the BLAST-like alignment tool, Genome Research, vol.12, issue.4, pp.656-664, 2002.

T. F. Smith and M. S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, issue.1, pp.195-197, 1981.

D. Fimereli, V. Detours, and T. Konopka, TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data, Nucleic Acids Research, vol.41, issue.7, p.86, 2013.
DOI : 10.1093/nar/gkt094

N. Maillet, C. Lemaitre, R. Chikhi, D. Lavenier, and P. Peterlongo, Compareads: comparing huge metagenomic experiments, BMC Bioinformatics, vol.13, issue.Suppl 19, p.10, 2012.
DOI : 10.1186/1471-2105-13-S19-S10

URL : https://hal.archives-ouvertes.fr/hal-00760332

N. Maillet, G. Collet, T. Vannier, D. Lavenier, and P. Peterlongo, Commet: Comparing and combining multiple metagenomic datasets, 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp.94-98, 2014.
DOI : 10.1109/BIBM.2014.6999135

URL : https://hal.archives-ouvertes.fr/hal-01080050

B. D. Ondov, T. J. Treangen, P. Melsted, A. B. Mallonee, N. H. Bergman et al., Mash: fast genome and metagenome distance estimation using MinHash, Genome Biology, vol.17, issue.1, p.132, 2016.
DOI : 10.1186/s13059-016-0997-x

URL : https://doi.org/10.1186/s13059-016-0997-x

A. Z. Broder, On the resemblance and containment of documents, Compression and Complexity of Sequences 1997. Proceedings, pp.21-29, 1997.

R. Li, W. Fan, G. Tian, H. Zhu, L. He et al., The sequence and de novo assembly of the giant panda genome, Nature, vol.463, issue.7279, p.311, 2010.
DOI : 10.1038/nature08696

G. Marçais and C. Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, Bioinformatics, vol.27, issue.6, pp.764-770, 2011.
DOI : 10.1093/bioinformatics/btr011

G. Rizk, D. Lavenier, and R. Chikhi, DSK: k-mer counting with very low memory usage, Bioinformatics, vol.29, issue.5, pp.652-653, 2013.
DOI : 10.1093/bioinformatics/btt020

URL : https://hal.archives-ouvertes.fr/hal-00778473

S. Deorowicz, M. Kokot, S. Grabowski, and A. Debudaj-Grabysz, KMC 2: fast and resource-frugal k-mer counting, Bioinformatics, vol.31, issue.10, pp.1569-1576, 2015.
DOI : 10.1093/bioinformatics/btv022

URL : https://academic.oup.com/bioinformatics/article-pdf/31/10/1569/17085507/btv022.pdf

M. Roberts, W. Hayes, B. R. Hunt, S. M. Mount et al., Reducing storage requirements for biological sequence comparison, Bioinformatics, vol.20, issue.18, pp.3363-3369, 2004.
DOI : 10.1093/bioinformatics/bth408

URL : https://academic.oup.com/bioinformatics/article-pdf/20/18/3363/520444/bth408.pdf

Y. Li, MSPKmerCounter: a fast and memory efficient approach for k-mer counting, arXiv preprint, 2015.

M. Kokot, S. Deorowicz, and A. Debudaj-Grabysz, Sorting Data on Ultra-Large Scale with RADULS, International Conference: Beyond Databases, Architectures and Structures, pp.235-245, 2017.

V. B. Dubinkina, D. S. Ischenko, V. I. Ulyantsev et al., Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis, BMC Bioinformatics, vol.17, issue.1, p.38, 2016.

S. Seth, N. Välimäki, S. Kaski, and A. Honkela, Exploration and retrieval of whole-metagenome sequencing samples, Bioinformatics, vol.30, issue.17, pp.2471-2479, 2014.
DOI : 10.1093/bioinformatics/btu340

V. I. Ulyantsev, S. V. Kazakov, V. B. Dubinkina, A. V. Tyakht, and D. G. Alexeev, MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data, Bioinformatics, 2016.

J. Qin, Y. Li, Z. Cai, S. Li, J. Zhu et al., A metagenome-wide association study of gut microbiota in type 2 diabetes, Nature, vol.490, issue.7418, pp.55-60, 2012.
DOI : 10.1038/nature11450

URL : https://hal.archives-ouvertes.fr/hal-01204262

E. M. McCreight, A space-economical suffix tree construction algorithm, Journal of the ACM (JACM), vol.23, issue.2, pp.262-272, 1976.

N. Välimäki and S. J. Puglisi, Distributed string mining for high-throughput sequencing data, WABI, pp.441-452, 2012.

E. Afshinnekoo, C. Meydan, S. Chowdhury, D. Jaroudi, C. Boyer et al., Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics, Cell Systems, vol.1, issue.1, pp.72-87, 2015.
DOI : 10.1016/j.cels.2015.01.001

P. Deutsch and J. Gailly, ZLIB compressed data format specification version 3.3, RFC 1950, 1996.
DOI : 10.17487/rfc1950

S. Pavoine, E. Vela, S. Gachet, G. de Bélair, and M. B. Bonsall, Linking patterns in phylogeny, traits, abiotic variables and space: a novel approach to linking environmental filtering and plant community assembly, Journal of Ecology, vol.99, issue.1, pp.165-175, 2011.
DOI : 10.1111/j.1365-2745.2010.01743.x

URL : https://hal.archives-ouvertes.fr/halsde-00611063

O. Koren, D. Knights, A. Gonzalez, L. Waldron, N. Segata et al., A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets, PLoS Computational Biology, vol.9, issue.1, p.1002863, 2013.
DOI : 10.1371/journal.pcbi.1002863

A. Chao, R. L. Chazdon, R. K. Colwell, and T. Shen, Abundance-Based Similarity Indices and Their Estimation When There Are Unseen Species in Samples, Biometrics, vol.62, issue.2, pp.361-371, 2006.
DOI : 10.1111/j.1541-0420.2005.00489.x

E. Drezen, G. Rizk, R. Chikhi, C. Deltel, C. Lemaitre et al., GATB: Genome Assembly & Analysis Tool Box, Bioinformatics, vol.30, issue.20, pp.2959-2961, 2014.
DOI : 10.1093/bioinformatics/btu406

URL : https://hal.archives-ouvertes.fr/hal-01088571

V. B. Dubinkina, D. S. Ischenko, V. I. Ulyantsev, A. V. Tyakht, and D. G. Alexeev, Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis, BMC Bioinformatics, vol.17, issue.1, p.38, 2016.
DOI : 10.1186/s12859-015-0875-7

I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications, Journal of Educational Measurement, vol.40, issue.3, 2013.
DOI : 10.4135/9781412985130

E. K. Costello, C. L. Lauber, M. Hamady, N. Fierer et al., Bacterial community variation in human body habitats across space and time, Science, vol.326, issue.5960, pp.1694-1697, 2009.

L. M. Rodriguez-R and K. T. Konstantinidis, Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets, Bioinformatics, vol.30, issue.5, pp.629-635, 2013.

R. R. Sokal and C. D. Michener, A statistical method for evaluating systematic relationships, University of Kansas Scientific Bulletin, vol.38, pp.1409-1438, 1958.

J. B. H. Martiny, B. J. M. Bohannan, J. H. Brown et al., Microbial biogeography: putting microorganisms on the map, Nature Reviews Microbiology, vol.4, issue.2, p.102, 2006.

C. A. Hanson, J. A. Fuhrman, M. C. Horner-Devine, and J. B. H. Martiny, Beyond biogeographic patterns: processes shaping the microbial landscape, Nature Reviews Microbiology, vol.10, issue.7, p.497, 2012.

J. R. Brum, J. C. Ignacio-Espinoza, S. Roux, G. Doulcier et al., Patterns and ecological drivers of ocean viral communities, Science, vol.348, issue.6237, p.1261498, 2015.

S. Boutin, S. Y. Graeber, M. Weitnauer, J. Panitz, M. Stahl et al., Comparison of Microbiomes from Different Niches of Upper and Lower Airways in Children and Adolescents with Cystic Fibrosis, PLOS ONE, vol.10, issue.1, pp.1-19, 2015.
DOI : 10.1371/journal.pone.0116029

A. Shade, S. E. Jones, J. G. Caporaso, J. Handelsman, R. Knight et al., Conditionally Rare Taxa Disproportionately Contribute to Temporal Changes in Microbial Diversity, mBio, vol.5, issue.4, p.e01371-14, 2014.
DOI : 10.1128/mBio.01371-14

S. Genitsaris, S. Monchy, E. Viscogliosi, T. Sime-Ngando, S. Ferreira et al., Seasonal variations of marine protist community structure based on taxon-specific traits using the eastern English Channel as a model coastal system, FEMS Microbiology Ecology, vol.91, issue.5, 2015.
DOI : 10.1093/femsec/fiv034

S. Coveley, M. S. Elshahed, and N. H. Youssef, Response of the rare biosphere to environmental stressors in a highly diverse ecosystem (Zodletone spring, OK, USA), PeerJ, vol.3, p.1182, 2015.
DOI : 10.7717/peerj.1182

V. Gomez-Alvarez, S. Pfaller, J. G. Pressman, D. G. Wahman, and R. P. Revetta, Resilience of microbial communities in a simulated drinking water distribution system subjected to disturbances: role of conditionally rare taxa and potential implications for antibiotic-resistant bacteria, Environmental Science: Water Research & Technology, vol.2, issue.4, pp.645-657, 2016.
DOI : 10.1039/C6EW00053C

D. F. Robinson and L. R. Foulds, Comparison of phylogenetic trees, Mathematical Biosciences, vol.53, issue.1-2, pp.131-147, 1981.

M. K. Kuhner and J. Felsenstein, A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates, Molecular Biology and Evolution, vol.11, issue.3, pp.459-468, 1994.

Z. Iqbal, M. Caccamo, I. Turner, P. Flicek, and G. McVean, De novo assembly and genotyping of variants using colored de Bruijn graphs, Nature Genetics, vol.44, issue.2, pp.226-232, 2012.
DOI : 10.1038/ng.1028

L. B. Dickson, D. Jiolle, G. Minard, I. Moltini-Conclois et al., Carryover effects of larval exposure to different environmental bacteria drive adult trait variation in a mosquito vector, Science Advances, vol.3, issue.8, p.1700585, 2017.

R. Danovaro, M. Canals, M. Tangherlini, A. Dell'Anno, C. Gambi et al., A submarine volcanic eruption leads to a novel microbial habitat, Nature Ecology & Evolution, vol.1, issue.6, p.0144, 2017.
DOI : 10.1038/s41559-017-0144

S. M. Rumble, P. Lacroute, A. V. Dalca, M. Fiume et al., SHRiMP: accurate mapping of short color-space reads, PLoS Computational Biology, vol.5, issue.5, p.1000386, 2009.

A. E. Darling, T. J. Treangen, L. Zhang, C. Kuiken et al., Procrastination leads to efficient filtration for local multiple alignment, International Workshop on Algorithms in Bioinformatics, pp.126-137, 2006.

T. Onodera and T. Shibuya, The Gapped Spectrum Kernel for Support Vector Machines, International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp.1-15, 2013.
DOI : 10.1007/978-3-642-39712-7_1

C. Leimeister, M. Boden, S. Horwege, S. Lindner, and B. Morgenstern, Fast alignment-free sequence comparison using spaced-word frequencies, Bioinformatics, vol.30, issue.14, pp.1991-1999, 2014.
DOI : 10.1093/bioinformatics/btu177

URL : https://academic.oup.com/bioinformatics/article-pdf/30/14/1991/17142434/btu177.pdf

K. Břinda, M. Sykulski, and G. Kucherov, Spaced seeds improve k-mer-based metagenomic classification, Bioinformatics, vol.31, issue.22, pp.3584-3592, 2015.

S. Girotto, M. Comin, and C. Pizzi, Fast Spaced Seed Hashing, 17th International Workshop on Algorithms in Bioinformatics, Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, pp.1-7, 2017.

J. Felsenstein, Confidence limits on phylogenies: an approach using the bootstrap, Evolution, vol.39, issue.4, pp.783-791, 1985.
DOI : 10.1111/j.1558-5646.1985.tb00420.x

H. Shimodaira, An Approximately Unbiased Test of Phylogenetic Tree Selection, Systematic Biology, vol.51, issue.3, pp.492-508, 2002.
DOI : 10.1080/10635150290069913

R. H. Whittaker, Vegetation of the Siskiyou Mountains, Oregon and California, Ecological Monographs, vol.30, issue.3, pp.279-338, 1960.

R. Chikhi and P. Medvedev, Informed and automated k-mer size selection for genome assembly, Bioinformatics, vol.30, issue.1, pp.31-37, 2013.
DOI : 10.1093/bioinformatics/btt310

URL : https://hal.archives-ouvertes.fr/hal-01477511

B. Solomon and C. Kingsford, Fast search of thousands of short-read sequencing experiments, Nature Biotechnology, vol.34, issue.3, p.300, 2016.
DOI : 10.1038/nbt.3442

C. Sun, R. S. Harris, R. Chikhi, and P. Medvedev, AllSome sequence Bloom trees, International Conference on Research in Computational Molecular Biology, pp.272-286, 2017.
DOI : 10.1101/090464

URL : https://hal.archives-ouvertes.fr/hal-01575350

B. Solomon and C. Kingsford, Improved search of large transcriptomic sequencing databases using split sequence bloom trees, International Conference on Research in Computational Molecular Biology, pp.257-271, 2017.

A. Limasset, G. Rizk, R. Chikhi, and P. Peterlongo, Fast and scalable minimal perfect hashing for massive key sets, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01566246

C. Marchet, L. Lecompte, A. Limasset, L. Bittner, and P. Peterlongo, A resource-frugal probabilistic dictionary and applications in bioinformatics, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01643162
