A resource-frugal probabilistic dictionary and applications in (meta)genomics

Camille Marchet 1, Antoine Limasset 1, Lucie Bittner 2, Pierre Peterlongo 1
1 GenScale - Scalable, Optimized and Parallel Algorithms for Genomics
IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE, Inria Rennes – Bretagne Atlantique
Abstract: Genomics and metagenomics generate huge sets of short genomic sequences and bring their own share of high-performance computing problems. Extracting relevant information from the data sets produced by current sequencing techniques requires extremely scalable methods. Indexing billions of objects is a fundamental need in this field, yet it is generally considered too expensive. In this paper we propose a straightforward indexing structure that scales to billions of elements, along with two direct applications in genomics and metagenomics. We show that our proposal solves problem instances for which no other known solution scales up. We believe that many tools and applications could benefit from either the fundamental data structure we provide or the applications built on it.
Document type:
Preprint, working paper
2016

https://hal.inria.fr/hal-01322440
Contributor: Pierre Peterlongo
Submitted on: Friday, 27 May 2016 - 11:00:32
Last modified on: Tuesday, 16 January 2018 - 15:54:20
Document(s) archived on: Sunday, 28 August 2016 - 10:35:23

Files

short_read_connectors.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01322440, version 1
  • arXiv: 1605.08319

Citation

Camille Marchet, Antoine Limasset, Lucie Bittner, Pierre Peterlongo. A resource-frugal probabilistic dictionary and applications in (meta)genomics. 2016. 〈hal-01322440〉
