AllSome Sequence Bloom Trees

Chen Sun (1), Robert S. Harris (1), Rayan Chikhi (2), Paul Medvedev (1)
(2) BONSAI - Bioinformatics and Sequence Analysis: Université de Lille, Sciences et Technologies; Inria Lille - Nord Europe; CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille, UMR 9189; CNRS - Centre National de la Recherche Scientifique
Abstract: The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing tree construction time by 52.7% and query time by 39–85%, at the cost of up to 3x the memory consumption during queries. Notably, it can process a batch of 198,074 queries in under 8 h (compared to around two days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 min.
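The query idea behind the baseline Sequence Bloom Tree mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the AllSome improvement. It assumes that each leaf holds a Bloom filter of one experiment's k-mers, that each internal node holds the bitwise union of its children's filters, and that a query is reported for a leaf when at least a fraction theta of its k-mers are found there. All names (`BloomFilter`, `Node`, `leaf`, `internal`, `query`, `theta`), the filter size, the hash scheme, and the toy sequences are hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit vector (assumed sizes)."""

    def __init__(self, size: int = 4096, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        merged = BloomFilter(self.size, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


@dataclass
class Node:
    bloom: BloomFilter
    name: Optional[str] = None  # set only on leaves
    children: List["Node"] = field(default_factory=list)


def leaf(name: str, seq: str, k: int) -> Node:
    """Build a leaf whose filter holds the k-mers of one experiment."""
    bloom = BloomFilter()
    for kmer in kmers(seq, k):
        bloom.add(kmer)
    return Node(bloom=bloom, name=name)


def internal(left: Node, right: Node) -> Node:
    """Build an internal node whose filter is the union of its children's."""
    return Node(bloom=left.bloom.union(right.bloom), children=[left, right])


def query(node: Node, query_kmers: Set[str], theta: float) -> List[str]:
    """Return leaf names whose filters contain at least theta of the query k-mers."""
    hits = sum(1 for kmer in query_kmers if kmer in node.bloom)
    if hits < theta * len(query_kmers):
        return []  # prune: no leaf below this node can reach the threshold
    if not node.children:
        return [node.name]
    found: List[str] = []
    for child in node.children:
        found.extend(query(child, query_kmers, theta))
    return found


# Toy data: three made-up "experiments" and a query taken from exp_a.
K = 5
root = internal(
    internal(leaf("exp_a", "ACGTACGTTGCAAGCTTAGC", K),
             leaf("exp_b", "TTTTGGGGCCCCAAAATTTT", K)),
    leaf("exp_c", "GATTACAGATTACACATTAG", K),
)
matches = query(root, kmers("ACGTTGCAAG", K), theta=1.0)
print(matches)
```

Because a Bloom filter never produces false negatives, `exp_a` is always among the matches in this toy run; other leaves could appear only through false positives, which is why results of this kind indicate that a transcript is potentially present rather than certainly present.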
Document type:
Conference paper
RECOMB 2017 - 21st Annual International Conference on Research in Computational Molecular Biology, May 2017, Hong Kong, China. 2017, 〈10.1007/978-3-319-56970-3_17〉

https://hal.inria.fr/hal-01575350
Contributor: Rayan Chikhi
Submitted on: Friday, August 18, 2017 - 22:28:16
Last modified on: Wednesday, April 25, 2018 - 15:43:08

File

allsome.pdf
Files produced by the author(s)

Citation

Chen Sun, Robert S. Harris, Rayan Chikhi, Paul Medvedev. AllSome Sequence Bloom Trees. RECOMB 2017 - 21st Annual International Conference on Research in Computational Molecular Biology, May 2017, Hong Kong, China. 2017, 〈10.1007/978-3-319-56970-3_17〉. 〈hal-01575350〉
