Perfect Hashing Structures for Parallel Similarity Searches

Tuan Tu Tran 1 Mathieu Giraud 2, 3 Jean-Stéphane Varré 3, 2
2 BONSAI - Bioinformatics and Sequence Analysis
Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189, CNRS - Centre National de la Recherche Scientifique
Abstract : Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycores implementa- tion. We propose a 2-stage index structure which is based on neighborhood indexing and perfect hashing techniques. This structure performs a filtering phase over the neighborhood regions around the seeds in constant time and avoid as much as possible random memory accesses and branch divergences. Moreover, it fits particularly well on parallel SIMD processors, because it requires intensive but homogeneous computational operations. Using this data structure, we developed a fast and sensitive OpenCL prototype read mapper.
Complete list of metadatas

https://hal.inria.fr/hal-01153893
Contributor : Mathieu Giraud <>
Submitted on : Wednesday, May 20, 2015 - 4:02:26 PM
Last modification on : Friday, March 22, 2019 - 1:33:46 AM

Links full text

Identifiers

Citation

Tuan Tu Tran, Mathieu Giraud, Jean-Stéphane Varré. Perfect Hashing Structures for Parallel Similarity Searches. International Workshop on High Performance Computational Biology (HiCOMB 2015) / International Parallel and Distributed Processing Symposium (IPDPS 2015), 2015, Hyderabad, India. pp.332-341, ⟨10.1109/IPDPSW.2015.105⟩. ⟨hal-01153893⟩

Share

Metrics

Record views

539