Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations

Abstract : In the post-genomic era, the amount of available genomic data has exploded, and the primary research problem has shifted from producing interesting biological data to processing and storing it efficiently. In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem. In this problem we are given the sequences $S_0, S_1, \dots, S_k$, where $S_j = e_{j_1} I_{\sigma_1} e_{j_2} I_{\sigma_2} e_{j_3} I_{\sigma_3} \dots e_{j_\ell} I_{\sigma_\ell}$, and must perform pattern matching on the set of sequences. We present time- and memory-efficient data structures that exploit the extensive similarity of the sequences. Our solution leads to a query time of $O(m + vk \log \ell + \frac{m \, occ_v \, v}{w} + \frac{PSC(p) \, m}{w})$ with a memory usage of $O(N \log N + vk \log vk)$.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01523079
Contributor : Hal Ifip
Submitted on : Tuesday, May 16, 2017 - 9:17:15 AM
Last modification on : Thursday, March 5, 2020 - 5:41:45 PM
Long-term archiving on : Friday, August 18, 2017 - 12:30:51 AM

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Ali Alatabbi, Carl Barton, Costas Iliopoulos, Laurent Mouchard. Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations. 8th International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2012, Halkidiki, Greece. pp.584-592, ⟨10.1007/978-3-642-33412-2_60⟩. ⟨hal-01523079⟩
