
Approximate Hashing for Bioinformatics

Abstract : This paper extends ideas from data compression by deduplication to the field of bioinformatics. We show the approach to be useful for two specific problems: clustering a large set of DNA strings, and searching for approximate matches of long substrings. Both rely on the design of what we call an approximate hashing function. The results of the new procedure are very similar to the clustering and search results produced by accurate tools, but they are obtained in much less time and with less memory.
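The abstract does not describe how the approximate hashing function is built, so the following is only an illustrative sketch of the general idea that similar strings should tend to receive the same hash value. It swaps in a well-known stand-in, a bottom-s signature over k-mer hashes (a MinHash-style technique), and is not the authors' construction. The names `kmer_hash`, `approx_hash`, and `cluster`, and the parameters `k` and `s`, are hypothetical.

```python
import hashlib
from collections import defaultdict


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (hypothetical helper)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_hash(seq: str, k: int = 8, s: int = 2) -> tuple:
    """Signature made of the s smallest distinct k-mer hashes of seq.

    Assumption: this MinHash-style signature stands in for the paper's
    approximate hashing function, which the abstract does not specify.
    Two strings sharing most of their k-mers are likely, but not
    guaranteed, to receive the same signature.
    """
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return tuple(sorted(hashes)[:s])


def cluster(seqs, k: int = 8, s: int = 2):
    """Group sequence indices whose signatures are identical."""
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        buckets[approx_hash(seq, k, s)].append(idx)
    return list(buckets.values())
```

In this sketch, clustering costs one pass over the input plus a dictionary lookup per string, which is where the time and memory savings over pairwise comparison would come from. A smaller `s` makes signatures more tolerant of differences, at the price of more false groupings.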
Document type : Conference papers

Contributor : Pierre Peterlongo
Submitted on : Thursday, May 6, 2021 - 2:21:27 PM
Last modification on : Monday, April 4, 2022 - 9:28:27 AM
Long-term archiving on : Saturday, August 7, 2021 - 7:02:12 PM




  • HAL Id : hal-03219482, version 1


Guy Arbitman, Shmuel Klein, Pierre Peterlongo, Dana Shapira. Approximate Hashing for Bioinformatics. CIAA 2021 - 25th International Conference on Implementation and Application of Automata, Jul 2021, Bremen, Germany. pp.1-12. ⟨hal-03219482⟩


