Approximate Hashing for Bioinformatics

Guy Arbitman; Shmuel T Klein; Pierre Peterlongo; Dana Shapira

Conference paper, Year: 2021

Guy Arbitman

Role: Author

Department of Computer Science [Bar Ilan]

Shmuel T Klein

Role: Author

Department of Computer Science [Bar Ilan]

Pierre Peterlongo

Role: Author
PersonId: 171998
IdHAL: pierre-peterlongo
ORCID: 0000-0003-0776-6407
IdRef: 12482062X

Scalable, Optimized and Parallel Algorithms for Genomics

Dana Shapira

Role: Author
PersonId: 1093271

Ariel University Center

Abstract

The paper extends ideas from deduplication-based data compression to bioinformatics. We show the approach to be useful for two specific problems: clustering a large set of DNA strings, and searching for approximate matches of long substrings. Both rely on the design of what we call an approximate hashing function. The outcome of the new procedure is very similar to the clustering and search results obtained by accurate tools, but it requires much less time and memory.
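
To make the notion of an approximate hashing function concrete, here is a minimal sketch. The abstract does not describe the paper's own construction, so this example substitutes a generic MinHash-style signature over k-mers, a well-known technique with a similar goal: similar strings should receive similar hash values. All names and parameters (`approx_signature`, `similarity`, `k`, `num_hashes`) are hypothetical and chosen for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(text, seed):
    """Deterministic 64-bit hash of text, salted with an integer seed."""
    digest = hashlib.blake2b(
        text.encode("ascii"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def approx_signature(seq, k=8, num_hashes=16):
    """Illustrative MinHash signature (NOT the paper's construction).

    For each of num_hashes salted hash functions, keep the minimum
    hash value over all k-mers of seq.
    """
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return tuple(
        min(stable_hash(km, seed) for km in ks)
        for seed in range(num_hashes)
    )


def similarity(sig_a, sig_b):
    """Fraction of signature positions that agree.

    For MinHash this estimates the Jaccard similarity of the two k-mer sets.
    """
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Under these assumptions, two DNA strings that differ by a few substitutions share most of their k-mers, so their signatures agree in most positions, while unrelated strings agree in almost none. Short signatures of this kind can be compared or bucketed far more cheaply than the full strings, which is the general reason such schemes save time and memory.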

Domains

Bioinformatics [q-bio.QM]

Main file

CIAA_2021_paper_11.pdf (704.33 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03219482

Submitted on: Thursday, May 6, 2021, 14:21:27

Last modified on: Friday, March 24, 2023, 14:53:21

Long-term archiving on: Saturday, August 7, 2021, 19:02:12

Dates and versions

hal-03219482, version 1 (06-05-2021)

Identifiers

HAL Id: hal-03219482, version 1

Cite

Guy Arbitman, Shmuel T Klein, Pierre Peterlongo, Dana Shapira. Approximate Hashing for Bioinformatics. CIAA 2021 - 25th International Conference on Implementation and Application of Automata, Jul 2021, Bremen, Germany. pp.1-12. ⟨hal-03219482⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
