Similarity Hashing Based on Levenshtein Distances

Abstract: It is increasingly common in forensic investigations to use automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is typically accomplished by comparing fingerprints (usually cryptographic hashes) of files against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file. This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash.
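
The idea of deriving a lower bound on the Levenshtein distance from compact per-file values can be illustrated with a small sketch. The abstract does not say which four sub-hash functions saHash uses, so the two used here (sequence length and byte-frequency histogram) are assumptions chosen for illustration, and the names `fingerprint` and `lower_bound` are hypothetical. This is not the saHash implementation, only a minimal example of the principle: each sub-value yields its own bound, and the largest one is reported.

```python
from collections import Counter

def fingerprint(data: bytes):
    """Hypothetical fingerprint: a tuple of independent sub-values.

    Sub-value 0 is the length, sub-value 1 is the byte histogram.
    (Illustrative assumption; not the sub-hashes defined in the paper.)
    """
    return (len(data), Counter(data))

def lower_bound(fp_a, fp_b) -> int:
    """Lower bound on the Levenshtein distance, using only the fingerprints."""
    len_a, hist_a = fp_a
    len_b, hist_b = fp_b
    # Each insertion or deletion changes the length by exactly one.
    length_bound = abs(len_a - len_b)
    # One edit operation removes at most one surplus byte value and adds at
    # most one missing byte value, so the larger of the two totals is a bound.
    surplus = sum((hist_a - hist_b).values())
    deficit = sum((hist_b - hist_a).values())
    frequency_bound = max(surplus, deficit)
    return max(length_bound, frequency_bound)

def levenshtein(a: bytes, b: bytes) -> int:
    """Exact Levenshtein distance (quadratic time), used only as a reference."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    a, b = b"kitten", b"sitting"
    print(lower_bound(fingerprint(a), fingerprint(b)), levenshtein(a, b))
```

Computing each fingerprint takes a single pass over the input, and the comparison never touches the original byte sequences. The bound can be loose: `b"abc"` and `b"cba"` have identical lengths and histograms, so the sketch reports 0 although two operations are needed.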
Document type:
Conference paper
Gilbert Peterson; Sujeet Shenoi. 10th IFIP International Conference on Digital Forensics (DF), Jan 2014, Vienna, Austria. Springer, IFIP Advances in Information and Communication Technology, AICT-433, pp.133-147, 2014, Advances in Digital Forensics X. 〈10.1007/978-3-662-44952-3_10〉


https://hal.inria.fr/hal-01393767
Contributor: Hal Ifip
Submitted on: Tuesday, 8 November 2016 - 10:48:06
Last modified on: Friday, 1 December 2017 - 01:17:02
Document(s) archived on: Tuesday, 14 March 2017 - 23:33:50

File

978-3-662-44952-3_10_Chapter.p...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Frank Breitinger, Georg Ziroff, Steffen Lange, Harald Baier. Similarity Hashing Based on Levenshtein Distances. Gilbert Peterson; Sujeet Shenoi. 10th IFIP International Conference on Digital Forensics (DF), Jan 2014, Vienna, Austria. Springer, IFIP Advances in Information and Communication Technology, AICT-433, pp.133-147, 2014, Advances in Digital Forensics X. 〈10.1007/978-3-662-44952-3_10〉. 〈hal-01393767〉
