Fast Content-Based File Type Identification

Abstract: Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
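
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 512-byte block size, the number of sampled blocks, and the cutoff k are all hypothetical choices made for illustration, and ranking byte values within a single distribution is an assumption about how "frequency of occurrence" might be applied. The first function estimates a byte frequency distribution from randomly chosen blocks instead of the whole file; the second keeps only the most frequent byte values as features.

```python
import random


def sampled_byte_frequencies(data: bytes, block_size: int = 512,
                             num_blocks: int = 8, seed=None) -> list:
    """Estimate a normalized byte frequency distribution from random blocks.

    block_size and num_blocks are illustrative defaults, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    # Number of blocks in the input, rounded up (at least one).
    total_blocks = max(1, -(-len(data) // block_size))
    # Sample block indices without replacement.
    chosen = rng.sample(range(total_blocks), min(num_blocks, total_blocks))

    counts = [0] * 256
    seen = 0
    for index in chosen:
        block = data[index * block_size:(index + 1) * block_size]
        for byte in block:
            counts[byte] += 1
        seen += len(block)

    if seen == 0:
        return [0.0] * 256
    return [count / seen for count in counts]


def most_frequent_byte_values(frequencies, k: int = 40) -> list:
    """Return the k byte values with the highest frequency, in ascending order.

    Assumption: features are ranked by their frequency in the given
    distribution; the paper may rank them differently.
    """
    ranked = sorted(range(256), key=lambda b: frequencies[b], reverse=True)
    return sorted(ranked[:k])
```

A classifier would then be trained on the reduced feature vector (the frequencies of the selected byte values only), so both the amount of data read and the number of features per file shrink.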
Document type:
Conference paper
Gilbert Peterson; Sujeet Shenoi. 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. Springer, IFIP Advances in Information and Communication Technology, AICT-361, pp.65-75, 2011, Advances in Digital Forensics VII. 〈10.1007/978-3-642-24212-0_5〉

https://hal.inria.fr/hal-01569553
Contributor: Hal Ifip
Submitted on: Thursday, July 27, 2017 - 08:22:27
Last modified on: Friday, December 1, 2017 - 01:16:43

File

978-3-642-24212-0_5_Chapter.pd...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong. Fast Content-Based File Type Identification. Gilbert Peterson; Sujeet Shenoi. 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. Springer, IFIP Advances in Information and Communication Technology, AICT-361, pp.65-75, 2011, Advances in Digital Forensics VII. 〈10.1007/978-3-642-24212-0_5〉. 〈hal-01569553〉
