Fast Content-Based File Type Identification

Irfan Ahmed; Kyung-Suk Lhee; Hyun-Jung Shin; Man-Pyo Hong

doi:10.1007/978-3-642-24212-0_5

Conference paper, Year: 2011


Irfan Ahmed

Role: Author

Information Security Institute

Kyung-Suk Lhee

Role: Author

Ajou University

Hyun-Jung Shin

Role: Author

Ajou University

Man-Pyo Hong

Role: Author

Ajou University

Abstract

Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
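The three ingredients named in the abstract (a byte frequency distribution, feature selection by frequency of occurrence, and random sampling of file blocks) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the block size, the number of sampled blocks, and the "keep the k most frequent byte values" selection rule are all assumptions made for illustration, and the classifier itself is omitted.

```python
import random


def byte_frequency(data: bytes) -> list[float]:
    """Return a normalized 256-bin byte frequency distribution."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data)
    return [c / total for c in counts] if total else [0.0] * 256


def sample_blocks(data: bytes, block_size: int, n_blocks: int, seed=None) -> bytes:
    """Concatenate n_blocks randomly chosen fixed-size blocks of data.

    Hypothetical helper: block alignment and sampling without
    replacement are assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    starts = list(range(0, len(data), block_size))
    chosen = rng.sample(starts, min(n_blocks, len(starts)))
    return b"".join(data[s:s + block_size] for s in sorted(chosen))


def select_frequent_bytes(distributions: list[list[float]], k: int) -> list[int]:
    """Return the k byte values with the highest summed frequency.

    Assumed selection rule: rank byte values by total frequency over
    the given distributions and keep the top k as features.
    """
    totals = [sum(d[i] for d in distributions) for i in range(256)]
    top = sorted(range(256), key=lambda i: totals[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    content = bytes(range(256)) * 4              # stand-in for a file's content
    sampled = sample_blocks(content, block_size=64, n_blocks=4, seed=1)
    dist = byte_frequency(sampled)               # computed on the sample only
    features = select_frequent_bytes([dist], k=16)
    reduced = [dist[i] for i in features]        # shorter feature vector
    print(len(sampled), len(reduced))
```

Computing the distribution on a sample of blocks reduces the number of bytes read, and keeping only a subset of byte values shortens the feature vector handed to the classifier; both reductions trade some accuracy for speed, as the abstract notes.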

Keywords

File type identification; file content classification; byte frequency

Domains

Computer Science [cs]

Main file

978-3-642-24212-0_5_Chapter.pdf (740.39 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01569553

Submitted on: Thursday, July 27, 2017, 08:22:27

Last modified on: Thursday, March 5, 2020, 16:46:42

Dates and versions

hal-01569553, version 1 (27-07-2017)

License

Attribution

Identifiers

HAL Id: hal-01569553, version 1
DOI: 10.1007/978-3-642-24212-0_5

Cite

Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong. Fast Content-Based File Type Identification. 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. pp.65-75, ⟨10.1007/978-3-642-24212-0_5⟩. ⟨hal-01569553⟩


Collections

IFIP-LNCS IFIP IFIP-AICT IFIP-TC IFIP-WG IFIP-TC11 IFIP-DF IFIP-WG11-9 IFIP-AICT-361

