Conference Papers, Year: 2011

Fast Content-Based File Type Identification

Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong

Abstract

Digital forensic examiners often need to identify the type of a file or file fragment from its content. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first selects a subset of features based on their frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
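The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the block size, the number of sampled blocks, the exact feature selection criterion, or the classifier, so the 512-byte blocks, the "keep the k most frequent byte values" rule, and all function names below are assumptions chosen for illustration.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def sample_blocks(data: bytes, block_size: int, num_blocks: int,
                  seed: Optional[int] = None) -> bytes:
    """Return the concatenation of randomly chosen fixed-size blocks.

    If the file has no more than num_blocks blocks, it is returned whole.
    """
    total = (len(data) + block_size - 1) // block_size
    if total <= num_blocks:
        return data
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(total), num_blocks))
    return b"".join(data[i * block_size:(i + 1) * block_size] for i in chosen)


def byte_frequency(data: bytes) -> List[float]:
    """Normalized byte frequency distribution over the 256 byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    return [counts.get(b, 0) / len(data) for b in range(256)]


def top_features(distributions: Sequence[Sequence[float]], k: int) -> List[int]:
    """Assumed selection rule: keep the k byte values with the highest
    average frequency across the training distributions."""
    n = len(distributions)
    avg = [sum(d[b] for d in distributions) / n for b in range(256)]
    ranked = sorted(range(256), key=lambda b: (-avg[b], b))
    return sorted(ranked[:k])


def feature_vector(data: bytes, features: Sequence[int], block_size: int = 512,
                   num_blocks: int = 8, seed: Optional[int] = None) -> List[float]:
    """Reduced feature vector computed from sampled blocks only."""
    bfd = byte_frequency(sample_blocks(data, block_size, num_blocks, seed))
    return [bfd[b] for b in features]
```

In this sketch the cost of building a feature vector depends on `num_blocks * block_size` rather than on the file length, and the vector has `k` entries instead of 256. The resulting vectors would then be passed to whichever classifier is in use, which is not shown here.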
Main file: 978-3-642-24212-0_5_Chapter.pdf (740.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01569553, version 1 (27-07-2017)

Licence

Attribution - CC BY 4.0

Identifiers

HAL Id: hal-01569553
DOI: 10.1007/978-3-642-24212-0_5

Cite

Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong. Fast Content-Based File Type Identification. 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. pp.65-75, ⟨10.1007/978-3-642-24212-0_5⟩. ⟨hal-01569553⟩