Skip to Main content Skip to Navigation
Conference papers

Fast Content-Based File Type Identification

Abstract : Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
Document type :
Conference papers
Complete list of metadata

Cited literature [14 references]  Display  Hide  Download
Contributor : Hal Ifip <>
Submitted on : Thursday, July 27, 2017 - 8:22:27 AM
Last modification on : Thursday, March 5, 2020 - 4:46:42 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong. Fast Content-Based File Type Identification. 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. pp.65-75, ⟨10.1007/978-3-642-24212-0_5⟩. ⟨hal-01569553⟩



Record views


Files downloads