Conference paper, Year: 2011

Fast Content-Based File Type Identification

Abstract

Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
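
The two speed-ups described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the 512-byte block size, and the reading of "frequency of occurrence" as "keep the byte values that occur most often in the training data" are all assumptions made for illustration, and the classifier that would consume the feature vector is left out.

```python
import random

BLOCK_SIZE = 512  # assumed block size in bytes (not stated in the abstract)


def byte_frequency(data):
    """Return the normalized 256-bin byte frequency distribution of data."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data)
    return [c / total for c in counts] if total else [0.0] * 256


def select_features(training_samples, k):
    """Return the k byte values that occur most often across the samples.

    Assumed interpretation of frequency-based feature selection.
    Ties are broken by the lower byte value; the result is sorted.
    """
    totals = [0] * 256
    for data in training_samples:
        for b in data:
            totals[b] += 1
    ranked = sorted(range(256), key=lambda v: (-totals[v], v))
    return sorted(ranked[:k])


def sample_blocks(data, n_blocks, block_size=BLOCK_SIZE, seed=None):
    """Concatenate up to n_blocks randomly chosen fixed-size blocks of data."""
    rng = random.Random(seed)
    n_total = (len(data) + block_size - 1) // block_size
    chosen = rng.sample(range(n_total), min(n_blocks, n_total))
    return b"".join(
        data[i * block_size:(i + 1) * block_size] for i in sorted(chosen)
    )


def fast_features(data, feature_idx, n_blocks, seed=None):
    """Feature vector built from sampled blocks and selected byte values only."""
    bfd = byte_frequency(sample_blocks(data, n_blocks, seed=seed))
    return [bfd[i] for i in feature_idx]
```

In this sketch the cost of building a feature vector depends on the number of sampled blocks rather than on the file size, and the vector handed to the classifier has `k` entries instead of 256. How much accuracy is lost for a given `k` and `n_blocks` is an empirical question that the paper's experiments address.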
Main file
978-3-642-24212-0_5_Chapter.pdf (740.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01569553, version 1 (27-07-2017)

License

Attribution

Cite

Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong. Fast Content-Based File Type Identification. 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. pp.65-75, ⟨10.1007/978-3-642-24212-0_5⟩. ⟨hal-01569553⟩