Humans vs. Machines in Malware Classification - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2023

Humans vs. Machines in Malware Classification

Abstract

Today, the classification of a file as either benign or malicious is performed by a combination of deterministic indicators (such as antivirus rules), Machine Learning classifiers, and, more importantly, the judgment of human experts. However, to compare human and machine intelligence in malware analysis, it is first necessary to understand how human subjects approach malware classification. In this direction, our work presents the first experimental study designed to capture which 'features' of a suspicious program (e.g., static properties or runtime behaviors) are prioritized for malware classification by human and machine intelligence. For this purpose, we created a malware classification game in which 110 human players worldwide with different seniority levels (72 novices and 38 experts) competed to classify the highest number of unknown samples based on detailed sandbox reports. Surprisingly, we discovered that both experts and novices base their decisions on approximately the same features, even though there are clear differences between the two expertise classes. Furthermore, we implemented two state-of-the-art Machine Learning models for malware classification and evaluated their performance on the same set of samples. The comparative analysis of the results unveiled a common set of features preferred by both Machine Learning models and helped to better understand the differences in feature extraction. This work reflects the difference in the decision-making processes of humans and computer algorithms and the different ways they extract information from the same data. Its findings serve multiple purposes, from training better malware analysts to improving feature encoding.
Main file
sec23summer_241-aonzo-prepub.pdf (311.24 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04321950, version 1 (04-12-2023)

License

Public domain

Identifiers

  • HAL Id: hal-04321950, version 1

Cite

Simone Aonzo, Yufei Han, Alessandro Mantovani, Davide Balzarotti. Humans vs. Machines in Malware Classification. USENIX Security 2023 - 32nd Usenix Security Symposium, Aug 2023, Anaheim (CA), United States. ⟨hal-04321950⟩