Humans vs. Machines in Malware Classification

Simone Aonzo; Yufei Han; Alessandro Mantovani; Davide Balzarotti

Conference paper, Year: 2023


Simone Aonzo (Author), Eurecom [Sophia Antipolis]

Yufei Han (Author), Confidentialité, Intégrité, Disponibilité et Répartition [CIDRE]

Alessandro Mantovani (Author), Eurecom [Sophia Antipolis]

Davide Balzarotti (Author), Eurecom [Sophia Antipolis]

Abstract

Today, the classification of a file as either benign or malicious is performed by a combination of deterministic indicators (such as antivirus rules), Machine Learning classifiers, and, more importantly, the judgment of human experts. However, before human and machine intelligence in malware analysis can be compared, it is first necessary to understand how human subjects approach malware classification. In this direction, our work presents the first experimental study designed to capture which 'features' of a suspicious program (e.g., static properties or runtime behaviors) are prioritized for malware classification by human and machine intelligence. For this purpose, we created a malware classification game in which 110 human players from around the world, with different seniority levels (72 novices and 38 experts), competed to classify the highest number of unknown samples based on detailed sandbox reports. Surprisingly, we discovered that experts and novices base their decisions on approximately the same features, even though there are clear differences between the two expertise classes. Furthermore, we implemented two state-of-the-art Machine Learning models for malware classification and evaluated their performance on the same set of samples. The comparative analysis of the results unveiled a common set of features preferred by both Machine Learning models and helped us better understand the differences in feature extraction. This work reflects the differences between the decision-making processes of humans and computer algorithms and the different ways they extract information from the same data. Its findings serve multiple purposes, from training better malware analysts to improving feature encoding.
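The abstract does not state how feature preferences were compared across players and models. Purely as a hypothetical illustration, one simple way to quantify how much two feature rankings agree is the Jaccard similarity of their top-k feature sets. This sketch assumes each side has already been reduced to a per-feature importance score; the function names, feature names, and scores are invented for illustration and are not the authors' method or data.

```python
from typing import Dict, List, Set, Tuple


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest scores (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_rankings(
    first: Dict[str, float], second: Dict[str, float], k: int = 3
) -> Tuple[List[str], float]:
    """Return the features shared by both top-k sets and the Jaccard similarity of those sets."""
    a, b = set(top_k(first, k)), set(top_k(second, k))
    return sorted(a & b), jaccard(a, b)


# Hypothetical importance scores over hypothetical sandbox-report features
# (NOT data from the study).
human_scores = {"signatures": 0.9, "network": 0.7, "file_ops": 0.4, "pe_imports": 0.2}
model_scores = {"pe_imports": 0.8, "signatures": 0.6, "network": 0.5, "file_ops": 0.1}

common, sim = compare_rankings(human_scores, model_scores, k=3)
print(common, sim)  # ['network', 'signatures'] 0.5
```

Jaccard on top-k sets ignores the order within each set; a rank-correlation measure would be a reasonable alternative when the full ordering matters.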

Domains

Computer Science [cs] / Learning [cs.LG]; Machine Learning [stat.ML]

Main file

sec23summer_241-aonzo-prepub.pdf (311.24 KB)

Origin: Files produced by the author(s)

Contributor: Yufei Han

https://hal.science/hal-04321950

Submitted on: Monday, December 4, 2023, 17:05:02

Last modified on: Friday, January 26, 2024, 08:35:36

Dates and versions

hal-04321950 , version 1 (04-12-2023)

License

Public domain

Identifiers

HAL Id: hal-04321950, version 1

Cite

Simone Aonzo, Yufei Han, Alessandro Mantovani, Davide Balzarotti. Humans vs. Machines in Malware Classification. USENIX Security 2023 - 32nd USENIX Security Symposium, Aug 2023, Anaheim (CA), United States. ⟨hal-04321950⟩


Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA EURECOM SUP_CIDRE CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM CYBERSCHOOL PARTENARIATS-APP CYBERSCURITE
