Abstract : Stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem, where the author of the document is in a known suspect set. For open-world problems, where the author may not be in the suspect set, traditional classification methods are ineffective. This paper proposes the “classify-verify” method, which augments classification with a binary verification step, and evaluates it on stylometric datasets. The method, which can be generalized to any domain, significantly outperforms traditional classifiers in open-world settings and yields an F1-score of 0.87, comparable to that of traditional classifiers in closed-world settings. Moreover, the method successfully detects adversarial documents in which authors deliberately change their styles, a problem on which closed-world classifiers fail.
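The two-stage idea in the abstract (classify first, then verify the chosen author) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-dimensional feature vectors, the nearest-centroid classifier, the distance-threshold verifier, and the names `classify`, `verify`, and `classify_verify` are all hypothetical stand-ins for whatever classifier and verification method the authors actually use.

```python
import math

def centroid(vectors):
    # Mean feature vector of one author's training documents.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(doc, centroids):
    # Closed-world step: always returns some author from the suspect set.
    return min(centroids, key=lambda author: distance(doc, centroids[author]))

def verify(doc, author_centroid, threshold):
    # Binary verification step (assumed here to be a simple distance cutoff):
    # accept the candidate only if the document is close enough to them.
    return distance(doc, author_centroid) <= threshold

def classify_verify(doc, training, threshold):
    # Returns the predicted author, or None for "not in the suspect set".
    centroids = {author: centroid(vecs) for author, vecs in training.items()}
    candidate = classify(doc, centroids)
    if verify(doc, centroids[candidate], threshold):
        return candidate
    return None

# Toy data: hypothetical authors with made-up stylometric features.
training = {
    "alice": [[0.10, 0.80], [0.20, 0.90]],
    "bob": [[0.90, 0.10], [0.80, 0.20]],
}
in_set = classify_verify([0.15, 0.85], training, threshold=0.3)
out_of_set = classify_verify([5.0, 5.0], training, threshold=0.3)
print(in_set, out_of_set)
```

In this toy setup a plain classifier would still assign the distant document to one of the two suspects, while the added verification step rejects it. The threshold value is arbitrary and would need tuning on real data.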
https://hal.inria.fr/hal-01393771
Contributor : Hal Ifip
Submitted on : Tuesday, November 8, 2016 - 10:48:53 AM
Last modification on : Thursday, March 5, 2020 - 4:46:29 PM
Long-term archiving on : Tuesday, March 14, 2017 - 10:15:50 PM