Using Frequent Fixed or Variable-Length POS Ngrams or Skip-Grams for Blog Authorship Attribution

Abstract: Authorship attribution is the process of identifying the author of an unknown text from a finite set of known candidates. In recent years, it has become increasingly relevant in social networks, blogs, emails, and forums, where anonymous posts, bullying, and even threats are sometimes perpetrated. State-of-the-art systems for authorship attribution often combine a wide range of features to achieve high accuracy. Although many features have been proposed, it remains an important challenge to find new features and methods that can characterize each author and that can be applied to informal or short writings such as blog content or emails. In this paper, we present a novel method for authorship attribution using frequent fixed or variable-length part-of-speech patterns (ngrams or skip-grams) as features to represent each author’s style. This method allows the system to automatically select its most appropriate features, namely the sequences that are used most frequently. An experimental evaluation on a collection of blog posts shows that the proposed approach is effective at discriminating between blog authors.
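The idea of frequent part-of-speech patterns can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the texts have already been converted to POS tag sequences by some tagger, and the function names (`pos_ngrams`, `pos_skipgrams`, `frequent_patterns`), the `max_skip` parameter, and the top-k selection rule are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import combinations


def pos_ngrams(tags, n):
    """Return all contiguous POS n-grams of length n as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def pos_skipgrams(tags, n, max_skip):
    """Return POS skip-grams of length n.

    Each skip-gram starts at some tag and picks the remaining n - 1 tags,
    in order, from the next n - 1 + max_skip positions, so at most
    max_skip tags are skipped in total. With max_skip=0 this reduces
    to ordinary n-grams.
    """
    grams = []
    for i in range(len(tags)):
        window = tags[i + 1:i + n + max_skip]
        for rest in combinations(range(len(window)), n - 1):
            grams.append((tags[i],) + tuple(window[j] for j in rest))
    return grams


def frequent_patterns(tag_sequences, lengths, k, max_skip=0):
    """Count patterns of every length in `lengths` over one author's texts
    and return the k most frequent ones as (pattern, count) pairs.

    Passing several lengths gives variable-length patterns; a single
    length gives fixed-length patterns. (Assumed selection rule: top k.)
    """
    counts = Counter()
    for tags in tag_sequences:
        for n in lengths:
            counts.update(pos_skipgrams(tags, n, max_skip))
    return counts.most_common(k)


# Hypothetical pre-tagged text (Penn Treebank style tags).
author_texts = [["DT", "NN", "VBZ", "DT", "NN"]]
signature = frequent_patterns(author_texts, lengths=(2,), k=1)
print(signature)
```

In this sketch, the resulting list of frequent patterns plays the role of an author's signature; how signatures are compared in order to attribute an unknown text is left open here.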
Document type:
Conference paper
Lazaros Iliadis; Ilias Maglogiannis. 12th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2016, Thessaloniki, Greece. IFIP Advances in Information and Communication Technology, AICT-475, pp.63-74, 2016, Artificial Intelligence Applications and Innovations. 〈10.1007/978-3-319-44944-9_6〉

Cited literature: 45 references

https://hal.inria.fr/hal-01557634
Contributor: Hal Ifip
Submitted on: Thursday, July 6, 2017 - 13:55:31
Last modified on: Friday, December 1, 2017 - 01:16:26

File

Restricted access
File visible on: 2019-01-01


License

Distributed under a Creative Commons Attribution 4.0 International License


Citation

Yao Pokou, Philippe Fournier-Viger, Chadia Moghrabi. Using Frequent Fixed or Variable-Length POS Ngrams or Skip-Grams for Blog Authorship Attribution. Lazaros Iliadis; Ilias Maglogiannis. 12th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2016, Thessaloniki, Greece. IFIP Advances in Information and Communication Technology, AICT-475, pp.63-74, 2016, Artificial Intelligence Applications and Innovations. 〈10.1007/978-3-319-44944-9_6〉. 〈hal-01557634〉
