
Generalisability of Topic Models in Cross-corpora Abusive Language Detection

Tulika Bose 1, Irina Illina 1, Dominique Fohr 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Rapidly changing social media content calls for robust and generalisable abuse detection models. However, state-of-the-art supervised models show degraded performance when evaluated on abusive comments that differ from those in the training corpus. We investigate whether the performance of supervised models for cross-corpora abuse detection can be improved by incorporating additional information from topic models, since topic models can infer latent topic mixtures from unseen samples. In particular, we combine topical information with representations from a model tuned for classifying abusive comments. Our performance analysis reveals that topic models capture abuse-related topics that transfer across corpora, resulting in improved generalisability.
Document type : Conference papers

Contributor : Tulika Bose
Submitted on : Thursday, April 29, 2021 - 2:01:07 PM
Last modification on : Saturday, February 5, 2022 - 3:08:38 AM
Long-term archiving on : Friday, July 30, 2021 - 6:41:23 PM


Files produced by the author(s)


  • HAL Id : hal-03212196, version 1


Tulika Bose, Irina Illina, Dominique Fohr. Generalisability of Topic Models in Cross-corpora Abusive Language Detection. NLP4IF 2021 - Workshop Censorship, Disinformation, and Propaganda, Jun 2021, Mexico city/Virtual, Mexico. ⟨hal-03212196⟩


