Conference papers

Unsupervised Domain Adaptation in Cross-corpora Abusive Language Detection

Tulika Bose 1 Irina Illina 1 Dominique Fohr 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : State-of-the-art abusive language detection models report strong in-corpus performance but underperform when evaluated on abusive comments that differ from the training scenario. Because human annotation requires substantial time and effort, models that can adapt to newly collected comments can prove useful. In this paper, we investigate the effectiveness of several Unsupervised Domain Adaptation (UDA) approaches for the task of cross-corpora abusive language detection. For comparison, we adapt a variant of the BERT model, trained on large-scale abusive comments, using Masked Language Model (MLM) fine-tuning. Our evaluation shows that the UDA approaches result in sub-optimal performance, while MLM fine-tuning performs better in the cross-corpora setting. A detailed analysis reveals the limitations of the UDA approaches and emphasizes the need to build efficient adaptation methods for this task.
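The abstract mentions adapting a BERT variant to new comments with Masked Language Model (MLM) fine-tuning but does not state the masking settings. The sketch below assumes the standard BERT masking recipe: each token is selected with probability 0.15, and a selected token is replaced by [MASK] 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The function name `mask_tokens`, the [MASK] id 103 (as in the bert-base-uncased vocabulary) and the -100 "ignore" label are illustrative assumptions, not the authors' code.

```python
import random
from typing import Collection, List, Optional, Tuple

# Hypothetical [MASK] token id (103 in the bert-base-uncased vocabulary, assumed here).
MASK_ID = 103
# Label for positions the MLM loss should ignore (assumed convention, e.g. PyTorch's default ignore_index).
IGNORE = -100


def mask_tokens(token_ids: List[int], vocab_size: int, mask_id: int = MASK_ID,
                mask_prob: float = 0.15, special_ids: Collection[int] = (),
                seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Build (inputs, labels) for one MLM example using BERT-style 80/10/10 masking.

    labels[i] holds the original token id where the model must predict it,
    and IGNORE everywhere else. Special tokens (e.g. [CLS], [SEP]) are never masked.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


# Usage with hypothetical token ids for "[CLS] ... [SEP]".
ids = [101, 7592, 2088, 2003, 2307, 102]
inputs, labels = mask_tokens(ids, vocab_size=30522, special_ids={101, 102}, seed=0)
```

In an adaptation setting of the kind described above, pairs like these would be built from unlabeled target-corpus comments and used to continue training the language model before (or alongside) the classification task; the paper's actual schedule is not given here.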
Document type :
Conference papers

Contributor : Tulika Bose
Submitted on : Wednesday, April 21, 2021 - 4:36:07 PM
Last modification on : Saturday, October 16, 2021 - 11:26:10 AM
Long-term archiving on : Thursday, July 22, 2021 - 7:21:20 PM


Files produced by the author(s)


  • HAL Id : hal-03204605, version 1


Tulika Bose, Irina Illina, Dominique Fohr. Unsupervised Domain Adaptation in Cross-corpora Abusive Language Detection. SocialNLP 2021 - The 9th International Workshop on Natural Language Processing for Social Media, Jun 2021, Virtual, France. ⟨hal-03204605⟩


