Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection

Tulika Bose; Nikolaos Aletras; Irina Illina; Dominique Fohr

Conference paper. Year: 2022


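Affiliations: (1) Speech Modeling for Facilitating Oral-Based Communication (Tulika Bose, Irina Illina, Dominique Fohr); (2) University of Sheffield [Sheffield] (Nikolaos Aletras)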

Tulika Bose

Role: Author
PersonId: 1096690

Speech Modeling for Facilitating Oral-Based Communication

Nikolaos Aletras

Role: Author
PersonId: 1136302

University of Sheffield [Sheffield]

Irina Illina

Role: Author
PersonId: 15663
IdHAL: irina-illina
IdRef: 120731746

Speech Modeling for Facilitating Oral-Based Communication

Dominique Fohr

Role: Author
PersonId: 15652
IdHAL: dominique-fohr
IdRef: 031092942

Speech Modeling for Facilitating Oral-Based Communication

Abstract

State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which negatively impacts their domain invariance. Prior work has attempted to penalize terms related to hate speech, taken from manually curated lists, using feature attribution methods, which quantify the importance the classifier assigns to input terms when making a prediction. Instead, we propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, and feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation.
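The abstract does not spell out the exact procedure, so the following is only a minimal sketch of the general idea of attribution-based term penalization, under stated assumptions. It assumes per-token attribution scores have already been computed by some feature attribution method: one set from a domain classifier (toward the source-domain class) and one from the hate-speech classifier. Terms are ranked by their mean domain-classifier attribution and the top k are treated as source-specific; the hate-speech classifier is then penalized by the squared attribution it assigns to those terms. The ranking rule, the squared penalty form, the function names `extract_source_specific_terms` and `attribution_penalty`, and the `top_k` and `strength` parameters are all hypothetical choices for illustration, not the paper's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def extract_source_specific_terms(
    domain_attributions: Iterable[Tuple[str, float]], top_k: int
) -> List[str]:
    """Rank terms by mean attribution toward the source-domain class.

    Hypothetical input: (term, score) pairs, one per token occurrence, where
    score is an attribution value from a domain classifier for the source domain.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for term, score in domain_attributions:
        totals[term] += score
        counts[term] += 1
    means = {term: totals[term] / counts[term] for term in totals}
    # Highest mean attribution first; ties broken alphabetically for determinism.
    ranked = sorted(means, key=lambda term: (-means[term], term))
    return ranked[:top_k]


def attribution_penalty(
    token_attributions: Iterable[Tuple[str, float]],
    source_terms: Set[str],
    strength: float,
) -> float:
    """Assumed penalty form: strength times the sum of squared attributions
    that the hate-speech classifier assigns to source-specific terms.
    In training, this value would be added to the classification loss."""
    return strength * sum(
        score ** 2 for term, score in token_attributions if term in source_terms
    )
```

For example, with domain attributions `[("gab", 0.9), ("tweet", 0.2), ("gab", 0.7), ("the", 0.01), ("rt", 0.6)]` and `top_k=2`, the extracted terms are `["gab", "rt"]`; a sentence whose hate-speech attributions are `[("gab", 0.5), ("hate", 0.4), ("rt", 0.1)]` then receives a penalty of about 0.52 with `strength=2.0`, while the attribution on "hate" is left unpenalized. Selecting terms automatically from a domain classifier, rather than from a curated list, is the point of contrast the abstract draws with prior work.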

Subjects

Document and Text Processing; Artificial Intelligence [cs.AI]; Computer Science [cs]; Machine Learning [cs.LG]

Main file

1371_file_Paper.pdf (342.35 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-03815708

Submitted on: Friday, 14 October 2022, 19:02:39

Last modified on: Monday, 11 September 2023, 17:41:19

Long-term archiving on: Sunday, 15 January 2023, 19:41:50

Dates and versions

hal-03815708 , version 1 (14-10-2022)

Identifiers

HAL Id: hal-03815708, version 1

Cite

Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr. Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection. COLING 2022 - Proceedings of the 29th International Conference on Computational Linguistics, Oct 2022, Gyeongju, South Korea. ⟨hal-03815708⟩


Collections

CNRS INRIA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD SILECS

