Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection - Inria - Institut national de recherche en sciences et technologies du numérique
Conference Paper. Year: 2023

Abstract

Hate speech detection on online platforms has been widely studied in the past. Most of these works were conducted in English and a few other high-resource languages. Recent approaches tailored to low-resource languages have explored the benefits of zero-shot cross-lingual transfer learning in resource-scarce scenarios. However, variation between geolects, such as American and British English or Latin American and European Spanish, is still a problem for NLP models, which often rely on (latent) lexical information for their classification tasks. More importantly, the cultural aspect, which is crucial for hate speech detection, is often overlooked. In this work, we present the results of a thorough analysis of the performance of hate speech detection models on different variants of Spanish, including a new Twitter data set of hate speech toward immigrants that we built to cover these variants. Using mBERT and Beto, a monolingual Spanish BERT-based language model, as the basis of our transfer learning architecture, we find that hate speech detection models for a given Spanish variant are affected when the other variants of the language are not taken into account. Hate speech expressions can vary from region to region, even where the same language is spoken.
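The zero-shot transfer scenario studied here, training a classifier on one Spanish variant and evaluating it on another variant for which it saw no training data, can be illustrated with a small evaluation loop. This is a minimal sketch under stated assumptions, not the authors' pipeline: the paper builds on mBERT and Beto, whereas the hypothetical `train_lexicon` function below is a toy lexical classifier used only so the example runs with the Python standard library. The variant names, placeholder tokens (`termA`, `termB`), and all data are invented, and macro-F1 is assumed as the metric.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# (text, label) pairs; label 1 = hateful, 0 = not hateful.
Example = Tuple[str, int]
Classifier = Callable[[str], int]


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def train_lexicon(train: List[Example]) -> Classifier:
    """Hypothetical toy stand-in for fine-tuning mBERT/Beto: flags a text as
    hateful if it contains a token seen only in hateful training examples."""
    pos, neg = Counter(), Counter()
    for text, label in train:
        (pos if label == 1 else neg).update(text.lower().split())
    lexicon = {tok for tok in pos if tok not in neg}
    return lambda text: int(any(tok in lexicon for tok in text.lower().split()))


def transfer_matrix(
    train_sets: Dict[str, List[Example]],
    test_sets: Dict[str, List[Example]],
    train_fn: Callable[[List[Example]], Classifier],
) -> Dict[Tuple[str, str], float]:
    """Train on each source variant only, then score it on every target variant.
    Off-diagonal cells are zero-shot: no target-variant data was used in training."""
    results = {}
    for src, train in train_sets.items():
        classify = train_fn(train)
        for tgt, test in test_sets.items():
            gold = [label for _, label in test]
            pred = [classify(text) for text, _ in test]
            results[(src, tgt)] = macro_f1(gold, pred)
    return results


# Invented toy data; "termA"/"termB" are placeholders for region-specific slurs.
train_sets = {
    "variant_A": [("termA people", 1), ("good people", 0)],
    "variant_B": [("termB people", 1), ("good people", 0)],
}
test_sets = {
    "variant_A": [("they are termA", 1), ("they are good", 0)],
    "variant_B": [("they are termB", 1), ("they are good", 0)],
}
results = transfer_matrix(train_sets, test_sets, train_lexicon)
for (src, tgt), score in sorted(results.items()):
    print(f"train={src:<10} test={tgt:<10} macro-F1={score:.2f}")
```

On this toy data the diagonal cells score 1.00 and the off-diagonal (zero-shot) cells 0.33, because a lexicon learned on one variant never sees the other variant's region-specific term. This is a deliberately exaggerated illustration of the reliance on lexical cues described in the abstract, not a reproduction of the paper's results; replacing `train_lexicon` with a function that fine-tunes a pretrained model would leave the evaluation loop unchanged.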
Main file
2023.vardial-1.1.pdf (537.37 KB)
Origin: files produced by the author(s)

Dates and versions

hal-04243810, version 1 (16-10-2023)

License

Attribution

Identifiers

HAL Id: hal-04243810
DOI: 10.18653/v1/2023.vardial-1.1

Cite

Galo Castillo-López, Arij Riabi, Djamé Seddah. Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection. Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), May 2023, Dubrovnik, Croatia. pp.1-13, ⟨10.18653/v1/2023.vardial-1.1⟩. ⟨hal-04243810⟩