Conference paper. Year: 2019

Contextualized Diachronic Word Representations

Abstract

Diachronic word embeddings play a key role in capturing interesting patterns about how language evolves over time. Most existing work focuses on corpora spanning several decades, which is understandably not yet possible when working with user-generated content from social media. In this work, we address the problem of studying semantic changes in a large Twitter corpus collected over five years, a much shorter period than is usually the norm in diachronic studies. We devise a novel attentional model, based on Bernoulli word embeddings, that are conditioned on contextual extra-linguistic (social) features such as network, spatial and socioeconomic variables, which are associated with Twitter users, as well as topic-based features. We posit that these social features provide an inductive bias that helps our model overcome the narrow time-span regime problem. Our extensive experiments reveal that our proposed model is able to capture subtle semantic shifts without being biased towards frequency cues, and that it also works well when certain contextual features are absent. Our model fits the data better than current state-of-the-art dynamic word embedding models and is therefore a promising tool for studying diachronic semantic changes over small time periods.
Main file
main.pdf (1.56 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02194763, version 1 (25-07-2019)

Identifiers

  • HAL Id: hal-02194763, version 1

Cite

Ganesh Jawahar, Djamé Seddah. Contextualized Diachronic Word Representations. 1st International Workshop on Computational Approaches to Historical Language Change 2019 (colocated with ACL 2019), Aug 2019, Florence, Italy. ⟨hal-02194763⟩

Collections

INRIA INRIA2 ANR
359 views
351 downloads
