Analogies minus analogy test: measuring regularities in word embeddings - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper. Year: 2020

Analogies minus analogy test: measuring regularities in word embeddings

Abstract

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar directions between pairs of words drawn from different broad classes, such as France-London, China-Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
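As a rough illustration of the vector arithmetic referred to above, the Python sketch below implements the commonly used "3CosAdd" form of the analogy test (answer a:b::c:? with the word whose vector is closest by cosine to b - a + c, excluding the three query words) and a simple mean pairwise cosine between offset vectors, computed once for correctly matched country-capital pairs and once for the same words wrongly matched. The toy embeddings, the function names and the shuffled-pairing comparison are hypothetical choices for illustration; this is not the metric definition used in the paper.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (not from the paper): each capital is roughly
# its country's vector plus the same "capital-of" offset [0, 1, 0, 0.1].
EMB = {
    "france": [1.0, 0.0, 0.2, 0.0],
    "paris": [1.0, 1.0, 0.2, 0.1],
    "china": [0.0, 0.0, 1.0, 0.3],
    "beijing": [0.0, 1.0, 1.0, 0.4],
    "canada": [0.5, 0.0, 0.0, 1.0],
    "ottawa": [0.5, 1.0, 0.0, 1.1],
}


def sub(u, v):
    """Element-wise difference u - v."""
    return [x - y for x, y in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def analogy_3cosadd(emb, a, b, c):
    """Answer a:b::c:? with the word closest (by cosine) to b - a + c.

    The query words a, b and c are excluded from the candidates, as is
    common practice in the standard analogy test.
    """
    target = [xb - xa + xc for xa, xb, xc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def mean_offset_cosine(emb, pairs):
    """Mean pairwise cosine between the offsets emb[y] - emb[x] for (x, y) in pairs."""
    offsets = [sub(emb[y], emb[x]) for x, y in pairs]
    sims = [cosine(u, v) for u, v in combinations(offsets, 2)]
    return sum(sims) / len(sims)


correct = [("france", "paris"), ("china", "beijing"), ("canada", "ottawa")]
# Same two broad classes, but wrongly matched (cf. France-London, China-Ottawa).
shuffled = [("france", "beijing"), ("china", "ottawa"), ("canada", "paris")]

print(analogy_3cosadd(EMB, "france", "paris", "china"))  # beijing
print(round(mean_offset_cosine(EMB, correct), 3))
print(round(mean_offset_cosine(EMB, shuffled), 3))
```

With these toy vectors the correctly matched pairs give a mean offset cosine close to 1, while the mismatched pairs give a much lower but still positive value, because every country-to-capital offset shares the "capital" direction. Loosely, the shared direction corresponds to what the abstract calls class-wise offset concentration, and the gap between correct and shuffled pairings to pairing consistency; the paper's actual metrics may be defined differently.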
Main file
2010.03446.pdf (825.01 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03070260, version 1 (15-12-2020)

Identifiers

Cite

Louis Fournier, Emmanuel Dupoux, Ewan Dunbar. Analogies minus analogy test: measuring regularities in word embeddings. CoNLL 2020 - 24th Conference on Computational Natural Language Learning, Nov 2020, Virtual, France. ⟨hal-03070260⟩
42 views
60 downloads
