Analogies minus analogy test: measuring regularities in word embeddings

Louis Fournier; Emmanuel Dupoux; Ewan Dunbar

Conference paper, Year: 2020

Louis Fournier

Role: Author
PersonId: 1259950
IdHAL: louis-fournier
ORCID: 0009-0007-9912-8061

Apprentissage machine et développement cognitif

Laboratoire de sciences cognitives et psycholinguistique

Emmanuel Dupoux

Role: Author
PersonId: 857216

Laboratoire de sciences cognitives et psycholinguistique

Apprentissage machine et développement cognitif

Ewan Dunbar

Role: Author
PersonId: 1078898

Department of Linguistics [Montréal]

Abstract

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar directions between pairs of words drawn from different broad classes, such as France-London, China-Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
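
As a rough illustration of the ideas named in the abstract, the Python sketch below runs the classic arithmetic analogy test on a hand-built toy embedding and then compares how aligned the offset vectors are for correctly matched versus mismatched country-capital pairs. The 4-dimensional toy vectors and the helper names `analogy_3cosadd` and `mean_offset_cosine` are assumptions made for this example. The mean pairwise cosine is only a simplified stand-in for the notion of similar offset directions, not the metrics defined in the paper.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def analogy_3cosadd(emb, a, a_star, b, exclude_inputs=True):
    """Classic arithmetic analogy test: a is to a_star as b is to ?

    Returns the vocabulary word whose vector has the highest cosine
    similarity with emb[b] - emb[a] + emb[a_star].
    """
    target = unit(emb[b] - emb[a] + emb[a_star])
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if exclude_inputs and word in (a, a_star, b):
            continue  # the standard test ignores the three query words
        sim = float(unit(vec) @ target)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word


def mean_offset_cosine(emb, pairs):
    """Mean pairwise cosine similarity between offsets emb[y] - emb[x]."""
    offsets = [unit(emb[y] - emb[x]) for x, y in pairs]
    sims = [
        float(offsets[i] @ offsets[j])
        for i in range(len(offsets))
        for j in range(i + 1, len(offsets))
    ]
    return float(np.mean(sims))


# Toy embedding (hypothetical): each capital is its country plus a shared
# offset along the last axis.
emb = {
    "France": np.array([1.0, 0.0, 0.0, 0.0]),
    "Paris": np.array([1.0, 0.0, 0.0, 1.0]),
    "China": np.array([0.0, 1.0, 0.0, 0.0]),
    "Beijing": np.array([0.0, 1.0, 0.0, 1.0]),
    "Canada": np.array([0.0, 0.0, 1.0, 0.0]),
    "Ottawa": np.array([0.0, 0.0, 1.0, 1.0]),
}

matched = [("France", "Paris"), ("China", "Beijing"), ("Canada", "Ottawa")]
mismatched = [("France", "Beijing"), ("China", "Ottawa"), ("Canada", "Paris")]

print(analogy_3cosadd(emb, "France", "Paris", "China"))
print(mean_offset_cosine(emb, matched))
print(mean_offset_cosine(emb, mismatched))
```

In this toy space the matched pairs share a single offset direction while the mismatched pairs do not. Real embeddings are far noisier, which is part of why the abstract separates offset concentration from pairing consistency.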

Domains

Computation and Language [cs.CL]; Artificial Intelligence [cs.AI]

Main file

2010.03446.pdf (825.01 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Dupoux

https://hal.science/hal-03070260

Submitted on: Tuesday, December 15, 2020, 17:34:58

Last modified on: Monday, March 18, 2024, 10:24:06

Long-term archiving on: Tuesday, March 16, 2021, 20:13:12

Dates and versions

hal-03070260, version 1 (15-12-2020)

Identifiers

HAL Id: hal-03070260, version 1
arXiv: 2010.03446

Cite

Louis Fournier, Emmanuel Dupoux, Ewan Dunbar. Analogies minus analogy test: measuring regularities in word embeddings. CoNLL 2020 - 24th Conference on Computational Natural Language Learning, Nov 2020, Virtual, France. ⟨hal-03070260⟩

Collections

ENS-PARIS CNRS INRIA EHESS LSCP DEC INRIA2 PSL ANR PRAIRIE-IA

42 views

60 downloads