Conference papers

On the Correlation of Word Embedding Evaluation Metrics

Abstract: Word embeddings are used in a wide range of natural language processing tasks. These geometric representations are easy for automatic systems to manipulate, so they quickly spread to all areas of language processing. Although they outperform their predecessors, it is still not clear why and how they do so. In this article, we investigate many kinds of evaluation metrics on various datasets in order to discover how they correlate with each other. These correlations lead to 1) a fast way to select the best word embeddings among many candidates, 2) a new criterion that may improve the current state of static Euclidean word embeddings, and 3) a way to build a set of complementary datasets, i.e. datasets that each quantify a different aspect of word embeddings.
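The core idea of the abstract, measuring how evaluation metrics correlate across many embeddings, can be illustrated with a short sketch. It assumes each metric yields one score per embedding and uses Spearman rank correlation; the abstract does not say which coefficient the authors use, so treat this as an illustration rather than their method. The function names, metric names and scores below are hypothetical toy values.

```python
import math
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant metric")
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def metric_correlations(scores):
    """scores: metric name -> list of scores, one per embedding (same order).

    Returns a dict mapping each unordered pair of metric names
    (alphabetical order) to their Spearman correlation.
    """
    names = sorted(scores)
    return {
        (a, b): spearman(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical scores of four embeddings on three made-up metrics.
toy = {
    "similarity": [0.1, 0.4, 0.3, 0.8],
    "analogy": [0.2, 0.5, 0.4, 0.9],
    "clustering": [0.9, 0.1, 0.5, 0.3],
}
for pair, rho in metric_correlations(toy).items():
    print(pair, round(rho, 3))
```

With these toy numbers, "analogy" and "similarity" rank the four embeddings identically (correlation 1.0), while "clustering" is negatively correlated with both (about -0.8). Strongly correlated metrics are largely redundant, so a cheap one could stand in for an expensive one when shortlisting embeddings.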

Cited literature: 42 references

https://hal.inria.fr/hal-02919006
Contributor: François Torregrossa
Submitted on: Monday, August 24, 2020 - 9:14:33 AM
Last modification on: Tuesday, September 15, 2020 - 10:32:49 AM

File

2020.lrec-1.589.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02919006, version 1

Citation

François Torregrossa, Vincent Claveau, Nihel Kooli, Guillaume Gravier, Robin Allesiardo. On the Correlation of Word Embedding Evaluation Metrics. LREC 2020 - 12th Conference on Language Resources and Evaluation, May 2020, Marseille, France. pp. 4789-4797. ⟨hal-02919006⟩
