Conference papers

Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB 2.0

Abstract: Until recently, diachronic lexical information was used mostly in its natural field, historical linguistics; promising but not yet conclusive applications to machine translation for low-resource languages have now begun to extend its use to NLP. There is therefore a new need for fine-grained, large-coverage, and accurate etymological lexical resources. In this paper, we propose a set of guidelines for generating such resources, covering each step of the life cycle of an etymological lexicon: creation, update, evaluation, dissemination, and exploitation. To illustrate the guidelines, we introduce EtymDB 2.0, an etymological database automatically generated from Wiktionary, which contains 1.8 million lexemes linked by more than 700,000 fine-grained etymological relations, across 2,536 living and dead languages. We also introduce use cases for which EtymDB 2.0 could be a key resource, such as phylogenetic tree generation, low-resource machine translation, and the study of medieval languages.

Cited literature: 33 references

https://hal.inria.fr/hal-02678100
Contributor: Benoît Sagot
Submitted on: Sunday, May 31, 2020 - 8:07:27 PM
Last modification on: Tuesday, June 8, 2021 - 4:04:01 PM

Identifiers

  • HAL Id: hal-02678100, version 1

Citation

Clémentine Fourrier, Benoît Sagot. Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB 2.0. LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France. ⟨hal-02678100⟩
