Conference paper, Year: 2020

Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB 2.0

Abstract

Until recently, diachronic lexical information was mostly used in its natural field, historical linguistics. Promising but not yet conclusive applications to machine translation for low-resource languages have since started extending its use to NLP. There is therefore a new need for fine-grained, large-coverage, and accurate etymological lexical resources. In this paper, we propose a set of guidelines for generating such resources, covering each step of the life cycle of an etymological lexicon: creation, update, evaluation, dissemination, and exploitation. To illustrate the guidelines, we introduce EtymDB 2.0, an etymological database automatically generated from Wiktionary, which contains 1.8 million lexemes linked by more than 700,000 fine-grained etymological relations across 2,536 living and dead languages. We also introduce use cases for which EtymDB 2.0 could be a key resource, such as phylogenetic tree generation, low-resource machine translation, and the study of medieval languages.
Main file
Updating_and_correcting_an_etymological_database___LREC-4.pdf (231.25 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02678100, version 1 (31-05-2020)

Identifiers

  • HAL Id: hal-02678100, version 1

Cite

Clémentine Fourrier, Benoît Sagot. Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB 2.0. LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France. ⟨hal-02678100⟩