Extracting an Etymological Database from Wiktionary

Abstract: Electronic lexical resources almost never contain etymological information. The availability of such information, if properly formalised, could open up the possibility of developing automatic tools targeted towards historical and comparative linguistics, as well as significantly improving the automatic processing of ancient languages. We describe here the process we implemented for extracting etymological data from the etymological notices found in Wiktionary. We have produced a multilingual database of nearly one million lexemes and a database of more than half a million etymological relations between lexemes.
Document type:
Conference paper
Electronic Lexicography in the 21st century (eLex 2017), Sep 2017, Leiden, Netherlands. pp.716-728, 〈https://elex.link/elex2017/〉

https://hal.inria.fr/hal-01592061
Contributor: Benoît Sagot
Submitted on: Friday, 22 September 2017 - 15:11:19
Last modified on: Saturday, 9 June 2018 - 10:30:02
Document(s) archived on: Saturday, 23 December 2017 - 13:32:31

File

paper44.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01592061, version 1

Citation

Benoît Sagot. Extracting an Etymological Database from Wiktionary. Electronic Lexicography in the 21st century (eLex 2017), Sep 2017, Leiden, Netherlands. pp.716-728, 〈https://elex.link/elex2017/〉. 〈hal-01592061〉

Metrics

Record views

336

File downloads

195