Conference papers

Extracting an Etymological Database from Wiktionary

Abstract : Electronic lexical resources almost never contain etymological information. The availability of such information, if properly formalised, could open up the possibility of developing automatic tools targeted towards historical and comparative linguistics, as well as significantly improving the automatic processing of ancient languages. We describe here the process we implemented for extracting etymological data from the etymological notices found in Wiktionary. We have produced a multilingual database of nearly one million lexemes and a database of more than half a million etymological relations between lexemes.

Contributor : Benoît Sagot
Submitted on : Friday, September 22, 2017 - 3:11:19 PM
Last modification on : Wednesday, June 8, 2022 - 12:50:06 PM
Long-term archiving on : Saturday, December 23, 2017 - 1:32:31 PM




  • HAL Id : hal-01592061, version 1



Benoît Sagot. Extracting an Etymological Database from Wiktionary. Electronic Lexicography in the 21st century (eLex 2017), Sep 2017, Leiden, Netherlands. pp.716-728. ⟨hal-01592061⟩


