Conference papers

Construction automatique d'une base de données étymologiques à partir du wiktionary
(Automatic construction of an etymological database using Wiktionary)

Abstract : Electronic lexical resources almost never contain etymological information. Such information, if properly formalised, would make it possible to develop automatic tools for historical and comparative linguistics, and would significantly improve the automatic processing of ancient languages. We describe the process we implemented for extracting etymological data from the etymological notices found in Wiktionary. We have produced a multilingual database of nearly one million lexemes and a database of more than half a million etymological relations between lexemes.

https://hal.inria.fr/hal-01584013
Contributor : Benoît Sagot
Submitted on : Friday, September 8, 2017 - 11:16:44 AM
Last modification on : Monday, December 14, 2020 - 5:27:10 PM

File

taln17etym.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01584013, version 1

Citation

Benoît Sagot. Construction automatique d'une base de données étymologiques à partir du wiktionary. Traitement Automatique des Langues Naturelles 2017, Jun 2017, Orléans, France. ⟨hal-01584013⟩
