Extracting an Etymological Database from Wiktionary

Benoît Sagot

Conference paper, Year: 2017

Benoît Sagot (1)

Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

(1) Automatic Language Modelling and ANAlysis & Computational Humanities

Abstract

Electronic lexical resources almost never contain etymological information. The availability of such information, if properly formalised, could open up the possibility of developing automatic tools targeted towards historical and comparative linguistics, as well as significantly improving the automatic processing of ancient languages. We describe here the process we implemented for extracting etymological data from the etymological notices found in Wiktionary. We have produced a multilingual database of nearly one million lexemes and a database of more than half a million etymological relations between lexemes.

Keywords

Lexical resource development, etymology, Wiktionary

Domains

Computation and Language [cs.CL]; Linguistics

Main file

paper44.pdf (709.97 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01592061

Submitted on: Friday, 22 September 2017, 15:11:19

Last modified on: Tuesday, 3 October 2023, 17:18:04

Long-term archiving on: Saturday, 23 December 2017, 13:32:31

Dates and versions

hal-01592061, version 1 (22-09-2017)

Identifiers

HAL Id: hal-01592061, version 1

Cite

Benoît Sagot. Extracting an Etymological Database from Wiktionary. Electronic Lexicography in the 21st century (eLex 2017), Sep 2017, Leiden, Netherlands. pp.716-728. ⟨hal-01592061⟩

Collections

INRIA INRIA2 ANR

535 views

1091 downloads
