Constructing a poor man’s wordnet in a resource-rich world

Abstract: In this paper we present a language-independent, fully modular and automatic approach to bootstrap a wordnet for a new language by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora, and Wikipedia. The approach, which we apply here to Slovene, takes into account monosemous and polysemous words, general and specialised vocabulary as well as simple and multi-word lexemes. The extracted words are then assigned one or several synset ids, based on a classifier that relies on several features including distributional similarity. Finally, we identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic, manual and task-based evaluations show that the resulting resource, the latest version of the Slovene wordnet, is already a valuable source of lexico-semantic information.
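The abstract mentions removing dubious (literal, synset) pairs using distributional information. The sketch below illustrates only the general idea and is not the authors' procedure, because the record does not specify their features or thresholds. Each literal is represented by a hypothetical bag of co-occurring context words. A pair is kept only if the literal's cosine similarity to at least one other literal of the same synset reaches an assumed threshold. The function names `cosine` and `filter_pairs`, the threshold of 0.1, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0  # an empty vector carries no distributional evidence
    return dot / (nu * nv)


def filter_pairs(pairs, vectors, threshold=0.1):
    """Hypothetical filter: keep (literal, synset) pairs whose literal is
    distributionally similar to at least one other literal of the synset."""
    members = {}
    for literal, synset in pairs:
        members.setdefault(synset, set()).add(literal)

    kept = []
    for literal, synset in pairs:
        others = members[synset] - {literal}
        if not others:
            # Single-literal synset: nothing to compare against, so keep it.
            kept.append((literal, synset))
            continue
        best = max(
            cosine(vectors.get(literal, Counter()), vectors.get(o, Counter()))
            for o in others
        )
        if best >= threshold:
            kept.append((literal, synset))
    return kept


if __name__ == "__main__":
    # Toy data (assumed): "bank" was wrongly attached to the "dog" synset.
    pairs = [("dog", "syn:dog"), ("hound", "syn:dog"), ("bank", "syn:dog")]
    vectors = {
        "dog": Counter({"barks": 3, "tail": 2}),
        "hound": Counter({"barks": 2, "tail": 1}),
        "bank": Counter({"money": 4}),
    }
    print(filter_pairs(pairs, vectors))  # [('dog', 'syn:dog'), ('hound', 'syn:dog')]
```

A real system would build the context vectors from a large corpus and would need a deliberate policy for synsets with a single literal; this sketch simply keeps such pairs.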
Document type:
Journal article
Language Resources and Evaluation, Springer Verlag, 2015, 49 (3), pp.601-635. 〈10.1007/s10579-015-9295-6〉

Cited literature: 63 references

https://hal.inria.fr/hal-01174492
Contributor: Benoît Sagot
Submitted on: Wednesday, 25 April 2018, 00:48:06
Last modified on: Thursday, 15 November 2018, 20:27:26
Document(s) archived on: Tuesday, 18 September 2018, 08:55:02

File

lre15slownet_published.pdf
Publisher files authorized on an open archive


Citation

Darja Fišer, Benoît Sagot. Constructing a poor man’s wordnet in a resource-rich world. Language Resources and Evaluation, Springer Verlag, 2015, 49 (3), pp.601-635. 〈10.1007/s10579-015-9295-6〉. 〈hal-01174492〉


Metrics

Record views: 331
File downloads: 88