Constructing a poor man’s wordnet in a resource-rich world

Darja Fišer; Benoît Sagot

doi:10.1007/s10579-015-9295-6

Journal article, Language Resources and Evaluation, Year: 2015


Darja Fišer

Role: Author
PersonId : 907730

Department of Translation Studies

Benoît Sagot

Role: Author
PersonId : 1461
IdHAL : bsagot
ORCID : 0000-0002-0107-8526
IdRef : 177454229

Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing

Abstract

In this paper we present a language-independent, fully modular and automatic approach to bootstrap a wordnet for a new language by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora, and Wikipedia. The approach, which we apply here to Slovene, takes into account monosemous and polysemous words, general and specialised vocabulary as well as simple and multi-word lexemes. The extracted words are then assigned one or several synset ids, based on a classifier that relies on several features including distributional similarity. Finally, we identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic, manual and task-based evaluations show that the resulting resource, the latest version of the Slovene wordnet, is already a valuable source of lexico-semantic information.
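The final filtering step mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract only says that dubious (literal, synset) pairs are removed using simple distributional information from a large corpus. The sketch below assumes one plausible reading, namely that a literal is suspicious when its context vector is far (by cosine similarity) from the pooled context vectors of the other literals in the same synset. The function names, the window size, the threshold, the synset id and the English toy corpus are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_vectors(sentences, window=2):
    """Map each token to a Counter of tokens seen within `window` positions."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def dubious_pairs(synsets, vectors, threshold):
    """Flag (literal, synset_id) pairs whose literal is distributionally
    far from the other literals of the same synset (assumed criterion)."""
    flagged = []
    for synset_id, literals in synsets.items():
        for literal in literals:
            # Pool the context counts of all other literals in the synset.
            others = Counter()
            for other in literals:
                if other != literal:
                    others.update(vectors.get(other, Counter()))
            if not others:
                continue  # singleton synset: nothing to compare against
            if cosine(vectors.get(literal, Counter()), others) < threshold:
                flagged.append((literal, synset_id))
    return flagged


# Hypothetical toy corpus and candidate synset, for illustration only.
sentences = [
    ["the", "dog", "barks", "loudly"],
    ["the", "hound", "barks", "loudly"],
    ["the", "bank", "approves", "loans"],
]
synsets = {"toy-synset-1": ["dog", "hound", "bank"]}

vectors = context_vectors(sentences)
flagged = dubious_pairs(synsets, vectors, threshold=0.5)
print(flagged)
```

On this toy data only the pair ("bank", "toy-synset-1") falls below the threshold, because "bank" shares just one context word with "dog" and "hound". A realistic setup would need a large corpus, lemmatisation and some weighting of context counts, none of which is shown here.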

Keywords

Wordnet development; Multilingual lexicon extraction; Word-sense disambiguation; Distributional similarity

Domains

Computation and Language [cs.CL]

Main file

lre15slownet_published.pdf (837.49 KB)

Origin: Publisher files allowed on an open archive

Contributor: Benoît Sagot

https://inria.hal.science/hal-01174492

Submitted on: Wednesday, April 25, 2018, 00:48:06

Last modified on: Wednesday, November 2, 2022, 10:32:27

Long-term archiving on: Tuesday, September 18, 2018, 08:55:02

Dates and versions

hal-01174492 , version 1 (25-04-2018)

Identifiers

HAL Id : hal-01174492 , version 1
DOI : 10.1007/s10579-015-9295-6

Cite

Darja Fišer, Benoît Sagot. Constructing a poor man’s wordnet in a resource-rich world. Language Resources and Evaluation, 2015, 49 (3), pp.601-635. ⟨10.1007/s10579-015-9295-6⟩. ⟨hal-01174492⟩


Collections

UNIV-PARIS7 INRIA INRIA2 CAMPUS-AAR AAI ANR

295 views

569 downloads
