
Avtomatska razširitev in čiščenje sloWNeta (Automatic extension and cleaning of sloWNet)

Abstract : In this paper we present a language-independent, automatic approach to extending a wordnet by reusing existing language resources of different types, such as machine-readable dictionaries, parallel corpora and Wikipedia. The approach, applied to Slovene, covers monosemous and polysemous words, general and specialized vocabulary, and both single-word and multi-word lexemes. Each extracted word is assigned one or more synset IDs by a classifier that relies on several features, including distributional similarity. In a subsequent step we identify and remove highly dubious (literal, synset) pairs, using simple distributional information extracted from a large corpus in an unsupervised way. Automatic and manual evaluations show that the proposed approach yields very promising results.
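
The cleaning step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual features, similarity measure, or threshold, so everything here is an assumption made for illustration: the function names (`context_vector`, `cosine`, `filter_pairs`), the co-occurrence window, the cosine measure, the threshold value, and the synset identifier `syn-animal` are all hypothetical and should not be read as the authors' method.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def context_vector(word, sentences, window=2):
    """Count tokens co-occurring with `word` within +/- `window` positions."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return vec

def filter_pairs(pairs, synset_profiles, sentences, threshold=0.1):
    """Keep (literal, synset) pairs whose corpus contexts resemble the synset profile.

    `synset_profiles` maps a (hypothetical) synset ID to a count vector of
    words typical for that synset; `threshold` is an arbitrary cut-off.
    """
    kept = []
    for literal, synset_id in pairs:
        sim = cosine(context_vector(literal, sentences), synset_profiles[synset_id])
        if sim >= threshold:
            kept.append((literal, synset_id))
    return kept

# Toy corpus and a made-up synset profile, for illustration only.
sentences = [
    ["the", "dog", "barks", "loudly"],
    ["a", "dog", "bites"],
    ["the", "bank", "lends", "money"],
]
synset_profiles = {"syn-animal": Counter({"barks": 1, "bites": 1})}
pairs = [("dog", "syn-animal"), ("bank", "syn-animal")]
kept = filter_pairs(pairs, synset_profiles, sentences)
```

In this toy setting the pair linking "bank" to the animal synset is dropped because its contexts share nothing with the synset profile, while the pair for "dog" is kept. A real system would build the context vectors from a large corpus and tune the cut-off on evaluation data.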
Document type :
Conference papers

Cited literature : 16 references
Contributor : Benoît Sagot
Submitted on : Thursday, October 30, 2014 - 5:00:50 PM
Last modification on : Wednesday, November 2, 2022 - 11:00:46 AM
Long-term archiving on : Monday, February 2, 2015 - 4:01:35 PM


Publisher files allowed on an open archive


  • HAL Id : hal-01078839, version 1


Darja Fišer, Benoît Sagot. Avtomatska razširitev in čiščenje sloWNeta. Devete konference Jezikovne Tehnologije / Ninth Language Technologies Conference, Oct 2014, Ljubljana, Slovenia. ⟨hal-01078839⟩


