Avtomatska razširitev in čiščenje sloWNeta (Automatic Extension and Cleaning of sloWNet)

Abstract: In this paper we present a language-independent, automatic approach to extending a wordnet by recycling existing language resources of different types, such as machine-readable dictionaries, parallel corpora and Wikipedia. The approach, applied to Slovene, handles monosemous and polysemous words, general and specialized vocabulary, and single-word as well as multi-word lexemes. Each extracted word is assigned one or more synset ids by a classifier that relies on several features, including distributional similarity. In a second step, we identify and remove highly dubious (literal, synset) pairs, using simple distributional information extracted from a large corpus in an unsupervised way. Automatic and manual evaluations show that the proposed approach yields very promising results.
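The abstract does not say how the distributional information is used to filter dubious (literal, synset) pairs. Purely as an illustration, the minimal Python sketch below shows one generic way to do it, and it is not the authors' implementation: build window-based co-occurrence vectors from a corpus, then flag a literal whose vector has low cosine similarity to the leave-one-out centroid of the other literals in its synset. The toy corpus, the synset id `dog.n.01`, the window size of 2 and the 0.1 threshold are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: a tiny tokenised corpus and one synset whose
# literal list contains an obviously wrong member ("prices").
CORPUS = [
    ["the", "dog", "barks", "loudly"],
    ["the", "hound", "barks", "loudly"],
    ["a", "dog", "chases", "the", "cat"],
    ["a", "hound", "chases", "the", "cat"],
    ["stock", "prices", "fall", "sharply"],
]
SYNSETS = {"dog.n.01": ["dog", "hound", "prices"]}


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- `window` tokens."""
    vecs = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            context = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vecs.setdefault(word, Counter()).update(context)
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so unshared contexts add nothing.
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dubious_pairs(synsets, vecs, threshold=0.1):
    """Flag (literal, synset_id) pairs whose context vector is far from the
    leave-one-out centroid of the other literals in the same synset."""
    flagged = []
    for synset_id, literals in synsets.items():
        for literal in literals:
            others = [o for o in literals if o != literal and o in vecs]
            if literal not in vecs or not others:
                continue  # no corpus evidence: keep the pair
            # Summed vectors suffice: cosine is scale-invariant.
            centroid = Counter()
            for other in others:
                centroid.update(vecs[other])
            if cosine(vecs[literal], centroid) < threshold:
                flagged.append((literal, synset_id))
    return flagged


VECS = context_vectors(CORPUS)
print(dubious_pairs(SYNSETS, VECS))  # [('prices', 'dog.n.01')]
```

Here `dog` and `hound` occur in identical contexts and are kept, while `prices` shares no context with them and is flagged. The leave-one-out centroid is only one simple design choice; a realistic setup would need a large corpus and a tuned threshold.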
Document type:
Conference paper

Cited literature: 16 references

https://hal.inria.fr/hal-01078839
Contributor: Benoît Sagot
Submitted on: Thursday, October 30, 2014 - 5:00:50 PM
Last modification on: Thursday, August 29, 2019 - 2:24:07 PM
Long-term archiving on: Monday, February 2, 2015 - 4:01:35 PM

File

isjt2014_07.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01078839, version 1

Citation

Darja Fišer, Benoît Sagot. Avtomatska razširitev in čiščenje sloWNeta. Devete konference Jezikovne Tehnologije / Ninth Language Technologies Conference, Oct 2014, Ljubljana, Slovenia. ⟨hal-01078839⟩
