Avtomatska razširitev in čiščenje sloWNeta

Abstract: In this paper we present a language-independent, automatic approach to extending a wordnet by recycling different types of existing language resources, such as machine-readable dictionaries, parallel corpora and Wikipedia. The approach, applied to Slovene, takes into account monosemous and polysemous words, general and specialized vocabulary, and both single-word and multi-word lexemes. Each extracted word is assigned one or more synset IDs by a classifier that relies on several features, including distributional similarity. In a second step, we identify and remove highly dubious (literal, synset) pairs, using simple distributional information extracted from a large corpus in an unsupervised way. Automatic and manual evaluations show that the proposed approach yields very promising results.
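
The cleaning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual features, corpus or thresholds, so everything below is an assumption made for illustration: the function names (`context_vector`, `cosine`, `flag_dubious`), the co-occurrence window, the cosine measure and the threshold value are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words that co-occur with `target` within `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_dubious(pairs, synset_context, tokens, threshold=0.1):
    """Return the (literal, synset) pairs whose corpus context is too
    dissimilar from the context vector assumed for the synset.

    `synset_context` maps a hypothetical synset id to a Counter of
    context words; how such vectors are built is left open here.
    """
    flagged = []
    for literal, synset_id in pairs:
        sim = cosine(context_vector(tokens, literal), synset_context[synset_id])
        if sim < threshold:
            flagged.append((literal, synset_id))
    return flagged
```

In this sketch a pair is flagged when the literal's observed contexts share little with the contexts assumed for the synset; a literal that never occurs in the corpus gets similarity 0 and is flagged as well.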
Document type:
Conference paper
Devete konference Jezikovne Tehnologije / Ninth Language Technologies Conference, Oct 2014, Ljubljana, Slovenia. 〈http://nl.ijs.si/isjt14/index-en.html〉

https://hal.inria.fr/hal-01078839
Contributor: Benoît Sagot
Submitted on: Thursday, 30 October 2014 - 17:00:50
Last modified on: Saturday, 9 June 2018 - 10:30:05
Document(s) archived on: Monday, 2 February 2015 - 16:01:35

File

isjt2014_07.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01078839, version 1

Citation

Darja Fišer, Benoît Sagot. Avtomatska razširitev in čiščenje sloWNeta. Devete konference Jezikovne Tehnologije / Ninth Language Technologies Conference, Oct 2014, Ljubljana, Slovenia. 〈http://nl.ijs.si/isjt14/index-en.html〉. 〈hal-01078839〉
