
Cleaning noisy wordnets

Abstract : Automatic approaches to creating and extending wordnets, which have become very popular in the past decade, inadvertently produce noisy synsets. We therefore propose an approach that detects synset outliers in order to eliminate the noise and improve the accuracy of the resulting wordnets, so that they become more useful lexico-semantic resources for natural language applications. The approach compares the words that appear in a synset and its surroundings with the contexts in which the literals in question are used in large monolingual corpora. By tuning the outlier threshold, we can control how many outlier candidates are eliminated. Although the proposed approach is language-independent, we test it on the Slovene and French wordnets, which were created automatically from bilingual resources and contain many disambiguation errors. Manual evaluation of the results shows that, when a threshold similar to the estimated error rate of the respective wordnet is applied, 67% of the proposed outlier candidates are indeed incorrect for French and 64% for Slovene. This is a large improvement over the estimated overall error rates in the resources, which are 12% for French and 15% for Slovene.
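The general idea in the abstract (score each literal/synset pair by how well the synset's words match the literal's corpus contexts, then flag the lowest-scoring share as outlier candidates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the cosine similarity over bag-of-words counts, the function names, the `fraction` parameter and the toy data are hypothetical and are not taken from the paper, which may use a different similarity measure and thresholding scheme.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_outlier_candidates(pairs, fraction):
    """Return the lowest-scoring share of (literal, synset_id) pairs.

    pairs: iterable of (literal, synset_id, synset_words, context_words),
           where synset_words stands for the words of the synset and its
           surroundings, and context_words for words seen around the
           literal in a monolingual corpus (both hypothetical inputs).
    fraction: share of pairs to flag, e.g. the estimated error rate.
    """
    scored = []
    for literal, synset_id, synset_words, context_words in pairs:
        score = cosine(Counter(synset_words), Counter(context_words))
        scored.append((score, literal, synset_id))
    # Lowest similarity first: these are the most suspicious pairs.
    scored.sort(key=lambda item: item[0])
    cutoff = int(len(scored) * fraction)
    return [(literal, synset_id) for _, literal, synset_id in scored[:cutoff]]


# Toy data, invented for this sketch only.
EXAMPLE_PAIRS = [
    ("dog", "syn-animal-1",
     ["animal", "pet", "bark", "canine"],
     ["bark", "pet", "leash", "animal"]),
    ("bank", "syn-river-1",
     ["river", "shore", "water", "slope"],
     ["money", "loan", "account", "credit"]),
    ("cat", "syn-animal-2",
     ["animal", "pet", "feline", "purr"],
     ["pet", "purr", "mouse", "animal"]),
    ("loan", "syn-finance-1",
     ["money", "credit", "debt", "lend"],
     ["money", "credit", "interest", "lend"]),
]

if __name__ == "__main__":
    print(rank_outlier_candidates(EXAMPLE_PAIRS, fraction=0.25))
```

In this toy setting the pair whose corpus contexts share no words with its synset is the one flagged. Raising `fraction` flags more candidates, which mirrors the trade-off the abstract describes when tuning the outlier threshold.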
Document type : Conference papers
Contributor : Benoît Sagot
Submitted on : Thursday, May 31, 2012 - 9:35:53 PM
Last modification on : Wednesday, November 2, 2022 - 10:42:32 AM
Long-term archiving on : Thursday, December 15, 2016 - 9:20:17 AM


Files produced by the author(s)


  • HAL Id : hal-00703125, version 1



Benoît Sagot, Darja Fišer. Cleaning noisy wordnets. LREC 2012 - Eighth International Conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. ⟨hal-00703125⟩


