Combining multiple resources to build reliable wordnets

Abstract : This paper compares automatically generated sets of synonyms in French and Slovene wordnets with respect to the resources used to construct them. Polysemous words were disambiguated via a five-language word alignment of the SEERA.NET parallel corpus, a subcorpus of the JRC Acquis. The extracted multilingual lexicon was then disambiguated using the existing wordnets for these languages. For monosemous words, by contrast, a bilingual approach sufficed to acquire equivalents. Bilingual lexicons were extracted from several resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against gold standards.
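
The multilingual disambiguation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that existing wordnets share interlingual synset IDs and that a polysemous word is narrowed down by intersecting the IDs of its aligned translations; the abstract does not spell out the exact procedure, so this is an illustration rather than the authors' implementation. The toy wordnets, the synset IDs and the function name `candidate_synsets` are all hypothetical.

```python
from functools import reduce

# Hypothetical toy wordnets: language -> lemma -> set of shared synset IDs.
# These entries are invented for illustration, not taken from real wordnets.
TOY_WORDNETS = {
    "en": {"bank": {"s1_finance", "s2_river"}},
    "cs": {"banka": {"s1_finance"}},
    "ro": {"bancă": {"s1_finance", "s3_bench"}},
}


def candidate_synsets(aligned, wordnets):
    """Intersect the synset IDs of every aligned translation a wordnet covers.

    `aligned` maps a language code to the lemma found in one aligned
    lexicon entry. Languages without a wordnet (the target languages)
    are simply skipped.
    """
    id_sets = [
        wordnets[lang][lemma]
        for lang, lemma in aligned.items()
        if lang in wordnets and lemma in wordnets[lang]
    ]
    if not id_sets:
        return set()
    return reduce(set.intersection, id_sets)


# One entry of a hypothetical word-aligned multilingual lexicon.
entry = {"en": "bank", "cs": "banka", "ro": "bancă", "fr": "banque"}

shared = candidate_synsets(entry, TOY_WORDNETS)

# Attach the French lemma to every synset ID that survived the intersection.
french_assignments = {synset_id: entry["fr"] for synset_id in shared}
```

In this toy setup each additional language removes senses the translations do not share, which is why a multilingual alignment can help with polysemous words while a single bilingual lexicon may be enough for monosemous ones.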
Document type :
Conference papers

https://hal.inria.fr/inria-00614706
Contributor : Benoît Sagot
Submitted on : Monday, August 15, 2011 - 11:25:04 AM
Last modification on : Thursday, August 29, 2019 - 2:24:09 PM
Long-term archiving on : Friday, November 25, 2011 - 11:11:50 AM

File

TSD08.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00614706, version 1

Citation

Darja Fišer, Benoît Sagot. Combining multiple resources to build reliable wordnets. TSD 2008 - Text Speech and Dialogue, 2008, Brno, Czech Republic. ⟨inria-00614706⟩
