Combining multiple resources to build reliable wordnets

Abstract: This paper compares automatically generated sets of synonyms in French and Slovene wordnets with respect to the resources used in the construction process. Polysemous words were disambiguated via a five-language word alignment of the SEERA.NET parallel corpus, a subcorpus of the JRC Acquis. The extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. For monosemous words, by contrast, a bilingual approach sufficed to acquire equivalents. Bilingual lexicons were extracted from different resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards.
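The disambiguation step for polysemous words can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract, that each entry of the word-aligned multilingual lexicon is disambiguated by intersecting the synset identifiers that existing wordnets assign to its translations. The language codes, synset IDs, toy wordnets and the function name `disambiguate` are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy wordnets: language -> word -> set of synset IDs.
# Real wordnets would share an interlingual index; these IDs are made up.
WORDNETS = {
    "en": {"bank": {"S1", "S2"}},
    "cs": {"banka": {"S1"}},
    "ro": {"bancă": {"S1", "S3"}},
}


def disambiguate(entry, wordnets):
    """Return the synset IDs shared by every translation covered by a wordnet.

    `entry` maps a language code to the word aligned in that language.
    Languages without a wordnet (here the target languages) are skipped.
    """
    candidate_sets = [
        wordnets[lang][word]
        for lang, word in entry.items()
        if lang in wordnets and word in wordnets[lang]
    ]
    if not candidate_sets:
        return set()
    # Only senses compatible with all covered translations survive.
    return set.intersection(*candidate_sets)


# One aligned lexicon entry; "fr" and "sl" are the languages being built.
entry = {"en": "bank", "cs": "banka", "ro": "bancă", "fr": "banque", "sl": "banka"}
shared = disambiguate(entry, WORDNETS)

# The surviving synset IDs would then be attached to the target-language words.
new_synsets = {lang: {entry[lang]: shared} for lang in ("fr", "sl")}
```

In this toy case the intersection leaves a single synset ID, which is then assigned to the French and Slovene words; with real data the intersection may be empty or still ambiguous, and how the authors handle those cases is not stated in the abstract.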
Document type:
Conference paper
TSD 2008 - Text Speech and Dialogue, 2008, Brno, Czech Republic. 2008

https://hal.inria.fr/inria-00614706
Contributor: Benoît Sagot
Submitted on: Monday, 15 August 2011 - 11:25:04
Last modified on: Saturday, 9 June 2018 - 10:30:06
Document(s) archived on: Friday, 25 November 2011 - 11:11:50

File

TSD08.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00614706, version 1

Citation

Darja Fišer, Benoît Sagot. Combining multiple resources to build reliable wordnets. TSD 2008 - Text Speech and Dialogue, 2008, Brno, Czech Republic. 2008. 〈inria-00614706〉

Metrics

Record views

300

File downloads

133