Data-driven Synset Induction and Disambiguation for Wordnet Development

Abstract: Automatic methods for wordnet development in languages other than English generally exploit information found in Princeton WordNet (PWN) and translations extracted from parallel corpora. A common approach consists in preserving the structure of PWN and transferring its content into new languages using alignments, possibly combined with information extracted from multilingual semantic resources. Even if the role of PWN remains central in this process, these automatic methods offer an alternative to the manual elaboration of new wordnets. However, their limited coverage has a strong impact on that of the resulting resources. Following this line of research, we apply a cross-lingual word sense disambiguation method to wordnet development. Our approach exploits the output of a data-driven sense induction method that generates sense clusters in new languages, similar to wordnet synsets, by identifying word senses and relations in parallel corpora. We apply our cross-lingual word sense disambiguation method to the task of enriching a French wordnet resource, the WOLF, and show how it can be efficiently used for increasing its coverage. Although our experiments involve the English-French language pair, the proposed methodology is general enough to be applied to the development of wordnet resources in other languages for which parallel corpora are available. Finally, we show how the disambiguation output can serve to reduce the granularity of new wordnets and the degree of polysemy present in PWN.
Document type:
Journal article
Language Resources and Evaluation, Springer Verlag, 2014, 48 (4), pp.655-677. 〈10.1007/s10579-014-9291-2〉

Cited literature: 40 references

https://hal.inria.fr/hal-01088000
Contributor: Benoît Sagot
Submitted on: Thursday, 27 November 2014 - 11:01:08
Last modified on: Thursday, 12 July 2018 - 10:58:01
Document(s) archived on: Monday, 2 March 2015 - 09:20:46

File

LRE_Apidianaki_Sagot_camera_re...
Files produced by the author(s)


Citation

Marianna Apidianaki, Benoît Sagot. Data-driven Synset Induction and Disambiguation for Wordnet Development. Language Resources and Evaluation, Springer Verlag, 2014, 48 (4), pp.655-677. 〈10.1007/s10579-014-9291-2〉. 〈hal-01088000〉

