
Data-driven Synset Induction and Disambiguation for Wordnet Development

Abstract: Automatic methods for wordnet development in languages other than English generally exploit information found in Princeton WordNet (PWN) and translations extracted from parallel corpora. A common approach consists of preserving the structure of PWN and transferring its content into new languages using alignments, possibly combined with information extracted from multilingual semantic resources. Even though PWN remains central to this process, these automatic methods offer an alternative to the manual elaboration of new wordnets. However, their limited coverage strongly affects the coverage of the resulting resources. Following this line of research, we apply a cross-lingual word sense disambiguation method to wordnet development. Our approach exploits the output of a data-driven sense induction method that generates sense clusters in new languages, similar to wordnet synsets, by identifying word senses and relations in parallel corpora. We apply our cross-lingual word sense disambiguation method to the task of enriching a French wordnet resource, the WOLF, and show how it can be used efficiently to increase its coverage. Although our experiments involve the English-French language pair, the proposed methodology is general enough to be applied to the development of wordnet resources in other languages for which parallel corpora are available. Finally, we show how the disambiguation output can serve to reduce the granularity of new wordnets and the degree of polysemy present in PWN.
Document type: Journal article

Cited literature: 40 references
Contributor: Benoît Sagot
Submitted on: Thursday, November 27, 2014 - 11:01:08 AM
Last modification on: Thursday, November 3, 2022 - 3:30:26 AM
Long-term archiving on: Monday, March 2, 2015 - 9:20:46 AM


Files produced by the author(s)



Marianna Apidianaki, Benoît Sagot. Data-driven Synset Induction and Disambiguation for Wordnet Development. Language Resources and Evaluation, 2014, 48 (4), pp.655-677. ⟨10.1007/s10579-014-9291-2⟩. ⟨hal-01088000⟩


