Data-driven Synset Induction and Disambiguation for Wordnet Development

Abstract: Automatic methods for wordnet development in languages other than English generally exploit information found in Princeton WordNet (PWN) and translations extracted from parallel corpora. A common approach consists of preserving the structure of PWN and transferring its content into new languages using alignments, possibly combined with information extracted from multilingual semantic resources. Even if the role of PWN remains central in this process, these automatic methods offer an alternative to the manual elaboration of new wordnets. However, the limited coverage of these methods directly limits the coverage of the resulting resources. Following this line of research, we apply a cross-lingual word sense disambiguation method to wordnet development. Our approach exploits the output of a data-driven sense induction method that generates sense clusters in new languages, similar to wordnet synsets, by identifying word senses and relations in parallel corpora. We apply our cross-lingual word sense disambiguation method to the task of enriching a French wordnet resource, the WOLF, and show how it can be used efficiently to increase its coverage. Although our experiments involve the English-French language pair, the proposed methodology is general enough to be applied to the development of wordnet resources in other languages for which parallel corpora are available. Finally, we show how the disambiguation output can serve to reduce the granularity of new wordnets and the degree of polysemy present in PWN.
Document type: Journal article

Cited literature: 40 references

https://hal.inria.fr/hal-01088000
Contributor: Benoît Sagot
Submitted on: Thursday, November 27, 2014, 11:01:08 AM
Last modification on: Saturday, May 4, 2019, 1:19:18 AM
Long-term archiving on: Monday, March 2, 2015, 9:20:46 AM

File

LRE_Apidianaki_Sagot_camera_re...
Files produced by the author(s)

Citation

Marianna Apidianaki, Benoît Sagot. Data-driven Synset Induction and Disambiguation for Wordnet Development. Language Resources and Evaluation, Springer Verlag, 2014, 48 (4), pp.655-677. ⟨10.1007/s10579-014-9291-2⟩. ⟨hal-01088000⟩

Metrics

Record views: 375
File downloads: 312