SEWordSim: software-specific word similarity database

Abstract: Measuring the similarity of words is important for accurately representing and comparing documents, and thus for improving the results of many natural language processing (NLP) tasks. The NLP community has proposed various measures based on WordNet, a lexical database that contains relationships between many pairs of words. Recently, a number of techniques have been proposed to address software engineering problems, such as code search and fault localization, that require understanding natural language documents, and a measure of word similarity could improve their results. However, WordNet only contains information about word senses in general-purpose conversation, which often differ from word senses in a software engineering context, and the software-specific word similarity resources that have been developed rely on data sources containing only a limited range of words and word uses. In recent work, we proposed a word similarity resource based on information collected automatically from StackOverflow. We found that, when rated on a 3-point Likert scale, the results of this resource receive scores over 50% higher than those of a resource based on WordNet. In this demo paper, we review our data collection methodology and propose a Java API to make the resulting word similarity resource useful in practice. The SEWordSim database and related information can be found at http://goo.gl/BVEAs8. A demo video is available at http://goo.gl/dyNwyb.
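The abstract mentions a Java API for querying the word similarity resource but does not describe it. The following is a minimal sketch, assuming the resource can be modeled as a set of precomputed, symmetric (word, word, score) entries. The class name `WordSimilarityTable`, its methods, and the example scores are hypothetical; they are not the SEWordSim API or data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory word similarity table (not the SEWordSim API).
 * Each pair is stored under both words so that lookups are symmetric.
 */
class WordSimilarityTable {
    // word -> (other word -> similarity score)
    private final Map<String, Map<String, Double>> neighbors = new HashMap<>();

    /** Records a precomputed score for a word pair; words are lower-cased. */
    public void put(String a, String b, double score) {
        String x = a.toLowerCase();
        String y = b.toLowerCase();
        neighbors.computeIfAbsent(x, k -> new HashMap<>()).put(y, score);
        neighbors.computeIfAbsent(y, k -> new HashMap<>()).put(x, score);
    }

    /** Returns 1.0 for identical words, the stored score if known, else 0.0. */
    public double similarity(String a, String b) {
        String x = a.toLowerCase();
        String y = b.toLowerCase();
        if (x.equals(y)) {
            return 1.0;
        }
        return neighbors.getOrDefault(x, Collections.emptyMap()).getOrDefault(y, 0.0);
    }

    /** Returns known words scoring at least threshold with word, best first. */
    public List<String> similarWords(String word, double threshold) {
        Map<String, Double> row =
                neighbors.getOrDefault(word.toLowerCase(), Collections.emptyMap());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : row.entrySet()) {
            if (e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        // Sort by descending score.
        result.sort((p, q) -> Double.compare(row.get(q), row.get(p)));
        return result;
    }

    public static void main(String[] args) {
        WordSimilarityTable table = new WordSimilarityTable();
        // Illustrative scores only; these are not values from SEWordSim.
        table.put("bug", "defect", 0.9);
        table.put("bug", "fault", 0.7);
        table.put("bug", "insect", 0.1);
        System.out.println(table.similarity("Defect", "bug")); // 0.9
        System.out.println(table.similarWords("bug", 0.5));   // [defect, fault]
    }
}
```

Storing each pair under both words keeps lookups symmetric and lets `similarWords` read a single row; a client of the actual resource would load scores from the SEWordSim database rather than hard-coding them.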
Document type: Poster
ICSE Companion 2014 - Companion Proceedings of the 36th International Conference on Software Engineering, May 2014, Hyderabad, India. ACM, pp.568-571, 〈http://2014.icse-conferences.org/〉. 〈10.1145/2591062.2591071〉

https://hal.inria.fr/hal-01086079
Contributor: Julia Lawall
Submitted on: Friday, November 21, 2014, 17:57:01
Last modified on: Thursday, November 22, 2018, 14:21:21

Citation

Yuan Tian, David Lo, Julia Lawall. SEWordSim: software-specific word similarity database. ICSE Companion 2014 - Companion Proceedings of the 36th International Conference on Software Engineering, May 2014, Hyderabad, India. ACM, pp.568-571, 〈http://2014.icse-conferences.org/〉. 〈10.1145/2591062.2591071〉. 〈hal-01086079〉