Discovering Indicators for Classifying Wikipedia Articles in a Domain: A Case Study on Software Languages

Marcel Heinz, Ralf Lämmel, Mathieu Acher

Abstract

Wikipedia is a rich source of information across many knowledge domains. Yet recovering the articles relevant to a specific domain is difficult, since such articles may be rare and tend to cover multiple topics. Furthermore, Wikipedia's categories classify articles ambiguously: they relate to all topics and are thus of limited use. In this paper, we develop a new methodology to isolate the Wikipedia articles that describe a specific topic within the scope of relevant categories; the methodology uses supervised machine learning to obtain a decision tree classifier based on article features (URL patterns, summary text, infoboxes, links from list articles). In a case study, we retrieve 3000+ articles that describe software (computer) languages. Available fragments of ground truth serve as an essential part of the training set for detecting relevant articles. The classification results are thoroughly evaluated through a survey in which 31 domain experts participated.
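The idea of learning a decision tree over article indicators can be illustrated with a minimal sketch. This is not the authors' implementation: the four boolean feature names, the toy training rows, and the simple information-gain (ID3-style) learner are all assumptions chosen to mirror the feature families named in the abstract.

```python
from collections import Counter
from math import log2

# Hypothetical boolean indicators, one per feature family named in the abstract.
FEATURES = [
    "url_has_language_suffix",
    "summary_mentions_language",
    "has_language_infobox",
    "linked_from_list_article",
]


def article(url, summary, infobox, listed):
    """Build a feature row (dict) for one article."""
    return dict(zip(FEATURES, (url, summary, infobox, listed)))


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_feature(rows, labels, features):
    """Return the feature with the highest information gain, or None if no feature helps."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        remainder = 0.0
        for value in (True, False):
            subset = [l for r, l in zip(rows, labels) if r[f] == value]
            if subset:
                remainder += len(subset) / len(labels) * entropy(subset)
        gain = base - remainder
        if gain > best_gain + 1e-12:  # ties keep the earlier feature
            best, best_gain = f, gain
    return best


def build_tree(rows, labels, features):
    """Recursively build a tree: a leaf is a label, an inner node is a dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    f = best_feature(rows, labels, features)
    if f is None:
        return majority
    rest = [x for x in features if x != f]
    node = {"feature": f}
    for value in (True, False):
        idx = [i for i, r in enumerate(rows) if r[f] == value]
        node[value] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
            if idx
            else majority
        )
    return node


def classify(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree[row[tree["feature"]]]
    return tree


# Toy, invented training data: (features, is_software_language_article).
TRAIN = [
    (article(True, True, True, True), True),
    (article(False, True, True, False), True),
    (article(False, True, False, True), True),
    (article(True, False, False, False), False),
    (article(False, False, False, True), False),
    (article(False, True, False, False), False),
]

ROWS = [row for row, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
TREE = build_tree(ROWS, LABELS, FEATURES)
```

Calling `classify(TREE, article(False, True, False, True))` then walks the learned indicators from the root down to a label. A real study would use a much larger labelled set and an established library; the sketch only shows how indicators emerge as split features.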
Origin: Files produced by the author(s)

Dates and versions

hal-02129131, version 1 (14-05-2019)

Identifiers

  • HAL Id: hal-02129131, version 1

Cite

Marcel Heinz, Ralf Lämmel, Mathieu Acher. Discovering Indicators for Classifying Wikipedia Articles in a Domain: A Case Study on Software Languages. SEKE 2019 - The 31st International Conference on Software Engineering and Knowledge Engineering, Jul 2019, Lisbon, Portugal. pp.1-6. ⟨hal-02129131⟩