An end-to-end learning solution for assessing the quality of Wikipedia articles

Quang-Vinh Dang 1, Claudia-Lavinia Ignat 1
1 COAST - Web Scale Trustworthy Collaborative Service Systems
Inria Nancy - Grand Est, LORIA - NSS - Department of Networks, Systems and Services
Abstract: Wikipedia is considered the largest knowledge repository in the history of humanity and plays a crucial role in modern daily life. Assigning the correct quality class to Wikipedia articles is an important task that provides guidance for both authors and readers of Wikipedia. Manual review cannot keep pace with the editing speed of Wikipedia, so automatic classification of article quality is required. Most existing approaches rely on traditional machine learning with manual feature engineering, which requires a lot of expertise and effort. Furthermore, it is known that there is no generally perfect feature set, because information leak always occurs in the feature extraction phase. In addition, a new feature set is required for each language edition of Wikipedia. In this paper, we present an approach relying on deep learning for quality classification of Wikipedia articles. Our solution relies on Recurrent Neural Networks (RNNs), an end-to-end learning technique that eliminates the disadvantages of feature engineering. Our approach learns directly from raw data without human intervention and is language-neutral. Experimental results on English, French and Russian Wikipedia datasets show that our approach outperforms state-of-the-art solutions.
Document type: Conference paper

Cited literature: 61 references
Contributor: Quang Vinh Dang
Submitted on: Friday, July 28, 2017 - 3:02:04 PM
Last modification on: Tuesday, December 18, 2018 - 4:26:02 PM


Publisher files allowed on an open archive



Quang-Vinh Dang, Claudia-Lavinia Ignat. An end-to-end learning solution for assessing the quality of Wikipedia articles. OpenSym 2017 - International Symposium on Open Collaboration, Aug 2017, Galway, Ireland. ⟨10.1145/3125433.3125448⟩. ⟨hal-01559693v3⟩


