
TR-Classifier and kNN Evaluation for Topic Identification tasks

Abstract : This paper studies topic identification for the Arabic language using two methods. The first is the well-known kNN (k Nearest Neighbors) classifier, which serves as the baseline. The second is the TR-Classifier, which is based mainly on computing triggers. The experiments show that the TR-Classifier outperforms kNN while using much smaller topic vocabularies. Its performance improves when the number of triggers and the size of the topic vocabularies are increased jointly. The TR-Classifier uses topic vocabularies, whereas kNN requires a general vocabulary, obtained by concatenating the topic vocabularies used by the TR-Classifier. In addition to the standard Recall and Precision measures used for evaluation, we draw ROC curves for some topics to illustrate more clearly the difference in performance between the two classifiers. The corpus used in our experiments was downloaded from an online Arabic newspaper. It contains about 10 million words, distributed over six selected topics: culture, religion, economy, local news, international news and sports.
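As an illustration of the evaluation step mentioned in the abstract, the following is a minimal sketch of per-topic Precision and Recall, assuming single-label classification over the six topics. The function name `precision_recall`, the `TOPICS` list layout, and the toy labels are hypothetical and are not taken from the paper; the paper's exact evaluation protocol may differ.

```python
from collections import Counter

# The six topics named in the abstract (list layout is an assumption).
TOPICS = ["culture", "religion", "economy",
          "local news", "international news", "sports"]


def precision_recall(gold, predicted, topics=TOPICS):
    """Return {topic: (precision, recall)} for single-label predictions.

    gold and predicted are equal-length sequences of topic names.
    A topic with no predictions (or no gold documents) gets 0.0
    for the corresponding measure instead of raising an error.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correctly identified topic
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true topic but was missed
    scores = {}
    for t in topics:
        predicted_as_t = tp[t] + fp[t]
        truly_t = tp[t] + fn[t]
        precision = tp[t] / predicted_as_t if predicted_as_t else 0.0
        recall = tp[t] / truly_t if truly_t else 0.0
        scores[t] = (precision, recall)
    return scores


# Toy example with made-up labels (not data from the paper).
gold = ["sports", "sports", "economy", "culture"]
pred = ["sports", "economy", "economy", "culture"]
scores = precision_recall(gold, pred)
print(scores["sports"], scores["economy"])
```

In this toy run, one of the two sports documents is mislabelled as economy, so sports keeps full precision but loses recall, while economy shows the opposite pattern.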
Document type : Journal articles

Cited literature [44 references]
Contributor : Kamel Smaïli
Submitted on : Monday, November 20, 2017 - 3:29:32 PM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on : Wednesday, February 21, 2018 - 3:59:55 PM




  • HAL Id : hal-01586549, version 1



Mourad Abbas, Kamel Smaïli, Daoud Berkani. TR-Classifier and kNN Evaluation for Topic Identification tasks. International Journal on Information and Communication Technologies, Serials Publications, 2010, 3 (3), pp.10. ⟨hal-01586549⟩


