Conference papers

Comparison of Topic Identification methods for Arabic Language

Mourad Abbas 1 Kamel Smaïli 2
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this paper we present two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine learning approach, Support Vector Machines (SVM). To our knowledge, few works address Arabic topic identification, which is why we investigate it in this article. The corpus we used is extracted from the daily Arabic newspaper Akhbar Al Khaleej; it includes 5120 news articles, corresponding to 2,855,069 words, covering four topics: sport, local news, international news and economy. According to our experiments, the results are encouraging for both the SVM and the TFIDF classifier; however, we observed that the SVM classifier performs better and has a high capability to distinguish topics.
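As an illustration of the first method named in the abstract, the following is a minimal sketch of a TFIDF-style topic classifier. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF computed over topics, the cosine-similarity decision rule, and the toy English sentences (standing in for the Arabic corpus) are all assumptions made for the example.

```python
import math
from collections import Counter

# Toy labeled data (hypothetical; NOT the Akhbar Al Khaleej corpus).
TRAIN = [
    ("sport", "the team won the match after a late goal"),
    ("sport", "the coach praised the players after the match"),
    ("economy", "the bank raised interest rates as inflation grew"),
    ("economy", "oil prices and the stock market fell"),
]


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Arabic text would
    # need normalization and possibly stemming before this step.
    return text.lower().split()


def train_tfidf(labeled_docs):
    """Build one TF-IDF weighted vector per topic from (topic, text) pairs."""
    topic_tf = {}
    for topic, text in labeled_docs:
        topic_tf.setdefault(topic, Counter()).update(tokenize(text))

    # Document frequency is counted over topics: in how many topics
    # does each term occur at least once?
    df = Counter()
    for counts in topic_tf.values():
        df.update(counts.keys())

    n_topics = len(topic_tf)
    # Smoothed IDF (an assumed variant) so that no weight is zero.
    idf = {t: math.log((1 + n_topics) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = {
        topic: {t: c * idf[t] for t, c in counts.items()}
        for topic, counts in topic_tf.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, vectors, idf):
    """Return the topic whose vector is most similar to the text."""
    counts = Counter(tokenize(text))
    # Terms never seen in training are ignored.
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return max(vectors, key=lambda topic: cosine(query, vectors[topic]))


vectors, idf = train_tfidf(TRAIN)
print(classify("the players scored a goal in the match", vectors, idf))
```

An SVM, the second method in the abstract, would instead learn a separating boundary over such TF-IDF features rather than comparing a document with one vector per topic.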
Document type :
Conference papers
Contributor : Kamel Smaïli
Submitted on : Tuesday, November 21, 2017 - 3:38:57 PM
Last modification on : Friday, February 26, 2021 - 3:28:06 PM




  • HAL Id : inria-00000448, version 1



Mourad Abbas, Kamel Smaïli. Comparison of Topic Identification methods for Arabic Language. International Conference on Recent Advances in Natural Language Processing - RANLP 2005, Sep 2005, Borovets, Bulgaria. ⟨inria-00000448⟩


