Conference Papers Year : 2005

Comparison of Topic Identification methods for Arabic Language

Mourad Abbas
Kamel Smaïli

Abstract

In this paper we present two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine-learning approach, Support Vector Machines (SVM). To our knowledge, few works address Arabic topic identification, so we investigate it in this article. The corpus we used is extracted from the daily Arabic newspaper Akhbar Al Khaleej; it includes 5120 news articles, corresponding to 2,855,069 words, covering four topics: sport, local news, international news and economy. According to our experiments, the results are encouraging for both the SVM and the TFIDF classifier; however, we observed the superiority of the SVM classifier and its high capability to distinguish topics.
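
To make the first method concrete, here is a minimal sketch of one common form of TFIDF classifier: each topic is represented by a TF-IDF weighted term vector, and a new document is assigned to the topic whose vector is closest by cosine similarity. The abstract does not specify the weighting, the similarity measure, or any Arabic preprocessing, so the smoothed IDF formula, the whitespace tokenizer, the function names, and the toy English training data below are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_tfidf(docs):
    """docs: list of (topic, text) pairs. Returns (idf, topic_vectors)."""
    tokenized = [(topic, text.split()) for topic, text in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for _, toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant) so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    # One vector per topic: summed term counts weighted by IDF.
    tf_by_topic = defaultdict(Counter)
    for topic, toks in tokenized:
        tf_by_topic[topic].update(toks)
    topic_vectors = {
        topic: {t: c * idf[t] for t, c in tf.items()}
        for topic, tf in tf_by_topic.items()
    }
    return idf, topic_vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, idf, topic_vectors):
    """Return the topic whose TF-IDF vector is most similar to the text."""
    tf = Counter(text.split())
    # Terms never seen in training carry no weight and are dropped.
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    return max(topic_vectors, key=lambda topic: cosine(vec, topic_vectors[topic]))


# Toy training data (hypothetical, in English for readability).
train = [
    ("sport", "team wins match goal"),
    ("sport", "player scores goal in final match"),
    ("economy", "bank raises interest rate"),
    ("economy", "market prices and bank profit rise"),
]
idf, topic_vectors = train_tfidf(train)
print(classify("the team scores a late goal", idf, topic_vectors))  # prints "sport"
```

An SVM-based classifier would typically consume the same kind of TF-IDF document vectors as features and learn decision boundaries between topics instead of comparing against one vector per topic.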
Main file: ranlp2005.pdf (5.58 MB)
Origin : Files produced by the author(s)

Dates and versions

inria-00000448 , version 1 (21-11-2017)

Identifiers

  • HAL Id : inria-00000448 , version 1

Cite

Mourad Abbas, Kamel Smaïli. Comparison of Topic Identification methods for Arabic Language. International Conference on Recent Advances in Natural Language Processing - RANLP 2005, Sep 2005, Borovets, Bulgaria. ⟨inria-00000448⟩