Comparison of Topic Identification methods for Arabic Language

Mourad Abbas 1, Kamel Smaïli 2
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this paper we compare two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine-learning approach, Support Vector Machines (SVM). To our knowledge, few works have addressed Arabic topic identification, which is why we investigate it in this article. The corpus we used is extracted from the daily Arabic newspaper Akhbar Al Khaleej; it includes 5120 news articles, corresponding to 2,855,069 words, covering four topics : sport, local news, international news and economy. In our experiments, the results are encouraging for both the SVM and the TFIDF classifier; however, the SVM classifier performs better and shows a high capability to distinguish topics.
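As an illustration of the TFIDF classifier idea named in the abstract, here is a minimal sketch in Python. It is not the authors' implementation: the function names (train_tfidf, classify), the smoothed IDF formula, the use of cosine similarity as the decision rule, and the toy English tokens standing in for Arabic words are all assumptions made for the example. The SVM approach is not shown; it would normally rely on an external library.

```python
import math
from collections import Counter

def train_tfidf(docs_by_topic):
    """Build one TF-IDF vector per topic from tokenized training documents.

    docs_by_topic maps a topic name to a list of documents (lists of tokens).
    """
    all_docs = [doc for docs in docs_by_topic.values() for doc in docs]
    n_docs = len(all_docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))
    # Smoothed IDF: an assumption, the paper's exact weighting is not given here.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    topic_vectors = {}
    for topic, docs in docs_by_topic.items():
        tf = Counter(tok for doc in docs for tok in doc)
        topic_vectors[topic] = {t: f * idf[t] for t, f in tf.items()}
    return topic_vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(tokens, topic_vectors, idf):
    """Return the topic whose TF-IDF vector is closest to the document."""
    # Terms unseen in training carry no weight and are ignored.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: f * idf[t] for t, f in tf.items()}
    return max(topic_vectors, key=lambda topic: cosine(vec, topic_vectors[topic]))

# Hypothetical toy corpus; the real corpus has four topics and 5120 articles.
TOY_CORPUS = {
    "sport": [["match", "goal", "team"], ["team", "player", "goal"]],
    "economy": [["market", "bank", "price"], ["bank", "oil", "price"]],
}
topic_vectors, idf = train_tfidf(TOY_CORPUS)
predicted = classify(["goal", "team", "win"], topic_vectors, idf)
```

In this sketch each topic is summarized by a single weighted term vector, and a new article is assigned to the topic with the highest cosine similarity. A document sharing no terms with the training data scores zero against every topic, so its label is arbitrary.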
Document type :
Conference papers

https://hal.inria.fr/inria-00000448
Contributor : Kamel Smaïli
Submitted on : Tuesday, November 21, 2017 - 3:38:57 PM
Last modification on : Thursday, January 11, 2018 - 6:19:57 AM

File

ranlp2005.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00000448, version 1

Citation

Mourad Abbas, Kamel Smaïli. Comparison of Topic Identification methods for Arabic Language. International Conference on Recent Advances in Natural Language Processing - RANLP 2005, Sep 2005, Borovets, Bulgaria. ⟨inria-00000448⟩
