Comparison of Topic Identification methods for Arabic Language

Mourad Abbas 1 Kamel Smaïli 2
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper we present two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine learning approach, Support Vector Machines (SVM). To our knowledge, little work has been published on Arabic topic identification, which motivated this study. The corpus we used is extracted from the daily Arabic newspaper Akhbar Al Khaleej. It includes 5120 news articles, corresponding to 2,855,069 words, covering four topics: sport, local news, international news and economy. In our experiments, the results are encouraging for both the SVM and the TFIDF classifier. However, the SVM classifier performed better and showed a high capability to distinguish topics.
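As an illustration of the first method, the following is a minimal sketch of a TFIDF classifier, not the authors' implementation. It assumes one TF-IDF vector per topic and assigns a document to the topic with the highest cosine similarity. The exact weighting scheme, the smoothing term, the function names and the toy English tokens are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def tf_idf_vectors(topic_docs):
    """Build one TF-IDF vector per topic.

    topic_docs maps a topic name to a list of tokenized documents.
    Assumption: document frequency is counted over topics, not articles.
    """
    # Term frequency: pool all documents of a topic into one bag of words.
    tf = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in topic_docs.items()
    }
    n_topics = len(tf)
    # In how many topics does each term occur?
    df = Counter(term for counts in tf.values() for term in counts)
    # The +1.0 keeps terms shared by every topic from vanishing (assumed smoothing).
    idf = {term: math.log(n_topics / df[term]) + 1.0 for term in df}
    vectors = {
        topic: {term: count * idf[term] for term, count in counts.items()}
        for topic, counts in tf.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify_topic(tokens, vectors, idf):
    """Return the topic whose TF-IDF vector is closest to the document."""
    counts = Counter(tokens)
    # Terms unseen in training carry no weight.
    doc_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return max(vectors, key=lambda topic: cosine(doc_vec, vectors[topic]))


# Hypothetical toy training data; the paper's corpus is Arabic news text.
train = {
    "sport": [["match", "team", "goal"], ["team", "player", "goal"]],
    "economy": [["market", "bank", "price"], ["bank", "oil", "price"]],
}
vectors, idf = tf_idf_vectors(train)
predicted = identify_topic(["team", "goal", "win"], vectors, idf)
```

With this toy data, a document sharing vocabulary with only one topic is assigned to that topic, since its similarity to every other topic vector is zero.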
Document type:
Conference paper
International Conference on Recent Advances in Natural Language Processing - RANLP 2005, Sep 2005, Borovets, Bulgaria. 2005
https://hal.inria.fr/inria-00000448
Contributor: Kamel Smaïli
Submitted on: Tuesday, 21 November 2017 - 15:38:57
Last modified on: Thursday, 11 January 2018 - 06:19:57

File

ranlp2005.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00000448, version 1

Citation

Mourad Abbas, Kamel Smaïli. Comparison of Topic Identification methods for Arabic Language. International Conference on Recent Advances in Natural Language Processing - RANLP 2005, Sep 2005, Borovets, Bulgaria. 2005. 〈inria-00000448〉
