Comparison of Topic Identification methods for Arabic Language

Mourad Abbas; Kamel Smaïli

Conference paper, Year: 2005

Mourad Abbas

Role: Author
PersonId: 830710

École nationale polytechnique [Alger, Algérie]

Kamel Smaïli

Role: Author
PersonId: 2521
IdHAL: kamel-smaili
IdRef: 034429700

Analysis, perception and recognition of speech

Abstract

In this paper we present two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine-learning approach known as Support Vector Machines (SVM). To our knowledge, little work has been done on topic identification for Arabic, which is why we investigate it in this article. The corpus we used is extracted from the Arabic daily newspaper Akhbar Al Khaleej; it includes 5,120 news articles, corresponding to 2,855,069 words, covering four topics: sport, local news, international news and economy. According to our experiments, the results are encouraging for both the SVM and the TFIDF classifier; however, the SVM classifier performed better and showed a high capability to distinguish between topics.
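The abstract does not say how the TFIDF classifier scores an article. One common formulation, assumed here rather than taken from the paper, builds one TFIDF-weighted profile vector per topic and assigns a new article to the topic whose profile has the highest cosine similarity with it. The minimal Python sketch below illustrates that idea only; the whitespace tokenizer, the class name `TfidfTopicClassifier`, and the toy English training data are all hypothetical, and real Arabic text would need its own normalization and tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; Arabic text would need real normalization.
    return text.split()


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as {term: weight}.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TfidfTopicClassifier:
    """Assumed scheme: one TFIDF profile per topic, nearest profile by cosine."""

    def fit(self, docs, labels):
        # Aggregate raw term counts per topic into one profile per topic.
        topic_counts = {}
        for doc, label in zip(docs, labels):
            topic_counts.setdefault(label, Counter()).update(tokenize(doc))
        # Document frequency: number of training documents containing each term.
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))
        n = len(docs)
        self.idf = {term: math.log(n / count) for term, count in df.items()}
        self.profiles = {label: self._weigh(c) for label, c in topic_counts.items()}
        return self

    def _weigh(self, counts):
        # TF * IDF; terms never seen in training are dropped.
        return {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}

    def predict(self, doc):
        vec = self._weigh(Counter(tokenize(doc)))
        return max(self.profiles, key=lambda label: cosine(vec, self.profiles[label]))


# Hypothetical toy data, for illustration only.
docs = ["goal match team", "match goal player", "market bank price", "bank price oil"]
labels = ["sport", "sport", "economy", "economy"]
clf = TfidfTopicClassifier().fit(docs, labels)
print(clf.predict("team player goal"))  # → sport
```

This profile-plus-cosine scheme needs no optimization step, which makes it a simple baseline; an SVM instead learns a separating hyperplane from the training vectors, which is one plausible reason for the difference the authors report.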

Keywords

SVM; TFIDF; Topic Identification

Domains

Computation and Language [cs.CL]

Main file

ranlp2005.pdf (5.58 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00000448

Submitted on: Tuesday, 21 November 2017, 15:38:57

Last modified on: Thursday, 7 March 2024, 10:34:03

Dates and versions

inria-00000448, version 1 (21-11-2017)

Identifiers

HAL Id: inria-00000448, version 1

Cite

Mourad Abbas, Kamel Smaïli. Comparison of Topic Identification methods for Arabic Language. International Conference on Recent Advances in Natural Language Processing - RANLP 2005, Sep 2005, Borovets, Bulgaria. ⟨inria-00000448⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA

459 views

47 downloads