Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots

Said Gadri; Abdelouahab Moussaoui

doi:10.1007/978-3-319-19578-0_14

Conference paper, 2015


Said Gadri

Role: Author
PersonId: 1031978

University of M'sila / Université Mohamed Boudiaf - M'sila

Abdelouahab Moussaoui

Role: Author
PersonId: 992053

Université Ferhat-Abbas Sétif 1 [Sétif]

Abstract

One of the methods used to reduce the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of a word by their common root. Root extraction is harder for Arabic than for many other languages because Arabic has a very different and complex structure: it is a very rich language with complex morphology. Many algorithms have been proposed in this field. Some are based on morphological rules and grammatical patterns, so they are quite complex and require deep linguistic knowledge. Others are statistical; they are simpler and rely only on numerical computations. In this paper we propose a new statistical algorithm that extracts the roots of Arabic words using character n-grams, without using any morphological rules or grammatical patterns.
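The abstract does not give the details of the proposed algorithm. Purely as an illustration of the general idea of n-gram-based root matching, the sketch below scores each entry of a hypothetical candidate-root list by the Dice coefficient of shared character bigrams and keeps the best-scoring root. The candidate list, the choice of bigrams (n = 2), the Dice coefficient, and the function names `char_ngrams`, `dice_similarity`, and `extract_root` are all assumptions for this example, not the authors' published method.

```python
def char_ngrams(word: str, n: int = 2) -> list[str]:
    """Return the overlapping character n-grams of `word` (hypothetical helper)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient between the sets of character n-grams of a and b.

    Assumption: Dice is used here only as one common n-gram similarity measure.
    """
    ga, gb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def extract_root(word: str, candidate_roots: list[str], n: int = 2) -> str:
    """Pick the candidate root whose n-grams best match the word's n-grams.

    `candidate_roots` is a hypothetical root list; it must be non-empty.
    Ties go to the root that appears first in the list.
    """
    return max(candidate_roots, key=lambda root: dice_similarity(word, root, n))


if __name__ == "__main__":
    roots = ["كتب", "درس", "لعب"]  # illustrative roots: write, study, play
    print(extract_root("يكتبون", roots))  # prints كتب
```

With contiguous bigrams, a derived form whose pattern inserts a letter inside the root (for example "كاتب", writer) shares fewer bigrams with its root, which lowers its score; a real system would need to account for such infixes.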

Domains

Computer Science [cs]

Main file

339159_1_En_14_Chapter.pdf (629.74 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01789980

Submitted on: Friday, May 11, 2018 - 15:12:04

Last modified on: Friday, January 26, 2024 - 16:42:06

Long-term archiving on: Tuesday, September 25, 2018 - 04:48:25

Dates and versions

hal-01789980 , version 1 (11-05-2018)

License

Attribution

Identifiers

HAL Id: hal-01789980, version 1
DOI: 10.1007/978-3-319-19578-0_14

Cite

Said Gadri, Abdelouahab Moussaoui. Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots. 5th International Conference on Computer Science and Its Applications (CIIA), May 2015, Saida, Algeria. pp.167-180, ⟨10.1007/978-3-319-19578-0_14⟩. ⟨hal-01789980⟩


Collections

IFIP-LNCS IFIP IFIP-AICT IFIP-TC IFIP-TC5 IFIP-AICT-456 IFIP-CIIA
