Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots

Abstract: One method used to reduce the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of a word with their common root. Root extraction is more difficult in Arabic than in other languages because Arabic has a rich and complex morphology. Many algorithms have been proposed in this field. Some are based on morphological rules and grammatical patterns, so they are fairly complex and require deep linguistic knowledge. Others are statistical, so they are simpler and rely only on calculations. In this paper we propose a new statistical algorithm that extracts the roots of Arabic words using character n-grams, without any morphological rules or grammatical patterns.
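The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of matching words to roots with character n-grams. It assumes contiguous bigrams, a Dice similarity coefficient, and a small hypothetical list of candidate roots; the function names (`char_ngrams`, `dice`, `guess_root`) and the candidate list are illustrative assumptions, not the authors' method.

```python
def char_ngrams(word, n=2):
    """Return the set of contiguous character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b):
    """Dice coefficient between two sets of n-grams (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def guess_root(word, candidate_roots, n=2):
    """Pick the candidate root whose n-gram set is most similar to the word's.

    Assumption: a list of candidate roots is available; the abstract does not
    say how candidates are obtained or which similarity measure is used.
    """
    word_grams = char_ngrams(word, n)
    return max(candidate_roots,
               key=lambda root: dice(word_grams, char_ngrams(root, n)))


# Hypothetical candidate roots, for illustration only.
roots = ["كتب", "درس", "علم"]

print(guess_root("كاتب", roots))   # word sharing the bigram "تب" with the root كتب
print(guess_root("مدرسة", roots))  # word sharing the bigrams "در" and "رس" with the root درس
```

No morphological rule or pattern is consulted here: the choice depends only on counting shared character n-grams, which is what makes this family of approaches statistical.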
Document type :
Conference papers

Cited literature: 28 references

https://hal.inria.fr/hal-01789980
Contributor : Hal Ifip
Submitted on : Friday, May 11, 2018 - 3:12:04 PM
Last modification on : Friday, October 5, 2018 - 10:00:02 PM
Long-term archiving on : Tuesday, September 25, 2018 - 4:48:25 AM

File

339159_1_En_14_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Said Gadri, Abdelouahab Moussaoui. Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots. 5th International Conference on Computer Science and Its Applications (CIIA), May 2015, Saida, Algeria. pp.167-180, ⟨10.1007/978-3-319-19578-0_14⟩. ⟨hal-01789980⟩
