
Arabic Statistical N-gram Models

Karima Meftouh 1 Kamel Smaïli 2 Mohamed Tayeb Laskri 1
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
Document type : Journal articles

Contributor : Kamel Smaïli
Submitted on : Monday, November 20, 2017 - 4:22:24 PM
Last modification on : Friday, February 26, 2021 - 3:28:06 PM


  • HAL Id : hal-01639807, version 1



Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1). ⟨hal-01639807⟩


