
Arabic Statistical N-gram Models

Karima Meftouh 1 Kamel Smaïli 2 Mohamed Tayeb Laskri 1
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
Document type : Journal articles

Contributor : Kamel Smaïli
Submitted on : Monday, November 20, 2017 - 4:22:24 PM
Last modification on : Sunday, April 8, 2018 - 11:48:13 AM


  • HAL Id : hal-01639807, version 1



Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1). ⟨hal-01639807⟩


