
Arabic statistical language modeling

Karima Meftouh 1 Kamel Smaïli 2 Mohamed-Tayeb Laskri 1 
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this study we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Because the data are sparse, we investigate other solutions that do not require increasing the size of the corpus. A word segmentation technique is applied in order to increase the statistical viability of the corpus, which leads to better performance in terms of normalized perplexity.
Document type : Conference papers
Contributor : Kamel Smaïli
Submitted on : Monday, November 20, 2017 - 10:15:03 AM
Last modification on : Tuesday, October 25, 2022 - 4:22:24 PM
Long-term archiving on : Wednesday, February 21, 2018 - 12:18:49 PM


  • HAL Id : inria-00402315, version 1



Karima Meftouh, Kamel Smaïli, Mohamed-Tayeb Laskri. Arabic statistical language modeling. 9es Journées internationales d'Analyse statistique des Données Textuelles - JADT 2008, Mar 2008, Lyon, France. pp.837-844. ⟨inria-00402315⟩


