Arabic Statistical N-gram Models

Karima Meftouh (1), Kamel Smaïli (2), Mohamed Tayeb Laskri (1)
(2) PAROLE - Analysis, perception and recognition of speech, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this work we investigate statistical language models for Arabic. Several experiments with different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to look for other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
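The abstract refers to smoothed n-gram models evaluated by perplexity. As a rough illustration only, the sketch below trains a bigram model with add-one (Laplace) smoothing and computes ordinary per-token perplexity. It is not the authors' setup: the smoothing method, the function names (train_bigram, prob, perplexity), the toy corpus and the closed-vocabulary assumption are all hypothetical choices made for this example, and the "normalized perplexity" used in the paper is not defined on this page, so it is not reproduced here.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (assumed convention)

def train_bigram(sentences):
    """Collect history and bigram counts from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = {EOS}  # tokens that can be predicted; BOS is never predicted
    for sent in sentences:
        vocab.update(sent)
        tokens = [BOS] + sent + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab

def prob(prev, cur, history_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(cur | prev) over a closed vocabulary."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))

def perplexity(sentences, history_counts, bigram_counts, vocab):
    """Per-token perplexity: 2 ** (-(1/n) * sum of log2 P(cur | prev))."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = [BOS] + sent + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            log_sum += math.log2(prob(prev, cur, history_counts, bigram_counts, vocab))
            n += 1
    return 2 ** (-log_sum / n)

# Toy corpus with placeholder tokens (not data from the paper).
corpus = [["a", "b"], ["a", "c"]]
hc, bc, vocab = train_bigram(corpus)
print(prob(BOS, "a", hc, bc, vocab))
print(perplexity(corpus, hc, bc, vocab))
```

Note that perplexities computed over different tokenizations of the same text are not directly comparable, because the token count n changes when words are segmented into smaller units.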
Document type: Journal article

https://hal.inria.fr/hal-01639807
Contributor: Kamel Smaïli
Submitted on: Monday, November 20, 2017 - 4:22:24 PM
Last modification on: Sunday, April 8, 2018 - 11:48:13 AM

Identifiers

  • HAL Id : hal-01639807, version 1

Citation

Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1). ⟨hal-01639807⟩
