Journal article, International Review on Computers and Software (IRECOS), 2009

Arabic Statistical N-gram Models

Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri

Abstract

In this work we investigate statistical language models for Arabic. We carried out several experiments with different smoothing techniques on a small corpus extracted from a daily newspaper. Data sparseness led us to look for solutions that do not require enlarging the corpus. We therefore applied word segmentation to increase the statistical viability of the corpus, which yields better performance in terms of normalized perplexity.
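
To make the terms in the abstract concrete, here is a minimal sketch of a smoothed bigram model and its perplexity. It is an illustration under stated assumptions, not the authors' setup: the abstract does not say which smoothing techniques were compared or how perplexity was normalized, so add-one (Laplace) smoothing and plain per-token perplexity are used as stand-ins, and the toy tokens and all function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, adding boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(word | prev).

    Assumption: add-one smoothing is only a stand-in for the (unnamed)
    techniques compared in the paper.
    """
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentences, unigrams, bigrams, vocab_size):
    """Per-token perplexity: 2 ** (-average log2 probability of each token)."""
    log_sum, n_tokens = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_sum += math.log2(bigram_prob(prev, word, unigrams, bigrams, vocab_size))
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)

# Hypothetical toy corpus (placeholder tokens, not data from the paper).
train = [["a", "b"], ["a", "c"]]
unigrams, bigrams = train_bigram(train)

# Predictable types: every seen type except "<s>", plus one slot for unseen words.
vocab_size = len(set(unigrams) - {"<s>"}) + 1

pp_seen = perplexity(train, unigrams, bigrams, vocab_size)
pp_unseen = perplexity([["c", "a"]], unigrams, bigrams, vocab_size)
```

On a small corpus most bigrams in new text are unseen, so their probability comes almost entirely from the smoothing term. This is the sparseness problem the abstract describes, and it shows up here as a higher perplexity for the unseen word order than for the training sentences.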
File: karima2_IRECOSprprint.pdf (484.94 KB)

Dates and versions

hal-01639807 , version 1 (20-11-2017)

Identifiers

  • HAL Id : hal-01639807 , version 1

Cite

Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), 2009, 4 (1). ⟨hal-01639807⟩