Arabic statistical language modeling

Karima Meftouh 1 Kamel Smaïli 2 Mohamed-Tayeb Laskri 1
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this study we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Because the data are sparse, we investigate solutions that do not require increasing the size of the corpus. A word segmentation technique is employed to increase the statistical viability of the corpus, which leads to better performance in terms of normalized perplexity.
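As an illustration of the quantities named in the abstract, the following is a minimal sketch of a bigram model with add-one (Laplace) smoothing and its perplexity. It is not the authors' implementation: the abstract does not say which smoothing techniques were compared, so add-one is used here only because it is the simplest. The function names, the "<unk>" token, and the norm_count parameter are all hypothetical. In particular, norm_count assumes that "normalized perplexity" means dividing the log-probability by the token count of the original, unsegmented text, so that models trained on different tokenizations can be compared; the paper itself should be consulted for the exact definition.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words unseen in training


def train_bigram(tokens):
    """Count bigrams and their left contexts over a list of tokens."""
    contexts = Counter(tokens[:-1])              # how often each token starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))   # how often each pair occurs
    vocab = set(tokens) | {UNK}
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(test_tokens, contexts, bigrams, vocab, norm_count=None):
    """Perplexity of test_tokens under the smoothed bigram model.

    norm_count is an assumption: when given, the log-probability is
    divided by this count (for example, the number of words before
    segmentation) instead of the number of predicted tokens.
    """
    mapped = [t if t in vocab else UNK for t in test_tokens]
    log_sum = sum(
        math.log2(bigram_prob(p, w, contexts, bigrams, vocab))
        for p, w in zip(mapped, mapped[1:])
    )
    n = norm_count if norm_count is not None else len(mapped) - 1
    return 2 ** (-log_sum / n)


if __name__ == "__main__":
    train = "a b a b a b".split()
    contexts, bigrams, vocab = train_bigram(train)
    print(perplexity("a b a".split(), contexts, bigrams, vocab))
```

Segmenting words into smaller units yields more tokens per sentence, so a raw per-token perplexity is not directly comparable with the word-level one; passing the original word count as norm_count is one way to put both on the same footing, under the assumption stated above.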
Document type :
Conference papers

Cited literature [14 references]

https://hal.inria.fr/inria-00402315
Contributor : Kamel Smaïli
Submitted on : Monday, November 20, 2017 - 10:15:03 AM
Last modification on : Sunday, April 8, 2018 - 11:48:13 AM
Long-term archiving on: Wednesday, February 21, 2018 - 12:18:49 PM

File

JADT2008meftouh-smaili-laskri....
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00402315, version 1

Citation

Karima Meftouh, Kamel Smaïli, Mohamed-Tayeb Laskri. Arabic statistical language modeling. 9es Journées internationales d'Analyse statistique des Données Textuelles - JADT 2008, Mar 2008, Lyon, France. pp.837-844. ⟨inria-00402315⟩
