Abstract: In this work, we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to look for other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
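As an illustration of the kind of measurement the abstract refers to, the following is a minimal sketch of a bigram language model with a smoothing technique, evaluated by perplexity. It rests on assumptions that do not come from the paper: add-one (Laplace) smoothing stands in for the unnamed smoothing techniques, plain perplexity is computed rather than the normalized perplexity the authors report (its definition is not given here), and the function names, the `<unk>` token, and the toy sentences are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words never seen in training


def train_bigram(sentences):
    """Count histories and bigrams over a list of tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = {UNK}
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])  # every token the model can predict
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def add_one_prob(prev, cur, history_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))


def perplexity(sentences, history_counts, bigram_counts, vocab):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    log_prob = 0.0
    n_tokens = 0
    for tokens in sentences:
        # Map out-of-vocabulary words to the unknown token.
        mapped = [t if t in vocab else UNK for t in tokens]
        padded = ["<s>"] + mapped + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            p = add_one_prob(prev, cur, history_counts, bigram_counts, vocab)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


# Toy data (hypothetical); a real experiment would use a held-out test set.
train = [["a", "b"], ["a", "c"]]
counts = train_bigram(train)
ppl_seen = perplexity([["a", "b"]], *counts)
ppl_unseen = perplexity([["d", "d"]], *counts)
```

With a small corpus, many test bigrams have zero counts, which is why some form of smoothing is needed before perplexity can be computed at all. Segmenting words into smaller units reduces the number of distinct tokens, which is one way to make the counts less sparse without adding data.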
https://hal.inria.fr/hal-01639807
Contributor: Kamel Smaïli
Submitted on: Monday, November 20, 2017 - 4:22:24 PM
Last modification on: Sunday, April 8, 2018 - 11:48:13 AM