Arabic Statistical N-gram Models

Karima Meftouh 1 Kamel Smaïli 2 Mohamed Tayeb Laskri 1
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
Document type:
Journal article
International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1)

https://hal.inria.fr/hal-01639807
Contributor: Kamel Smaïli
Submitted on: Monday, November 20, 2017 - 16:22:24
Last modified on: Sunday, April 8, 2018 - 11:48:13

Identifiers

  • HAL Id : hal-01639807, version 1

Citation

Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1). 〈hal-01639807〉
