Arabic Statistical N-gram Models

Karima Meftouh; Kamel Smaïli; Mohamed Tayeb Laskri

Journal article, International Review on Computers and Software (IRECOS), Year: 2009


Karima Meftouh

Role: Author
PersonId : 857254

Université Badji Mokhtar [Annaba]

Kamel Smaïli

Role: Author
PersonId : 2521
IdHAL : kamel-smaili
IdRef : 034429700

Analysis, perception and recognition of speech

Mohamed Tayeb Laskri

Role: Author
PersonId : 857255

Université Badji Mokhtar [Annaba]

Abstract

In this work, we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to look for solutions that do not require increasing the size of the corpus. We therefore applied word segmentation to make the corpus statistically more reliable, which leads to better performance in terms of normalized perplexity.
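
The abstract names three ingredients: an n-gram model, smoothing, and (normalized) perplexity. The sketch below illustrates how they fit together on a toy bigram model. It is not the authors' setup: add-k smoothing stands in for the unspecified smoothing techniques, the function names (`train_bigram`, `bigram_prob`, `perplexity`) and the `norm_count` parameter are hypothetical, and "normalized perplexity" is assumed here to mean dividing the log-probability by a fixed count (for example the number of original words) so that word-based and morpheme-based models can be compared on the same scale.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"

def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {EOS, UNK}  # symbols the model can predict
    for tokens in sentences:
        padded = [BOS] + list(tokens) + [EOS]
        vocab.update(tokens)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, cur, model, k=1.0):
    """Add-k smoothed estimate of P(cur | prev); an illustrative choice."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, cur)] + k) / (context_counts[prev] + k * len(vocab))

def perplexity(sentences, model, k=1.0, norm_count=None):
    """Perplexity of the sentences under the model.

    By default the log-probability is divided by the number of predicted
    tokens. Passing norm_count (an assumption of this sketch) divides by
    that count instead, e.g. the number of unsegmented words.
    """
    vocab = model[2]
    log_prob, n_predicted = 0.0, 0
    for tokens in sentences:
        mapped = [t if t in vocab else UNK for t in tokens]  # map OOV to <unk>
        padded = [BOS] + mapped + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            log_prob += math.log(bigram_prob(prev, cur, model, k))
            n_predicted += 1
    n = norm_count if norm_count is not None else n_predicted
    return math.exp(-log_prob / n)

# Toy usage with placeholder tokens (not data from the paper).
model = train_bigram([["a", "b"]])
print(perplexity([["a", "b"]], model))
print(perplexity([["a", "b"]], model, norm_count=1))
```

With a segmented corpus, each sentence would hold morphemes rather than words, and `norm_count` would carry the word count of the unsegmented text.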

Keywords

Morpheme-based n-gram models; word-based n-gram models; Arabic N-gram models; Statistical language model; Perplexity

Domains

Computation and Language [cs.CL]

karima2_IRECOSprprint.pdf (484.94 KB)


https://inria.hal.science/hal-01639807

Submitted on: Monday, 20 November 2017, 16:22:24

Last modified on: Monday, 12 February 2024, 12:04:05

Dates and versions

hal-01639807, version 1 (20-11-2017)

Identifiers

HAL Id: hal-01639807, version 1

Cite

Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), 2009, 4 (1). ⟨hal-01639807⟩


Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA

126 views

33 downloads
