Arabic statistical language modeling

Karima Meftouh; Kamel Smaïli; Mohamed-Tayeb Laskri

Conference paper, Year: 2008



Karima Meftouh

Role: Author
PersonId: 857254

Laboratoire de Recherche en Informatique

Kamel Smaïli

Role: Author
PersonId: 2521
IdHAL: kamel-smaili
IdRef: 034429700

Analysis, perception and recognition of speech

Mohamed-Tayeb Laskri

Role: Author

Laboratoire de Recherche en Informatique

Abstract

In this study, we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Because the data are sparse, we explore alternatives that do not require increasing the size of the corpus. A word segmentation technique is employed to increase the statistical viability of the corpus, which leads to better performance in terms of normalized perplexity.
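To make the evaluation terms in the abstract concrete, the following is a minimal sketch of a bigram model with add-one smoothing and a perplexity computation. It is an illustration only, not the authors' setup: the abstract does not say which smoothing techniques or which normalization were used, so add-one smoothing, the function names `train_bigram`, `log_prob` and `perplexity`, and the idea of normalizing by the number of original words instead of the number of segmented units are all assumptions.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentences, unigrams, bigrams):
    """Return (total natural-log probability, number of predicted tokens)
    under an add-one smoothed bigram model (assumed smoothing, for illustration)."""
    # "<s>" is never predicted, so its slot in the count stands in for
    # a single unknown-token type.
    vocab_size = len(unigrams)
    total, n_events = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            total += math.log(p)
            n_events += 1
    return total, n_events


def perplexity(total_log_prob, n_units):
    """Perplexity of a text whose log probability is spread over n_units.

    Passing the count of original words (rather than segmented units) gives
    one possible normalized perplexity; this normalization is an assumption.
    """
    return math.exp(-total_log_prob / n_units)
```

When a corpus is segmented into sub-word units, the number of predicted tokens changes, so perplexities computed per token are not directly comparable between the segmented and unsegmented corpus. Dividing the same total log probability by a fixed unit count, as `perplexity` allows, is one way to put both on the same scale.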

Keywords

segmentation; Arabic language; text corpora; statistical language model; perplexity

Domains

Computation and Language [cs.CL]

Main file

JADT2008meftouh-smaili-laskri.pdf (330.54 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/inria-00402315

Submitted on: Monday, November 20, 2017, 10:15:03

Last modified on: Monday, February 12, 2024, 12:04:05

Long-term archiving on: Wednesday, February 21, 2018, 12:18:49

Dates and versions

inria-00402315, version 1 (20-11-2017)

Identifiers

HAL Id: inria-00402315, version 1

Cite

Karima Meftouh, Kamel Smaïli, Mohamed-Tayeb Laskri. Arabic statistical language modeling. 9es Journées internationales d'Analyse statistique des Données Textuelles - JADT 2008, Mar 2008, Lyon, France. pp.837-844. ⟨inria-00402315⟩


Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA

191 views

442 downloads
