Comparative study of Arabic and French statistical language models

Karima Meftouh; Kamel Smaïli; Med Tayeb Laskri

Conference paper, 2009

Karima Meftouh (1), Kamel Smaïli (2), Med Tayeb Laskri (1)

1 Laboratoire de Recherche en Informatique
2 Analysis, perception and recognition of speech

Abstract

In this paper, we present a comparative study of statistical language models for Arabic and French. The objective is to understand how best to model each of the two languages. Several experiments were carried out using different smoothing techniques. For French, trigram models are the most appropriate whatever the smoothing technique used. For Arabic, higher-order n-gram models smoothed with the Witten-Bell method perform better. The tests use corpora and vocabularies of comparable size for the two languages.
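To make the terms in the abstract concrete, here is a minimal sketch of an interpolated Witten-Bell bigram model together with a perplexity computation. It is an illustration only and not the authors' implementation: the class name `WittenBellBigram`, the bigram order, the sentence boundary markers and the uniform base distribution with one extra slot for unseen words are all assumptions made for this example. The paper itself compares several orders and smoothing techniques.

```python
import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Toy bigram model with interpolated Witten-Bell smoothing (illustrative)."""

    def __init__(self, sentences, vocab_size=None):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)  # history -> Counter of next words
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.unigrams.update(tokens[1:])  # "<s>" is never predicted
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[prev][word] += 1
        self.total = sum(self.unigrams.values())  # number of predicted tokens
        self.types = len(self.unigrams)           # distinct predicted words
        # Assumption: reserve one vocabulary slot for unseen words by default.
        self.vocab_size = vocab_size or self.types + 1

    def p_unigram(self, word):
        # Witten-Bell interpolation of unigram counts with a uniform distribution.
        uniform = 1.0 / self.vocab_size
        return (self.unigrams[word] + self.types * uniform) / (self.total + self.types)

    def p_bigram(self, prev, word):
        followers = self.bigrams.get(prev)
        if not followers:
            return self.p_unigram(word)  # unseen history: back off entirely
        count_prev = sum(followers.values())  # c(prev)
        distinct = len(followers)             # T(prev): distinct continuations
        return (followers[word] + distinct * self.p_unigram(word)) / (count_prev + distinct)

    def perplexity(self, sentences):
        log_prob, n = 0.0, 0
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                log_prob += math.log(self.p_bigram(prev, word))
                n += 1
        return math.exp(-log_prob / n)


if __name__ == "__main__":
    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    model = WittenBellBigram(train)
    print(model.perplexity([["the", "cat", "sat"]]))
```

Lower perplexity on held-out text indicates a better model, which is the sense in which the abstract ranks model orders and smoothing methods. Extending the sketch to trigrams or higher orders would apply the same interpolation recursively over longer histories.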

Keywords

Statistical language modeling; Arabic; French; smoothing technique; n-gram model; vocabulary; perplexity; performance

Domains

Computation and Language [cs.CL]

Main file

ICAART.pdf (136.84 KB)

Origin : Files produced by the author(s)

https://inria.hal.science/inria-00352927

Submitted on : Tuesday, November 14, 2017-12:24:53 PM

Last modification on : Monday, February 12, 2024-12:04:05 PM

Long-term archiving on: Thursday, February 15, 2018-4:33:37 PM

Dates and versions

inria-00352927 , version 1 (14-11-2017)

Identifiers

HAL Id : inria-00352927 , version 1

Cite

Karima Meftouh, Kamel Smaïli, Med Tayeb Laskri. Comparative study of Arabic and French statistical language models. ICAART'09 - International Conference on Agents and Artificial Intelligence, INSTICC, Jan 2009, Porto, Portugal. ⟨inria-00352927⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
