Conference papers

Building Resources for Algerian Arabic Dialects

Abstract : Algerian Arabic dialects are under-resourced languages: they lack both corpora and Natural Language Processing (NLP) tools, even though they are increasingly used in written form, especially on social media and forums. In this paper, we build, for the first time, parallel corpora for Algerian dialects, since our ultimate goal is Machine Translation (MT) between Modern Standard Arabic (MSA) and Algerian dialects (AD) in both directions. We also propose language tools for processing these dialects. First, we developed a morphological analysis model for the dialects by adapting BAMA, a well-known MSA analyzer. Then, we propose a diacritization system, based on an MT process, that restores the vowels in dialect corpora. Finally, we present results for machine translation between MSA and Algerian dialects.
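BAMA (the Buckwalter Arabic Morphological Analyzer) works by splitting a word into candidate prefix, stem, and suffix substrings, looking each one up in its own lexicon, and keeping only the combinations permitted by its prefix-stem, stem-suffix, and prefix-suffix compatibility tables. The following is a minimal toy sketch of that general lookup scheme, not the authors' adapted dialect analyzer: the lexicon entries, category names (`Pref-Wa`, `NPref-Al`, `NSuff-h`, etc.), and glosses are hypothetical, written in Buckwalter transliteration, and cover only a few MSA forms.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons (Buckwalter transliteration).
# Each surface form maps to a list of (category, gloss) entries.
PREFIXES = {
    "": [("Pref-0", "")],
    "w": [("Pref-Wa", "and")],
    "Al": [("NPref-Al", "the")],
    "wAl": [("NPref-Al", "and the")],
}
STEMS = {
    "ktAb": [("N", "book")],
    "kAtb": [("N", "writer")],
}
SUFFIXES = {
    "": [("Suff-0", "")],
    "p": [("NSuff-p", "[fem.sg.]")],
    "h": [("NSuff-h", "his")],
}

# Hypothetical compatibility tables: category pairs allowed to co-occur.
COMPAT_AB = {("Pref-0", "N"), ("Pref-Wa", "N"), ("NPref-Al", "N")}   # prefix-stem
COMPAT_BC = {("N", "Suff-0"), ("N", "NSuff-p"), ("N", "NSuff-h")}    # stem-suffix
# prefix-suffix: everything allowed except the definite article with a possessive suffix
COMPAT_AC = {
    (p, s)
    for p in ("Pref-0", "Pref-Wa", "NPref-Al")
    for s in ("Suff-0", "NSuff-p", "NSuff-h")
} - {("NPref-Al", "NSuff-h")}


@dataclass(frozen=True)
class Analysis:
    prefix: str
    stem: str
    suffix: str
    gloss: str


def analyze(word):
    """Return every prefix+stem+suffix segmentation licensed by the lexicons and tables."""
    results = []
    n = len(word)
    for i in range(n + 1):              # end of prefix (prefix may be empty)
        for j in range(i + 1, n + 1):   # end of stem (stem must be non-empty)
            pre, stem, suf = word[:i], word[i:j], word[j:]
            if pre not in PREFIXES or stem not in STEMS or suf not in SUFFIXES:
                continue
            for pcat, pgloss in PREFIXES[pre]:
                for scat, sgloss in STEMS[stem]:
                    for xcat, xgloss in SUFFIXES[suf]:
                        # keep the combination only if all three pairwise checks pass
                        if ((pcat, scat) in COMPAT_AB
                                and (scat, xcat) in COMPAT_BC
                                and (pcat, xcat) in COMPAT_AC):
                            gloss = " ".join(g for g in (pgloss, sgloss, xgloss) if g)
                            results.append(Analysis(pre, stem, suf, gloss))
    return results


if __name__ == "__main__":
    for w in ("wAlktAb", "ktAbh", "AlktAbh"):
        print(w, analyze(w))
```

In this toy setup, `analyze("wAlktAb")` yields one analysis (`wAl` + `ktAb`, "and the book"), while `analyze("AlktAbh")` returns an empty list because the prefix-suffix table rejects the definite article combined with a possessive suffix. Keeping the constraints in separate pairwise tables, rather than enumerating whole words, is what lets such an analyzer be extended by adding lexicon entries and table rows.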
Document type :
Conference papers

Cited literature [22 references]

https://hal.inria.fr/hal-01066989
Contributor : Kamel Smaïli
Submitted on : Monday, September 22, 2014 - 5:14:46 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM

File

SmailiArabicDialectInterspeech...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01066989, version 1


Citation

Salima Harrat, Karima Meftouh, Mourad Abbas, Kamel Smaïli. Building Resources for Algerian Arabic Dialects. Interspeech, 15th Annual Conference of the International Speech Communication Association, ISCA, Sep 2014, Singapore, Singapore. ⟨hal-01066989⟩

