Conference paper, Year: 2014

Building Resources for Algerian Arabic Dialects

Abstract

The Algerian Arabic dialects are under-resourced languages that lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. In this paper we build, for the first time, parallel corpora for Algerian dialects; our ultimate purpose is Machine Translation (MT) between Modern Standard Arabic (MSA) and Algerian dialects (AD), in both directions. We also propose language tools to process these dialects. First, we develop a morphological analysis model for the dialects by adapting BAMA, a well-known MSA analyzer. Then we propose a diacritization system, based on an MT process, that restores vowels in dialect corpora. Finally, we present machine translation results between MSA and Algerian dialects.
Main file
SmailiArabicDialectInterspeech2014.pdf (304.14 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01066989, version 1 (22-09-2014)

Identifiers

  • HAL Id: hal-01066989, version 1

Cite

Salima Harrat, Karima Meftouh, Mourad Abbas, Kamel Smaïli. Building Resources for Algerian Arabic Dialects. 15th Annual Conference of the International Speech Communication Association (Interspeech), ISCA, Sep 2014, Singapore. ⟨hal-01066989⟩