Building Resources for Algerian Arabic Dialects

Abstract: The Algerian Arabic dialects are under-resourced languages that lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. In this paper we build, for the first time, parallel corpora for Algerian dialects, with the ultimate goal of Machine Translation (MT) between Modern Standard Arabic (MSA) and Algerian dialects (AD) in both directions. We also propose language tools to process these dialects. First, we develop a morphological analysis model for the dialects by adapting BAMA, a well-known MSA analyzer. Then we propose a diacritization system, based on an MT process, that restores vowels in dialect corpora. Finally, we report results on machine translation between MSA and Algerian dialects.
Document type:
Conference paper
Interspeech 2014, 15th Annual Conference of the International Speech Communication Association, Sep 2014, Singapore.

https://hal.inria.fr/hal-01066989
Contributor: Kamel Smaïli
Submitted on: Monday, 22 September 2014 - 17:14:46
Last modified on: Tuesday, 24 April 2018 - 13:33:27

File

SmailiArabicDialectInterspeech...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01066989, version 1

Citation

Salima Harrat, Karima Meftouh, Mourad Abbas, Kamel Smaïli. Building Resources for Algerian Arabic Dialects. Interspeech 2014, 15th Annual Conference of the International Speech Communication Association, Sep 2014, Singapore. 2014. 〈hal-01066989〉
