An empirical study of the Algerian dialect of Social network

Karima Abidi 1 Kamel Smaïli 1
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: In this paper, we present an analysis of the use of the Algerian dialect on YouTube. To do so, we harvested a corpus of 17M words. This corpus was then used to extract a comparable Algerian corpus, named CALYOU, by aligning pairs of sentences written in Latin and Arabic scripts. CALYOU was built using a multilingual word embeddings approach. Several experiments were conducted to set the parameters of the Continuous Bag of Words approach; they are discussed in this article. The proposed method achieved a Recall of 41%. We then present several figures on the collected data, which led to some unexpected results: 51% of the vocabulary words are written in Latin script, and 82% of all comments exhibit code-switching.
Document type:
Conference paper
ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco. 〈http://icnlssp.isga.ma〉

https://hal.inria.fr/hal-01659997
Contributor: Kamel Smaïli
Submitted on: Saturday, 9 December 2017 - 17:24:38
Last modified on: Tuesday, 24 April 2018 - 13:29:50

File

ICNLSSP2017_paper_16.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01659997, version 1

Citation

Karima Abidi, Kamel Smaïli. An empirical study of the Algerian dialect of Social network. ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco. 〈http://icnlssp.isga.ma〉. 〈hal-01659997〉
