Conference papers

An empirical study of the Algerian dialect of Social network

Karima Abidi 1 Kamel Smaïli 1 
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this paper, we present an analysis of the use of the Algerian dialect on YouTube. To do so, we harvested a corpus of 17M words. This corpus was then used to extract a comparable Algerian corpus, named CALYOU, by aligning pairs of sentences written in Latin and Arabic scripts. The alignment was built using a multilingual word embeddings approach. Several experiments were conducted to set the parameters of the Continuous Bag of Words approach, as discussed in this article. The proposed method achieved a Recall of 41%. We also present several statistics on the collected data that led to unexpected results: 51% of the vocabulary words are written in Latin script, and 82% of all comments exhibit code-switching.
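As a rough illustration of the kind of script statistics quoted in the abstract, the sketch below classifies each word as Latin or Arabic script and computes the share of Latin-script vocabulary over a list of comments. It is not the authors' procedure: the function names, the whitespace tokenization, and the sample comments are all assumptions made for this example. In particular, the mixed-script share is only a crude proxy for code-switching, which may also occur between languages written in the same script.

```python
import unicodedata


def script_of(word):
    """Return 'latin', 'arabic', or 'other' based on the letters in the word.

    Digits are ignored, so a romanized form such as '3lah' counts as Latin.
    """
    scripts = set()
    for ch in word:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            scripts.add("arabic")
        elif name.startswith("LATIN"):
            scripts.add("latin")
        else:
            scripts.add("other")
    # Words with no letters, or with letters from several scripts, are 'other'.
    return scripts.pop() if len(scripts) == 1 else "other"


def latin_vocabulary_share(comments):
    """Fraction of distinct (lowercased, whitespace-split) words in Latin script."""
    vocab = {w.lower() for c in comments for w in c.split()}
    if not vocab:
        return 0.0
    return sum(1 for w in vocab if script_of(w) == "latin") / len(vocab)


def mixed_script_share(comments):
    """Fraction of comments containing both Latin- and Arabic-script words.

    Assumption: script mixing is used here as a proxy for code-switching.
    """
    if not comments:
        return 0.0
    mixed = 0
    for c in comments:
        scripts = {script_of(w) for w in c.split()}
        if {"latin", "arabic"} <= scripts:
            mixed += 1
    return mixed / len(comments)


# Hypothetical sample comments, invented for this example.
comments = ["wach rak khoya", "واش راك", "salam خويا 3lah"]
print(latin_vocabulary_share(comments))
print(mixed_script_share(comments))
```

On a real corpus, tokenization and normalization (punctuation, elongated letters, emojis) would need more care than the whitespace split used here.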
Document type :
Conference papers

Contributor : Kamel Smaïli
Submitted on : Saturday, December 9, 2017 - 5:24:38 PM
Last modification on : Wednesday, November 3, 2021 - 7:57:21 AM




  • HAL Id : hal-01659997, version 1



Karima Abidi, Kamel Smaïli. An empirical study of the Algerian dialect of Social network. ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco. ⟨hal-01659997⟩


