
An empirical study of the Algerian dialect of Social network

Karima Abidi 1 Kamel Smaïli 1
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this paper, we present an analysis of the use of the Algerian dialect on YouTube. To this end, we harvested a corpus of 17M words. This corpus was used to extract a comparable Algerian corpus, named CALYOU, by aligning pairs of sentences written in Latin and Arabic script. CALYOU was built using a multilingual word embeddings approach. Several experiments, discussed in this article, were conducted to set the parameters of the Continuous Bag of Words approach. The proposed method achieved a Recall of 41%. We then present several figures on the collected data, some of which are unexpected: 51% of the vocabulary words are written in Latin script, and 82% of the comments are subject to code-switching.
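As a rough illustration of what aligning sentences by embedding similarity and scoring the result with Recall can look like, here is a minimal sketch. It is not the authors' implementation: the two-dimensional vectors, the placeholder tokens (`lat_a`, `ar_a`, ...), the averaging of word vectors into a sentence vector, and the nearest-neighbour selection by cosine similarity are all assumptions made for illustration.

```python
import math

# Toy, made-up vectors standing in for a shared (multilingual) embedding space.
# Token names are hypothetical placeholders, not real dialect words.
EMB = {
    "lat_a": [1.0, 0.0], "lat_b": [0.0, 1.0],
    "ar_a": [0.9, 0.1], "ar_b": [0.1, 0.9], "ar_c": [0.8, 0.7],
}

def sentence_vector(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align(latin_sents, arabic_sents):
    """For each Latin-script sentence, return the index of the most similar
    Arabic-script sentence (or None if no comparison was possible)."""
    result = []
    for s in latin_sents:
        u = sentence_vector(s)
        best, best_sim = None, -1.0
        for j, t in enumerate(arabic_sents):
            v = sentence_vector(t)
            if u is None or v is None:
                continue
            sim = cosine(u, v)
            if sim > best_sim:
                best, best_sim = j, sim
        result.append(best)
    return result

def recall(predicted, gold):
    """Fraction of gold pairs that the alignment retrieved."""
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)

latin = [["lat_a"], ["lat_b"], ["lat_a", "lat_b"]]
arabic = [["ar_a"], ["ar_b"], ["ar_c"]]
gold = [0, 1, 2]  # gold[i] is the index of the true match for latin[i]

predicted = align(latin, arabic)
print(predicted, recall(predicted, gold))
```

In this sketch, Recall is simply the share of reference pairs for which the top-ranked candidate is the correct one; a real evaluation would depend on how the reference pairs and the candidate set are defined in the paper.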
Document type :
Conference papers

Cited literature [12 references]

https://hal.inria.fr/hal-01659997
Contributor : Kamel Smaïli
Submitted on : Saturday, December 9, 2017 - 5:24:38 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM

File

ICNLSSP2017_paper_16.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01659997, version 1

Citation

Karima Abidi, Kamel Smaïli. An empirical study of the Algerian dialect of Social network. ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco. ⟨hal-01659997⟩

Metrics

Record views

270

Files downloads

285