
Statistical Machine Translation: Application to low resourced languages

Abstract : This work is dedicated to statistical machine translation for poorly resourced languages. We focus on Arabic dialects, which are the everyday spoken language of Arab peoples. These dialects differ from one Arab country to another, and even within a single country several dialectal variants coexist. Because they are primarily oral and non-standardized, these dialects pose a challenge for natural language processing (NLP). In machine translation, they are difficult to translate because resources of all kinds are lacking, in particular the monolingual and, above all, the parallel corpora needed for training. In this thesis, we address this issue with particular attention to Algerian Arabic, and more precisely to the dialect of Algiers. A multi-dialect parallel corpus, PADIC (Parallel Arabic Dialect Corpus), has been created. It is a substantial textual resource that currently covers six Arabic dialects in addition to Modern Standard Arabic. An analytical study of this corpus highlights the relationships among the dialects themselves and between the dialects and Modern Standard Arabic. Using PADIC, we tackled statistical machine translation between the different dialect pairs and between the dialects and Modern Standard Arabic. Several results were obtained, and all of them point to the difficulty of translating dialects. In addition, several tools dedicated to the Algiers dialect were developed within the framework of this thesis. The problem of code-switching was also addressed, and an identification tool was implemented using machine learning techniques.
Document type :
Theses

https://hal.inria.fr/tel-03186940
Contributor : Kamel Smaïli
Submitted on : Wednesday, March 31, 2021 - 3:20:43 PM
Last modification on : Friday, April 2, 2021 - 3:27:16 AM

File

these-Salima-Harrat.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03186940, version 1

Citation

Salima Harrat. Statistical Machine Translation: Application to low resourced languages. Computation and Language [cs.CL]. École Supérieure d’Informatique, 2018. English. ⟨tel-03186940⟩

