Conference papers

Parallel Corpora Preparation for English-Amharic Machine Translation

yohanens Biadgline (1), Kamel Smaïli (2)
(2) SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two kinds of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the Bilingual Evaluation Understudy (BLEU) metric, the SMT and NMT systems scored 26.47 and 32.44, respectively. The corpus was collected from the Internet using automatic and semi-automatic techniques, and it covers the religion, law, and news domains. The final corpus comprises 225,304 parallel sentences and will be shared freely with the community. To our knowledge, it is the largest parallel corpus built so far for the Amharic language.
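The abstract reports system quality as BLEU scores. To illustrate what such a score measures, here is a minimal sketch of the standard corpus-level BLEU definition: clipped n-gram precision for 1- to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized input, a single reference per sentence, uniform weights, and no smoothing. The record does not say which BLEU implementation produced 26.47 and 32.44, so this is not necessarily the authors' scorer. Tokenization choices, especially for Amharic text, can shift scores noticeably.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per sentence, uniform weights, no smoothing.

    hypotheses, references: lists of token lists, aligned by index.
    Returns a score between 0 and 100.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams produced by the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cat sat on the".split()
    ref = "the cat sat on the mat".split()
    # Every n-gram matches, but the hypothesis is one token short.
    print(round(corpus_bleu([hyp], [ref]), 2))  # 81.87
```

In the usage example every n-gram precision is 1, so the whole loss comes from the brevity penalty, exp(1 - 6/5). This separation between precision and length is why BLEU is computed over the whole test set rather than averaged per sentence.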
Document type :
Conference papers
Contributor : Kamel Smaïli
Submitted on : Monday, June 28, 2021 - 11:22:07 AM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on : Wednesday, September 29, 2021 - 6:55:52 PM


Files produced by the author(s)


  • HAL Id : hal-03272258, version 1



yohanens Biadgline, Kamel Smaïli. Parallel Corpora Preparation for English-Amharic Machine Translation. IWANN 2021 - International Work-Conference on Artificial Neural Networks (Springer LNCS proceedings), Jun 2021, Online, Spain. ⟨hal-03272258⟩


