Parallel Corpora Preparation for English-Amharic Machine Translation - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2021

Parallel Corpora Preparation for English-Amharic Machine Translation

Abstract

In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two sets of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the bilingual evaluation understudy (BLEU) metric, SMT and NMT reach scores of 26.47 and 32.44, respectively. The corpus was collected from the Internet using automatic and semi-automatic techniques. It covers three domains: religion, law, and news. The resulting corpus consists of 225,304 parallel sentences and will be shared freely with the community. To our knowledge, this is the largest parallel corpus for the Amharic language to date.
Main file
IWAAN_Paper_Final.pdf (386.58 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03272258, version 1 (28-06-2021)

Identifiers

  • HAL Id: hal-03272258, version 1

Cite

Yohanens Biadgline, Kamel Smaïli. Parallel Corpora Preparation for English-Amharic Machine Translation. IWANN 2021 - International Work on Artificial Neural Networks, Conference Springer LNCS proceedings, Jun 2021, Online, Spain. ⟨hal-03272258⟩