Parallel Corpora Preparation for English-Amharic Machine Translation

Yohanens Biadgline; Kamel Smaïli

Conference paper. Year: 2021


Yohanens Biadgline

Role: Author

Bahir Dar Institute of Technology

Kamel Smaïli

Role: Author
PersonId : 2521
IdHAL : kamel-smaili
IdRef : 034429700

Statistical Machine Translation and Speech Modelization and Text

Abstract

In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two kinds of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the bilingual evaluation understudy (BLEU) metric, the systems score 26.47 for SMT and 32.44 for NMT. The corpus was collected from the Internet using automatic and semi-automatic techniques. It covers three domains: Religion, Law, and News. The resulting corpus is composed of 225,304 parallel sentences and will be shared for free with the community. To our knowledge, this is the largest parallel corpus built so far for the Amharic language.

Keywords

Amharic language; Machine Translation; SMT; NMT; Parallel Corpus; BLEU

Domains

Computation and Language [cs.CL]

Main file

IWAAN_Paper_Final.pdf (386.58 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-03272258

Submitted on: Monday, June 28, 2021 - 11:22:07

Last modified on: Monday, September 11, 2023 - 17:41:19

Long-term archiving on: Wednesday, September 29, 2021 - 18:55:52

Dates and versions

hal-03272258 , version 1 (28-06-2021)

Identifiers

HAL Id : hal-03272258 , version 1

Cite

Yohanens Biadgline, Kamel Smaïli. Parallel Corpora Preparation for English-Amharic Machine Translation. IWANN 2021 - International Work on Artificial Neural Networks, Conference Springer LNCS proceedings, Jun 2021, Online, Spain. ⟨hal-03272258⟩

Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

131 views

476 downloads
