Automatic Selection of Parallel Data for Machine Translation

Despoina Mouratidis; Katia Lida Kermanidis

doi:10.1007/978-3-319-92016-0_14

Conference Paper, Year: 2018

Despoina Mouratidis

Role: Author
PersonId: 1033567

Ionian University [Corfu]

Katia Lida Kermanidis

Role: Author
PersonId: 992337

Ionian University [Corfu]

Abstract

Machine translation is now widely used, but the data required for training, tuning and testing a machine translation engine is often insufficient or unsuitable. Automatically selecting data of appropriate quality for building translation models can help improve translation accuracy. In this paper, we use a large parallel corpus of educational video lecture subtitles, together with text posted by students and lecturers on the course fora. The text is challenging to translate because of the scientific domains involved and its informal genre. We apply a random forest classification scheme to the output of three machine translation models (one based on statistical machine translation and two on neural machine translation) in order to automatically identify the best output. The unorthodox language phenomena observed, the terminology-rich scientific domains addressed in the educational video lectures, the language-independent nature of the approach, and the three-class classification problem tackled constitute the innovative challenges of the work described herein.

Keywords

Machine learning, Educational data, Data selection, Machine translation, Random forests

Domains

Computer Science [cs]

Main file

468652_1_En_14_Chapter.pdf (349.1 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01821299

Submitted on: Friday, 22 June 2018, 14:12:51

Last modified on: Friday, 22 June 2018, 14:24:16

Long-term archiving on: Monday, 24 September 2018, 11:36:51

Dates and versions

hal-01821299, version 1 (22-06-2018)

License

Attribution

Identifiers

HAL Id: hal-01821299, version 1
DOI : 10.1007/978-3-319-92016-0_14

Cite

Despoina Mouratidis, Katia Lida Kermanidis. Automatic Selection of Parallel Data for Machine Translation. 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2018, Rhodes, Greece. pp. 146-156, ⟨10.1007/978-3-319-92016-0_14⟩. ⟨hal-01821299⟩

Collections

IFIP IFIP-AICT IFIP-TC IFIP-WG IFIP-TC12 IFIP-AIAI IFIP-WG12-5 IFIP-AICT-520
