Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper. Year: 2021


Abstract

Cognate prediction is the task of generating, in a given language, the likely cognates of words in a related language, where cognates are words in related languages that have evolved from a common ancestor word. Little data exists for this task, and it can help linguists discover previously unknown relations. Previous work has applied machine translation (MT) techniques to this task on the basis of the two tasks' similarities, without, however, studying their numerous differences or optimising architectural choices and hyper-parameters. In this paper, we investigate whether cognate prediction can benefit from insights from low-resource MT. We first compare statistical MT (SMT) and neural MT (NMT) architectures in a bilingual setup. We then study the impact of data augmentation techniques commonly seen to give gains in low-resource MT: monolingual pretraining, backtranslation and multilinguality. Our experiments on several Romance languages show that cognate prediction behaves like a standard low-resource MT task only to a certain extent. In particular, MT architectures, both statistical and neural, can be used successfully for the task, but using supplementary monolingual data is not always as beneficial as using data from additional languages, contrary to what is observed for MT.
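To make the MT framing concrete, here is a minimal sketch, assuming a character-level setup in which each word is treated as a "sentence" whose tokens are its characters. It also shows how backtranslation could create synthetic training pairs from monolingual target-language words. The helper names (`to_char_tokens`, `make_parallel`, `backtranslate`), the Spanish-Italian example pairs and the toy reverse model are all hypothetical illustrations, not the authors' code or data.

```python
from typing import Callable, List, Tuple

# Illustrative Spanish -> Italian cognate pairs (hypothetical examples, not the paper's dataset).
PAIRS: List[Tuple[str, str]] = [
    ("noche", "notte"),
    ("leche", "latte"),
    ("hijo", "figlio"),
]


def to_char_tokens(word: str) -> str:
    """Split a word into space-separated characters, so an MT toolkit sees characters as tokens."""
    return " ".join(word)


def make_parallel(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (source, target) cognate pairs into parallel character-level 'sentences'."""
    return [(to_char_tokens(src), to_char_tokens(tgt)) for src, tgt in pairs]


def backtranslate(
    monolingual_targets: List[str],
    reverse_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs: a target->source model (hypothetical
    callable here) predicts the source side for each monolingual target-language word."""
    return [(to_char_tokens(reverse_model(w)), to_char_tokens(w)) for w in monolingual_targets]


if __name__ == "__main__":
    for src, tgt in make_parallel(PAIRS):
        print(f"{src}\t{tgt}")
    # Toy stand-in for a trained Italian -> Spanish model (hypothetical rewrite rule).
    toy_reverse = lambda w: w.replace("tt", "ch")
    print(backtranslate(["notte", "latte"], toy_reverse))
```

In a real setup, `reverse_model` would be a trained target-to-source system, and the synthetic pairs would be mixed with the genuine cognate pairs before training the source-to-target model.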
Main file
2021Aug_ACLFindings_Is_Cognate_Prediction_a_Low_Resource_Machine_Translation_Task_.pdf (513.58 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03243380, version 1 (31-05-2021)
hal-03243380, version 2 (15-12-2022)

Identifiers

  • HAL Id: hal-03243380, version 2

Cite

Clémentine Fourrier, Rachel Bawden, Benoît Sagot. Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? ACL-IJCNLP 2021 - Findings of the Association for Computational Linguistics, Aug 2021, Bangkok, Thailand. ⟨hal-03243380v2⟩