Conference papers

Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?

Abstract : Cognate prediction is the task of generating, in a given language, the likely cognates of words in a related language, where cognates are words in related languages that have evolved from a common ancestor word. It is a task for which little data exists and which can aid linguists in the discovery of previously undiscovered relations. Previous work has applied machine translation (MT) techniques to this task, based on the tasks' similarities, without, however, studying their numerous differences or optimising architectural choices and hyper-parameters. In this paper, we investigate whether cognate prediction can benefit from insights from low-resource MT. We first compare statistical MT (SMT) and neural MT (NMT) architectures in a bilingual setup. We then study the impact of employing data augmentation techniques commonly seen to give gains in low-resource MT: monolingual pretraining, backtranslation and multilinguality. Our experiments on several Romance languages show that cognate prediction behaves only to a certain extent like a standard low-resource MT task. In particular, MT architectures, both statistical and neural, can be successfully used for the task, but using supplementary monolingual data is not always as beneficial as using additional language data, contrary to what is observed for MT.
Document type :
Conference papers

Contributor : Benoît Sagot
Submitted on : Monday, May 31, 2021 - 3:37:52 PM
Last modification on : Friday, February 4, 2022 - 3:07:51 AM


  • HAL Id : hal-03243380, version 1



Clémentine Fourrier, Rachel Bawden, Benoît Sagot. Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021, Bangkok, Thailand. ⟨hal-03243380⟩


