Conference papers

Building a bilingual dictionary from movie subtitles based on inter-lingual triggers

Caroline Lavecchia 1 Kamel Smaïli 1 David Langlois 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper focuses on two aspects of Machine Translation: parallel corpora and the translation model. First, we present a method for automatically building parallel corpora from subtitle files gathered from the Internet, which yields useful data for subtitle machine translation. Our method is based on Dynamic Time Warping. We evaluated this alignment method by comparing its output with a hand-aligned sample and obtained an alignment precision of 0.92. Second, we use the notion of inter-lingual triggers to build multilingual dictionaries and translation tables for machine translation from the subtitle parallel corpora. Inter-lingual triggers make it possible to detect pairs of source and target words in parallel corpora: the Mutual Information measure used to determine them lets us hypothesize that a word in the source language is a translation of a word in the target language. We evaluated the resulting dictionary by comparing it with two existing dictionaries. We then integrated the resulting translation tables into a complete translation decoding process provided by Pharaoh, and compared the translation performance obtained with our tables against that obtained with the tables produced by the Giza++ tool. The results showed that the system tuned for our tables improves the BLEU score by 2.2% compared with the score obtained with Giza++.
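As an illustration of the inter-lingual trigger idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes sentence-level co-occurrence counts over already aligned subtitle pairs and a mutual information score of the form MI(s, t) = P(s, t) log(P(s, t) / (P(s) P(t))); the exact formula, tokenization and thresholds used in the paper may differ. The function name `interlingual_triggers`, the parameter `top_k` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def interlingual_triggers(pairs, top_k=1):
    """Rank target words for each source word by a mutual information score.

    pairs: list of (source_sentence, target_sentence) strings, assumed to be
    already aligned (for example by the subtitle alignment step).
    Returns {source_word: [(target_word, score), ...]} with at most top_k
    candidates per source word, best first.
    """
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()

    for src, tgt in pairs:
        # Whitespace tokenization is an assumption made for this sketch.
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Count each (source word, target word) pair once per aligned pair.
        joint_count.update(product(src_words, tgt_words))

    candidates = {}
    for (s, t), c in joint_count.items():
        p_st = c / n
        p_s = src_count[s] / n
        p_t = tgt_count[t] / n
        score = p_st * math.log(p_st / (p_s * p_t))
        candidates.setdefault(s, []).append((t, score))

    return {
        s: sorted(c, key=lambda item: item[1], reverse=True)[:top_k]
        for s, c in candidates.items()
    }


# Hypothetical toy corpus of aligned English-French subtitle lines.
toy_pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
    ("the dog eats", "le chien mange"),
]
triggers = interlingual_triggers(toy_pairs)
```

On this toy corpus the best-scoring target word for "cat" is "chat", since the two words always co-occur and neither appears elsewhere. Function words such as "the" and "le" appear in every pair and therefore score zero, which suggests why a real dictionary would need far more data and some filtering.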
Document type : Conference papers

Cited literature : 16 references
Contributor : Caroline Lavecchia
Submitted on : Wednesday, October 31, 2007 - 9:37:19 AM
Last modification on : Wednesday, February 2, 2022 - 3:51:17 PM
Long-term archiving on : Monday, April 12, 2010 - 1:03:11 AM


Files produced by the author(s)


  • HAL Id : inria-00184421, version 1



Caroline Lavecchia, Kamel Smaïli, David Langlois. Building a bilingual dictionary from movie subtitles based on inter-lingual triggers. Translating and the Computer, Nov 2007, London, United Kingdom. ⟨inria-00184421⟩


