Building a bilingual dictionary from movie subtitles based on inter-lingual triggers

Caroline Lavecchia 1 Kamel Smaïli 1 David Langlois 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper addresses two aspects of Machine Translation: parallel corpora and translation models. First, we present a method for automatically building parallel corpora from subtitle files gathered from the Internet, which yields useful data for subtitle machine translation. The method is based on Dynamic Time Warping. We evaluated the alignment by comparing it with a hand-aligned sample and obtained an alignment precision of $0.92$. Second, we use the notion of inter-lingual triggers to build multilingual dictionaries and translation tables for machine translation from the subtitle parallel corpora. Inter-lingual triggers make it possible to detect pairs of source and target words in parallel corpora: the Mutual Information measure used to determine them supports the hypothesis that a word in the source language is a translation of a word in the target language. We evaluated the resulting dictionary by comparing it with two existing dictionaries. We then integrated the resulting translation tables into a complete translation decoding process provided by Pharaoh, and compared the translation performance obtained with our tables against that obtained with the Giza++ tool. The results show that the system tuned for our tables improves the BLEU score by 2.2% compared with the one obtained with Giza++.
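The idea of selecting inter-lingual triggers with Mutual Information can be illustrated with a minimal sketch. The abstract does not give the exact formula or the selection procedure, so the code below makes several assumptions: probabilities are estimated from sentence-level co-occurrence in aligned pairs, the score is taken as P(f,e) * log(P(f,e) / (P(f) P(e))), and the function name `interlingual_triggers`, the parameter `top_n`, and the toy corpus are purely illustrative.

```python
import math
from collections import Counter
from itertools import product


def interlingual_triggers(pairs, top_n=1):
    """Return, for each source word, its top_n target words ranked by an
    MI-style score. `pairs` is a list of (source_tokens, target_tokens)
    aligned sentence pairs. Assumption: the score and the counting scheme
    are illustrative, not taken from the paper."""
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_set, tgt_set = set(src), set(tgt)
        # Count each word at most once per sentence pair.
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))

    candidates = {}
    for (f, e), c in joint_count.items():
        p_fe = c / n
        p_f = src_count[f] / n
        p_e = tgt_count[e] / n
        score = p_fe * math.log(p_fe / (p_f * p_e))
        candidates.setdefault(f, []).append((score, e))

    # Highest score first; ties broken alphabetically for determinism.
    return {
        f: [e for _, e in sorted(c, key=lambda t: (-t[0], t[1]))[:top_n]]
        for f, c in candidates.items()
    }


if __name__ == "__main__":
    toy_corpus = [
        (["le", "chat"], ["the", "cat"]),
        (["le", "chien"], ["the", "dog"]),
        (["un", "chat"], ["a", "cat"]),
        (["un", "chien"], ["a", "dog"]),
    ]
    print(interlingual_triggers(toy_corpus))
```

In this toy setting, a word pair that always co-occurs scores higher than a pair that co-occurs only as often as chance would predict, which is the intuition behind treating the best-scoring target word as a translation hypothesis.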
Document type :
Conference papers

https://hal.inria.fr/inria-00184421
Contributor : Caroline Lavecchia
Submitted on : Wednesday, October 31, 2007 - 9:37:19 AM
Last modification on : Thursday, January 11, 2018 - 6:19:56 AM
Document(s) archived on : Monday, April 12, 2010 - 1:03:11 AM

File

aslib07.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00184421, version 1

Citation

Caroline Lavecchia, Kamel Smaïli, David Langlois. Building a bilingual dictionary from movie subtitles based on inter-lingual triggers. Translating and the Computer, Nov 2007, London, United Kingdom. ⟨inria-00184421⟩
