Conference paper, Year: 2007

Building a bilingual dictionary from movie subtitles based on inter-lingual triggers

Abstract

This paper addresses two aspects of machine translation: parallel corpora and the translation model. First, we present a method for automatically building parallel corpora from subtitle files gathered from the Internet, which yields useful data for subtitle machine translation. The method is based on Dynamic Time Warping. We evaluated the alignment by comparing it with a hand-aligned sample and obtained an alignment precision of 0.92. Second, we use the notion of inter-lingual triggers to build multilingual dictionaries and translation tables for machine translation from the subtitle parallel corpora. Inter-lingual triggers make it possible to detect pairs of source and target words in parallel corpora: the Mutual Information measure used to determine them supports the hypothesis that a word in the source language is a translation of a word in the target language. We evaluated the resulting dictionary by comparing it with two existing dictionaries. We then integrated the resulting translation tables into a complete translation decoding process based on Pharaoh, and compared the translation performance obtained with our tables against the performance obtained with the Giza++ tool. The system tuned for our tables improved the BLEU score by 2.2% compared with the score obtained with Giza++.
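As an illustration only, the idea of ranking candidate translations by a mutual-information score can be sketched in a few lines of Python. This is not the authors' implementation: the function name `interlingual_triggers`, the sentence-level co-occurrence estimates, the scoring formula P(s,t) log(P(s,t) / (P(s) P(t))), and the toy English-French data are all assumptions made for this sketch.

```python
import math
from collections import Counter
from itertools import product


def interlingual_triggers(pairs, top_n=1):
    """Return, for each source word, its top_n target words by MI score.

    `pairs` is a list of (source_tokens, target_tokens) aligned segments.
    Probabilities are estimated as the fraction of aligned pairs in which
    a word (or a source/target word pair) occurs. This estimate is an
    assumption of the sketch, not a detail taken from the paper.
    """
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))

    scores = {}
    for (s, t), c in joint_count.items():
        p_st = c / n
        p_s = src_count[s] / n
        p_t = tgt_count[t] / n
        # Assumed score: P(s,t) * log(P(s,t) / (P(s) * P(t)))
        mi = p_st * math.log(p_st / (p_s * p_t))
        scores.setdefault(s, []).append((mi, t))

    # Sort by descending score, breaking ties alphabetically.
    return {
        s: [t for _, t in sorted(cands, key=lambda x: (-x[0], x[1]))[:top_n]]
        for s, cands in scores.items()
    }


# Hypothetical toy corpus of aligned subtitle segments.
pairs = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "dog"], ["le", "chien"]),
    (["a", "cat"], ["un", "chat"]),
    (["a", "dog"], ["un", "chien"]),
]
triggers = interlingual_triggers(pairs)
print(triggers["cat"], triggers["dog"])
```

On real subtitle data, the retained source-target pairs and their scores would serve as dictionary entries or as the basis for a translation table; how the scores are normalized for a decoder is left out of this sketch.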
Main file
aslib07.pdf (196.59 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00184421, version 1 (31-10-2007)

Identifiers

  • HAL Id: inria-00184421, version 1

Cite

Caroline Lavecchia, Kamel Smaïli, David Langlois. Building a bilingual dictionary from movie subtitles based on inter-lingual triggers. Translating and the Computer, Nov 2007, London, United Kingdom. ⟨inria-00184421⟩
