Discovering Phrases in Machine Translation by Simulated Annealing

Caroline Lavecchia; David Langlois; Kamel Smaïli

Conference paper, Year: 2008

Caroline Lavecchia

Role: Author

Analysis, perception and recognition of speech

David Langlois

Role: Author
PersonId : 298
IdHAL : david-langlois
IdRef : 070239509

Analysis, perception and recognition of speech

Kamel Smaïli

Role: Author
PersonId : 2521
IdHAL : kamel-smaili
IdRef : 034429700

Analysis, perception and recognition of speech

Abstract

In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then, we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem: we use a simulated annealing algorithm to find the best phrase translations among all those proposed by the inter-lingual triggers. The best phrases are those that improve translation quality in terms of BLEU score. Experiments are carried out on the proceedings of the European Parliament. Training uses a corpus of 596K parallel French-English sentences, and testing uses a corpus of 1,444 sentences. With only 8.1% of the identified source phrases occurring in the test corpus, our system outperforms the baseline model by almost 3 points.
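The selection step described in the abstract can be illustrated with a generic simulated annealing loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the neighborhood move, the cooling schedule, or how the score is computed, so the toggle move, the geometric cooling, the candidate list, and `toy_score` below are all hypothetical. In the paper the objective is the BLEU score obtained with the selected phrases; here `toy_score` is a stand-in that simply rewards a hand-picked set of good pairs.

```python
import math
import random


def simulated_annealing(candidates, score, t_start=1.0, t_end=0.01,
                        cooling=0.95, steps_per_temp=20, seed=0):
    """Search for a subset of `candidates` that maximizes `score`.

    All parameters are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_score = score(current)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Neighbor state: toggle one candidate phrase pair in or out.
            pair = rng.choice(candidates)
            neighbor = current ^ {pair}
            neighbor_score = score(neighbor)
            delta = neighbor_score - current_score
            # Always accept improvements; accept a worse state with
            # probability exp(delta / t), which shrinks as t cools.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = neighbor, neighbor_score
                if current_score > best_score:
                    best, best_score = current, current_score
        t *= cooling  # geometric cooling (an assumption)
    return best, best_score


# Hypothetical candidate phrase pairs (source phrase, proposed translation).
CANDIDATES = [
    ("pomme de terre", "potato"),
    ("pomme de terre", "apple of earth"),
    ("tout à fait", "absolutely"),
    ("tout à fait", "all to done"),
]
GOOD = {CANDIDATES[0], CANDIDATES[2]}


def toy_score(selected):
    """Stand-in for BLEU: +1 per helpful pair, -1 per harmful pair."""
    return sum(1.0 if pair in GOOD else -1.0 for pair in selected)


best, best_score = simulated_annealing(CANDIDATES, toy_score)
```

In a real setting, `score` would translate a development corpus with the selected phrase pairs and return its BLEU score, which makes each evaluation far more expensive than in this toy example.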

Keywords

statistical machine translation; inter-lingual triggers; simulated annealing

Domains

Computation and Language [cs.CL]

Main file

CarolineLavecchia.pdf (69.12 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00331327

Submitted on: Thursday, October 16, 2008, 12:46:11

Last modified on: Friday, March 24, 2023, 14:52:51

Long-term archiving on: Monday, June 7, 2010, 20:16:25

Dates and versions

inria-00331327 , version 1 (16-10-2008)

Identifiers

HAL Id : inria-00331327 , version 1

Cite

Caroline Lavecchia, David Langlois, Kamel Smaïli. Discovering Phrases in Machine Translation by Simulated Annealing. INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association, Sep 2008, Brisbane, Australia. pp.2354-2357. ⟨inria-00331327⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
