Inferring syntactic rules for word alignment through Inductive Logic Programming

Sylwia Ozdowska 1, Vincent Claveau 2, *
* Corresponding author
2 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This paper presents and evaluates an original approach to automatically aligning bitexts at the word level. It relies on a syntactic dependency analysis of the source and target texts and is based on a machine-learning technique, namely inductive logic programming (ILP). We show that ILP is particularly well suited to this task, in which the data can only be expressed by (translational and syntactic) relations. It allows us to easily infer rules, called syntactic alignment rules, which make the most of the syntactic information to align words. A simple bootstrapping technique provides the examples needed by ILP, making this machine-learning approach entirely automatic. Moreover, through different experiments, we show that this approach requires a very small amount of training data and that its performance rivals some of the best existing alignment systems. Furthermore, cases of syntactic isomorphism or non-isomorphism between the source and target languages are easily identified through the inferred rules.
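As an illustration of what a syntactic alignment rule might look like in practice, the sketch below propagates an existing anchor alignment along dependency relations. It is a minimal sketch under stated assumptions, not the rules or the ILP procedure from the paper: the `Dep` tuple, the `apply_rule` function, the relation labels (`subj`, `obj`) and the toy sentence pair are all hypothetical, and the rule is written by hand here rather than inferred.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency edge: head governs dependent through relation rel."""
    head: str
    rel: str
    dependent: str


def apply_rule(src_deps, tgt_deps, anchors, src_rel, tgt_rel):
    """Apply one hand-written (hypothetical) alignment rule.

    Rule: if a source head is already aligned with a target head, the source
    head governs d_s via src_rel, and the target head governs d_t via tgt_rel,
    then propose the alignment (d_s, d_t).
    """
    new_links = set()
    for s in src_deps:
        if s.rel != src_rel:
            continue
        for t in tgt_deps:
            if t.rel == tgt_rel and (s.head, t.head) in anchors:
                new_links.add((s.dependent, t.dependent))
    # Return only links that were not already known as anchors.
    return new_links - anchors


# Toy bitext (assumed): "the committee adopts the report"
#                       "la commission adopte le rapport"
src = [Dep("adopts", "subj", "committee"), Dep("adopts", "obj", "report")]
tgt = [Dep("adopte", "subj", "commission"), Dep("adopte", "obj", "rapport")]
anchors = {("adopts", "adopte")}  # seed alignment, e.g. from bootstrapping

links = (apply_rule(src, tgt, anchors, "subj", "subj")
         | apply_rule(src, tgt, anchors, "obj", "obj"))
print(sorted(links))
```

Using the same relation on both sides corresponds to a syntactically isomorphic case; passing different values for `src_rel` and `tgt_rel` would express a non-isomorphic one.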
Document type:
Conference paper
ELRA. 7th Language Resources and Evaluation Conference, LREC'10, May 2010, Valletta, Malta. 2010, 〈http://www.lrec-conf.org/proceedings/lrec2010/pdf/878_Paper.pdf〉

https://hal.inria.fr/inria-00561754
Contributor: Patrick Gros
Submitted on: Tuesday, February 1, 2011 - 17:44:22
Last modified on: Thursday, January 11, 2018 - 06:20:10

Identifiers

  • HAL Id: inria-00561754, version 1

Citation

Sylwia Ozdowska, Vincent Claveau. Inferring syntactic rules for word alignment through Inductive Logic Programming. ELRA. 7th Language Resources and Evaluation Conference, LREC'10, May 2010, Valletta, Malta. 2010, 〈http://www.lrec-conf.org/proceedings/lrec2010/pdf/878_Paper.pdf〉. 〈inria-00561754〉
