A Learning Algorithm for Top-Down Tree Transducers

Adrien Boiret 1 Aurélien Lemay 1 Joachim Niehren 1
1 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : A generalization from string to trees and from languages to transformations is given of the classical result that any regular language can be learned from annotated examples: we show how to learn top-down deterministic tree transducers (DTOPs) in Gold's model with polynomial resources from samples of input-output examples, while assuming a top-down schema for the input domain. Until now, similar results were known only for string transducers, sequential tree-to-word transducers, and simple relabeling tree transducers. Learning of DTOPs is more involved because a DTOP can copy, delete, and permute its input subtrees. Thus, complex dependencies of labeled input to output paths need to be maintained by the algorithm. First, a Myhill-Nerode theorem is presented for DTOPs with top-down inspection, which characterizes the unique normal forms of such machines from Engelfriet, Maneth, Seidl 2009 in a purely semantical manner. This theorem is then used to construct a learning algorithm for DTOPs with top-down inspection. Finally, it is shown how our result can be applied to learn XML transformations reminicent to XSLT programs, under the assumption that the schema for the domain of input trees is a DTD or an XML Schema. For this, a new schema-based encoding of unranked trees by ranked ones is presented. Over such encodings, DTOPs can realize many practically interesting XML transformations which cannot be realized on first-child/next-sibling encodings. This article extends on a paper presented at PODS’2010.
Type de document :
Pré-publication, Document de travail
2017
Liste complète des métadonnées

Littérature citée [40 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01357627
Contributeur : Inria Links <>
Soumis le : vendredi 21 juillet 2017 - 01:13:54
Dernière modification le : mardi 2 octobre 2018 - 15:42:27

Fichier

0.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01357627, version 1

Collections

Citation

Adrien Boiret, Aurélien Lemay, Joachim Niehren. A Learning Algorithm for Top-Down Tree Transducers. 2017. 〈hal-01357627〉

Partager

Métriques

Consultations de la notice

480

Téléchargements de fichiers

74