Skip to Main content Skip to Navigation
Preprints, Working Papers, ...

A Learning Algorithm for Top-Down Tree Transducers

Adrien Boiret 1 Aurélien Lemay 1 Joachim Niehren 1
1 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : A generalization from string to trees and from languages to transformations is given of the classical result that any regular language can be learned from annotated examples: we show how to learn top-down deterministic tree transducers (DTOPs) in Gold's model with polynomial resources from samples of input-output examples, while assuming a top-down schema for the input domain. Until now, similar results were known only for string transducers, sequential tree-to-word transducers, and simple relabeling tree transducers. Learning of DTOPs is more involved because a DTOP can copy, delete, and permute its input subtrees. Thus, complex dependencies of labeled input to output paths need to be maintained by the algorithm. First, a Myhill-Nerode theorem is presented for DTOPs with top-down inspection, which characterizes the unique normal forms of such machines from Engelfriet, Maneth, Seidl 2009 in a purely semantical manner. This theorem is then used to construct a learning algorithm for DTOPs with top-down inspection. Finally, it is shown how our result can be applied to learn XML transformations reminicent to XSLT programs, under the assumption that the schema for the domain of input trees is a DTD or an XML Schema. For this, a new schema-based encoding of unranked trees by ranked ones is presented. Over such encodings, DTOPs can realize many practically interesting XML transformations which cannot be realized on first-child/next-sibling encodings. This article extends on a paper presented at PODS’2010.
Complete list of metadata

Cited literature [40 references]  Display  Hide  Download
Contributor : Inria Links Connect in order to contact the contributor
Submitted on : Friday, July 21, 2017 - 1:13:54 AM
Last modification on : Wednesday, March 23, 2022 - 3:51:22 PM


Files produced by the author(s)


  • HAL Id : hal-01357627, version 1


Adrien Boiret, Aurélien Lemay, Joachim Niehren. A Learning Algorithm for Top-Down Tree Transducers. 2017. ⟨hal-01357627⟩



Record views


Files downloads