A Learning Algorithm for Top-Down Tree Transducers

Adrien Boiret 1 Aurélien Lemay 1 Joachim Niehren 1
1 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : A generalization from string to trees and from languages to transformations is given of the classical result that any regular language can be learned from annotated examples: we show how to learn top-down deterministic tree transducers (DTOPs) in Gold's model with polynomial resources from samples of input-output examples, while assuming a top-down schema for the input domain. Until now, similar results were known only for string transducers, sequential tree-to-word transducers, and simple relabeling tree transducers. Learning of DTOPs is more involved because a DTOP can copy, delete, and permute its input subtrees. Thus, complex dependencies of labeled input to output paths need to be maintained by the algorithm. First, a Myhill-Nerode theorem is presented for DTOPs with top-down inspection, which characterizes the unique normal forms of such machines from Engelfriet, Maneth, Seidl 2009 in a purely semantical manner. This theorem is then used to construct a learning algorithm for DTOPs with top-down inspection. Finally, it is shown how our result can be applied to learn XML transformations reminicent to XSLT programs, under the assumption that the schema for the domain of input trees is a DTD or an XML Schema. For this, a new schema-based encoding of unranked trees by ranked ones is presented. Over such encodings, DTOPs can realize many practically interesting XML transformations which cannot be realized on first-child/next-sibling encodings. This article extends on a paper presented at PODS’2010.
Liste complète des métadonnées

Cited literature [40 references]  Display  Hide  Download

https://hal.inria.fr/hal-01357627
Contributor : Inria Links <>
Submitted on : Friday, July 21, 2017 - 1:13:54 AM
Last modification on : Friday, March 22, 2019 - 1:34:27 AM

File

0.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01357627, version 1

Citation

Adrien Boiret, Aurélien Lemay, Joachim Niehren. A Learning Algorithm for Top-Down Tree Transducers. 2017. 〈hal-01357627〉

Share

Metrics

Record views

493

Files downloads

79