Finding optimal probabilistic generators for XML collections

Serge Abiteboul 1, 2 Yael Amsterdamer 3 Daniel Deutch 2 Tova Milo 4 Pierre Senellart 3
1 DAHU - Verification in databases
LSV - Laboratoire Spécification et Vérification [Cachan], ENS Cachan - École normale supérieure - Cachan, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8643
Abstract: We study the problem of, given a corpus of XML documents and its schema, finding an optimal (generative) probabilistic model, where optimality here means maximizing the likelihood of the particular corpus to be generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model, in the absence of constraints. We further study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints. We study in this case two different kinds of generators. First, we consider a continuation-test generator that performs, while generating documents, tests of schema satisfiability; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. We also study a restart generator that may generate an invalid document and, when this is the case, restarts and tries again. Finally, we consider the injection of data values into the structure, to obtain a full XML document. We study different approaches for generating these values.
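The restart strategy mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no implementation details, so every name here (`generate_candidate`, `satisfies_key`, `restart_generator`, `max_restarts`) is hypothetical, the "document" is reduced to a flat list of values, and the only constraint checked is a toy key constraint (no repeated value).

```python
import random


def generate_candidate(rng, choices, weights, length):
    """Draw `length` values independently; the result may violate the constraint."""
    return rng.choices(choices, weights=weights, k=length)


def satisfies_key(values):
    """Toy key constraint (an assumption for illustration): no value appears twice."""
    return len(set(values)) == len(values)


def restart_generator(rng, choices, weights, length, max_restarts=10_000):
    """Generate candidates until one is valid; return it with the number of restarts."""
    for restarts in range(max_restarts):
        candidate = generate_candidate(rng, choices, weights, length)
        if satisfies_key(candidate):
            return candidate, restarts
    # The bound is a safeguard of this sketch, not something stated in the abstract.
    raise RuntimeError("no valid candidate found within max_restarts")


rng = random.Random(0)
doc, restarts = restart_generator(rng, ["a", "b", "c", "d"], [0.4, 0.3, 0.2, 0.1], 3)
```

In this sketch the generation step stays cheap because the constraint is checked only once per candidate; the price is an unpredictable number of restarts. The abstract describes the continuation-test generator as making the opposite trade, avoiding invalid documents at the cost of expensive satisfiability tests during generation.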
Document type:
Conference paper
ICDT, Mar 2012, Berlin, Germany. pp.127-139, 2012

Cited literature [28 references]

https://hal.inria.fr/hal-00765545
Contributor: Émilien Antoine
Submitted on: Friday, 14 December 2012 - 17:46:32
Last modified on: Saturday, 3 March 2018 - 15:12:01
Document(s) archived on: Sunday, 18 December 2016 - 02:17:49

File

abiteboul2012finding.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00765545, version 1

Citation

Serge Abiteboul, Yael Amsterdamer, Daniel Deutch, Tova Milo, Pierre Senellart. Finding optimal probabilistic generators for XML collections. ICDT, Mar 2012, Berlin, Germany. pp.127-139, 2012. 〈hal-00765545〉

Metrics

Record views: 361

File downloads: 109