Retrieving Meaningful Relaxed Tightest Fragments for XML Keyword Search - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Communication Dans Un Congrès Année : 2009

Retrieving Meaningful Relaxed Tightest Fragments for XML Keyword Search

Résumé

Adapting keyword search to XML data has been attractive recently, generalized as XML keyword search (XKS). One of its key tasks is to return the meaningful fragments as the result. [1] is the latest work following this trend, and it focuses on returning the fragments rooted at SLCA (Smallest LCA – Lowest Common Ancestor) nodes. To guarantee that the fragments only contain interesting nodes, [1] proposes a contributor-based filtering mechanism in its MaxMatch algorithm. However, the filtering mechanism is not sufficient. It will commit the false positive problem (discarding interesting nodes) and the redundancy problem (keeping uninteresting nodes). In this paper, our interest is to propose a framework of retrieving meaningful fragments rooted at not only the SLCA nodes, but all LCA nodes. We begin by introducing the concept of Relaxed Tightest Fragment (RTF) as the basic result type. Then we propose a new filtering mechanism to overcome those two problems in Max-Match. Its kernel is the concept of valid contributor, which helps to distinguish the interesting children of a node. The new filtering mechanism is then to prune the nodes in a RTF which are not valid contributors to their parents. Based on the valid contributor concept, our ValidRTF algorithm not only overcomes those two problems in MaxMatch, but also satisfies the axiomatic properties deduced in [1] that an XKS technique should satisfy. We compare ValidRTF with MaxMatch on real and synthetic XML data. The result verifies our claims, and shows the effectiveness of our valid-contributor-based filtering mechanism.
Fichier principal
Vignette du fichier
p0815-KONG.pdf (1.08 Mo) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

inria-00433097 , version 1 (18-11-2009)

Identifiants

  • HAL Id : inria-00433097 , version 1

Citer

Kong Lingbo, Rémi Gilleron, Aurélien Lemay. Retrieving Meaningful Relaxed Tightest Fragments for XML Keyword Search. EDBT 2009, 2009, Saint Petersburg, Russia. ⟨inria-00433097⟩
213 Consultations
210 Téléchargements

Partager

Gmail Facebook X LinkedIn More