Conference paper. Year: 2004

Hybrid XML Retrieval Revisited

Abstract

The widespread adoption of XML necessitates structure-aware systems that can effectively retrieve information from XML document collections. This paper reports on the participation of the RMIT group in the INEX 2004 ad hoc track, where we investigate different aspects of the XML retrieval task. Our preliminary analysis of CO and VCAS relevance assessments identifies three XML retrieval scenarios: Original, General and Specific. Further analysis of the relevance assessments under the General retrieval scenario reveals two categories of CO and VCAS topics: Broad and Narrow. We design runs that follow a hybrid XML approach and implement two retrieval heuristics with different levels of overlap among the answer elements. For the Original retrieval scenario we show that the overlap CO runs outperform the non-overlap CO runs, and that the VCAS run that uses queries with structural constraints and no explicitly specified target element performs best. In both the CO and VCAS cases, runs that implement the retrieval heuristic that favours less specific over more specific answer elements produce the most effective retrieval. Importantly, we present results which show that, for the General retrieval scenario where users prefer less specific and non-overlapping answers to their queries, a plain full-text search engine is a very effective choice for XML retrieval.
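The abstract does not spell out the two retrieval heuristics, so the following is only a minimal sketch of one way a non-overlapping answer list that favours less specific elements could be built. It assumes, hypothetically, that each answer is an XPath-like element path with a score, and that an element overlaps another when one path is an ancestor of the other. The function names, the path format and the scores are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (element path, retrieval score).
Answer = Tuple[str, float]


def is_ancestor(a: str, b: str) -> bool:
    """Return True if path `a` is a proper ancestor of path `b`."""
    return b.startswith(a.rstrip("/") + "/")


def remove_overlap_prefer_general(ranked: List[Answer]) -> List[Answer]:
    """Drop overlapping answers, keeping the less specific (shallower) element.

    Assumption: this is an illustrative heuristic, not the one used in the paper.
    """
    # Visit shallow elements first so an ancestor is always seen before its
    # descendants; ties in depth are broken by higher score.
    by_depth = sorted(ranked, key=lambda a: (a[0].count("/"), -a[1]))
    kept: List[Answer] = []
    for path, score in by_depth:
        # Skip duplicates and any element nested inside one already kept.
        if not any(k == path or is_ancestor(k, path) for k, _ in kept):
            kept.append((path, score))
    # Present the surviving answers in score order again.
    return sorted(kept, key=lambda a: -a[1])


ranked = [
    ("/article[1]/bdy[1]/sec[2]/p[1]", 0.9),
    ("/article[1]/bdy[1]/sec[2]", 0.7),
    ("/article[1]/bdy[1]/sec[3]/p[4]", 0.6),
]
print(remove_overlap_prefer_general(ranked))
```

In this toy input the paragraph nested in `sec[2]` is dropped in favour of its enclosing section, while the paragraph in `sec[3]` survives because nothing retrieved contains it. An overlap run would simply return `ranked` unchanged.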
Main file
RMIT-paper.pdf (153.97 KB)

Dates and versions

inria-00000003 , version 1 (29-04-2005)

Cite

Jovan Pehcevski, James A. Thom, S. M. M. Tahaghoghi, Anne-Marie Vercoustre. Hybrid XML Retrieval Revisited. Advances in XML Information Retrieval. Third Workshop of the INitiative for the Evaluation of XML Retrieval (INEX), Dec 2004, Schloss Dagstuhl, Germany, ⟨10.1007/11424550_13⟩. ⟨inria-00000003⟩
