Query-Oriented Summarization of RDF Graphs

Šejla Čebirić (1, 2, 3), François Goasdoué (4, 1), Ioana Manolescu (1, 2, 3)
1 CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Île-de-France
4 SHAMAN - Symbolic and Human-centric view of dAta MANagement
IRISA-D7 - Gestion des données et de la connaissance (Data and Knowledge Management)
Abstract: RDF is the data model of choice for Semantic Web applications. RDF graphs are often large and heterogeneous, so users may have a hard time getting familiar with the structure and semantics of a graph, or determining whether a graph is useful for a given application. We consider answering such questions by inspecting a graph summary: a compact structure conveying as much information as possible about the input graph. A summary is representative of a graph if it represents both its explicit and its implicit triples, the latter resulting from RDF Schema constraints. To ensure representativeness, we define a novel RDF-specific summarization framework based on RDF node equivalence and graph quotients; the framework can be instantiated with many different RDF node equivalence relations. We show that our summaries are representative, and we establish a sufficient condition on the RDF equivalence relation ensuring that a graph can be summarized efficiently, without materializing its implicit triples. We demonstrate that state-of-the-art bisimulation equivalence relations between graph nodes fit into our framework. Further, we instantiate the framework through four novel summaries, based on the new concept of property cliques and specifically tailored to cope with highly heterogeneous RDF graphs; we show that they are orders of magnitude more compact than bisimulation summaries. Finally, we show that the bisimulation summary and two of our clique summaries can be built efficiently, so that they represent the explicit and implicit data of the input graph without saturating the graph. A set of experiments confirms the performance benefits of our efficient summarization method.
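The abstract builds summaries as graph quotients under a node equivalence relation. The quotient step itself can be sketched as follows. Every node is mapped to its equivalence class, and a summary edge (C1, p, C2) is kept whenever some input triple (s, p, o) has s in C1 and o in C2. This is a minimal illustration under stated assumptions, not the report's implementation. The equivalence used here (`out_property_classes`: "same set of outgoing properties") is a hypothetical stand-in, not one of the bisimulation or property-clique relations the report defines. The sketch also treats all triples uniformly and performs none of the RDF Schema reasoning discussed above. The function names and the toy data are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

# An RDF triple as (subject, property, object); all terms are plain strings here.
Triple = Tuple[str, str, str]


def out_property_classes(triples: List[Triple]) -> Dict[str, Hashable]:
    """Hypothetical equivalence relation, for illustration only: two nodes are
    equivalent iff they have exactly the same set of outgoing properties.
    Returns a map from each node to an identifier of its equivalence class."""
    out_props: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for s, p, o in triples:
        out_props[s].add(p)
        nodes.update((s, o))
    # A sorted tuple of property names is a deterministic, readable class id;
    # nodes without outgoing properties all share the empty tuple ().
    return {n: tuple(sorted(out_props.get(n, set()))) for n in nodes}


def quotient_summary(
    triples: Iterable[Triple],
    node_classes: Callable[[List[Triple]], Dict[str, Hashable]],
) -> Set[Tuple[Hashable, str, Hashable]]:
    """Quotient graph of the input: one summary node per equivalence class, and
    a summary edge (C1, p, C2) iff some input triple (s, p, o) has s in C1 and
    o in C2. Duplicate edges collapse because the result is a set."""
    triple_list = list(triples)
    cls = node_classes(triple_list)
    return {(cls[s], p, cls[o]) for s, p, o in triple_list}


# Toy input (made-up data): two people working for the same institute.
example_triples: List[Triple] = [
    ("alice", "name", '"Alice"'),
    ("alice", "worksFor", "inria"),
    ("bob", "name", '"Bob"'),
    ("bob", "worksFor", "inria"),
    ("inria", "locatedIn", "saclay"),
]

summary = quotient_summary(example_triples, out_property_classes)
for edge in sorted(summary):
    print(edge)
```

On the five toy triples, `alice` and `bob` fall into the same class, so the summary has three edges. Swapping in a different `node_classes` function changes the summary without touching the quotient step. This mirrors, in a simplified form, the abstract's point that one framework can be instantiated with many equivalence relations.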
Document type: Reports

Cited literature: 28 references

https://hal.inria.fr/hal-01325900
Contributor: Ioana Manolescu
Submitted on: Wednesday, June 28, 2017 - 9:03:30 AM
Last modification on: Thursday, June 13, 2019 - 11:34:02 AM
Long-term archiving on: Wednesday, January 17, 2018 - 10:12:36 PM

File

RR.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01325900, version 5


Citation

Šejla Čebirić, François Goasdoué, Ioana Manolescu. Query-Oriented Summarization of RDF Graphs. [Research Report] RR-8920, INRIA Saclay; Université Rennes 1. 2017. ⟨hal-01325900v5⟩
