CliqueSquare: efficient Hadoop-based RDF query processing

François Goasdoué 1, Zoi Kaoudi 1, Ioana Manolescu 1, Jorge Quiané-Ruiz 2, Stamatis Zampetakis 1
1 OAK - Database optimizations and architectures for complex large data
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : Large RDF data collections are increasingly being created, published and used in various contexts, from scientific data to domain ontologies and open government data, in particular within the Linked Data movement. Managing such large volumes of RDF data is challenging because of their sheer size and heterogeneity. To tackle the size challenge, a single isolated machine is no longer an efficient solution. The MapReduce paradigm is a promising direction, providing scalability and massively parallel processing of large-volume data. We present CliqueSquare, an efficient RDF data management platform based on Hadoop, an open-source MapReduce implementation, and its file system, the Hadoop Distributed File System (HDFS). CliqueSquare relies on a novel RDF data partitioning scheme that enables queries to be evaluated efficiently by minimizing both the number of MapReduce jobs and the data transfer between nodes during query execution. We present preliminary experiments comparing our system against HadoopRDF, the state-of-the-art Hadoop-based RDF platform. The results demonstrate the advantages of CliqueSquare not only in terms of query response times, but also in terms of network traffic.
Document type :
Conference papers
https://hal.inria.fr/hal-00867728
Contributor : Stamatis Zampetakis
Submitted on : Monday, September 30, 2013 - 2:27:16 PM
Last modification on : Monday, December 9, 2019 - 5:24:07 PM
Long-term archiving on: Friday, April 7, 2017 - 4:31:24 AM

Identifiers

  • HAL Id : hal-00867728, version 1

Citation

François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge Quiané-Ruiz, Stamatis Zampetakis. CliqueSquare: efficient Hadoop-based RDF query processing. BDA'13 - Journées de Bases de Données Avancées, Oct 2013, Nantes, France. ⟨hal-00867728⟩
