Conference papers

CliqueSquare: efficient Hadoop-based RDF query processing

François Goasdoué 1, Zoi Kaoudi 1, Ioana Manolescu 1, Jorge Quiané-Ruiz 2, Stamatis Zampetakis 1
1 OAK - Database optimizations and architectures for complex large data
Inria Saclay - Ile de France, LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : Large volumes of RDF data collections are being created, published and used lately in various contexts, from scientific data to domain ontologies and to open government data, in particular in the context of the Linked Data movement. Managing such large volumes of RDF data is challenging due to the sheer size and the heterogeneity. To tackle the size challenge, a single isolated machine is not an efficient solution anymore. The MapReduce paradigm is a promising direction providing scalability and massively parallel processing of large-volume data. We present CliqueSquare, an efficient RDF data management platform based on Hadoop, an open source MapReduce implementation, and its file system, Hadoop Distributed File System (HDFS). CliqueSquare relies on a novel RDF data partitioning scheme enabling queries to be evaluated efficiently, by minimizing both the number of MapReduce jobs and the data transfer between nodes during query execution. We present preliminary experiments comparing our system against HadoopRDF, the state-of-the-art Hadoop-based RDF platform. The results demonstrate the advantages of CliqueSquare not only in terms of query response times, but also in terms of network traffic.
Document type : Conference papers

Cited literature [33 references]
Contributor : Stamatis Zampetakis
Submitted on : Monday, September 30, 2013 - 2:27:16 PM
Last modification on : Thursday, July 8, 2021 - 3:48:15 AM
Long-term archiving on : Friday, April 7, 2017 - 4:31:24 AM




  • HAL Id : hal-00867728, version 1



François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge Quiané-Ruiz, Stamatis Zampetakis. CliqueSquare: efficient Hadoop-based RDF query processing. BDA'13 - Journées de Bases de Données Avancées, Oct 2013, Nantes, France. ⟨hal-00867728⟩


