
A flexible and decentralised approach to query processing for geo-distributed data systems

Abstract : Query processing is an essential component of today's data serving systems. Query processing involves a variety of metrics that are in tension and create trade-offs. Because of these trade-offs, application developers need to tune query engines to the characteristics and needs of each application. Today's query engines often handle requests from users around the world, accessing data spread across geographically distributed sites. This thesis studies how to support efficient query processing in contexts in which users and data are distributed across multiple geographic locations. We present an analysis of the design decisions and trade-offs in geo-distributed query processing. In particular, we study how the placement of derived state used by the query engine to accelerate query processing (indexes, materialized views) and the communication patterns involved in query processing and state maintenance affect three metrics: query performance, query result freshness, and cross-site network resource consumption. We propose a query engine architecture that, as opposed to current state-of-the-art approaches, allows application developers to make derived state placement decisions on a case-by-case basis. The enabling technique that this thesis presents is composition-based design: a query engine architecture can be constructed by composing building block components that encapsulate primitive query processing tasks into a directed acyclic graph that provides higher-order query processing capabilities. We introduce a query processing component abstraction, the Query Processing Unit (QPU), that defines a uniform interface and interaction semantics for query processing architecture building blocks. This uniform interface and its interaction semantics allow us to expose design decisions about the query engine’s architecture and placement to application developers.
Finally, we present an implementation of the proposed approach, in the form of a framework for constructing and deploying application-specific query engines, called Proteus. Proteus consists of an extensible library of Query Processing Unit implementations, and mechanisms for facilitating the definition and deployment of QPU-based query engines. The experimental evaluation supports the theoretical analysis of the trade-offs involved in query processing state placement, and suggests that Proteus can effectively occupy multiple different points in the design space of geo-distributed query processing.
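The composition idea described in the abstract can be illustrated with a minimal sketch. It is not the Proteus API: the class names (`QPU`, `ScanQPU`, `IndexQPU`, `UnionQPU`), the `query(attr, value)` signature, and the `ANY` sentinel are all assumptions made for this example. It only shows how components sharing one interface might be composed into a small graph, with derived state (an index) placed next to each site's data.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]
ANY = object()  # hypothetical sentinel: match every value of the attribute


class QPU(ABC):
    """Hypothetical uniform interface: every node answers the same call."""

    @abstractmethod
    def query(self, attr: str, value: object = ANY) -> List[Record]:
        ...


class ScanQPU(QPU):
    """Leaf node: scans the records held at one site (in memory here)."""

    def __init__(self, records: List[Record]) -> None:
        self.records = records

    def query(self, attr: str, value: object = ANY) -> List[Record]:
        return [r for r in self.records
                if attr in r and (value is ANY or r[attr] == value)]


class IndexQPU(QPU):
    """Derived state: a hash index on one attribute, built from a child QPU."""

    def __init__(self, child: QPU, attr: str) -> None:
        self.child = child
        self.attr = attr
        self.index = defaultdict(list)
        # Built once at construction; this sketch does no index maintenance.
        for record in child.query(attr):
            self.index[record[attr]].append(record)

    def query(self, attr: str, value: object = ANY) -> List[Record]:
        if attr == self.attr and value is not ANY:
            return list(self.index.get(value, []))
        # Anything the index cannot answer is forwarded to the child.
        return self.child.query(attr, value)


class UnionQPU(QPU):
    """Inner node: fans a query out to its children and concatenates results."""

    def __init__(self, children: List[QPU]) -> None:
        self.children = list(children)

    def query(self, attr: str, value: object = ANY) -> List[Record]:
        results: List[Record] = []
        for child in self.children:
            results.extend(child.query(attr, value))
        return results


# Two sites, each with a local index, combined under one root node.
paris = ScanQPU([{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lyon"}])
tokyo = ScanQPU([{"id": 3, "city": "Paris"}])
engine = UnionQPU([IndexQPU(paris, "city"), IndexQPU(tokyo, "city")])
result = engine.query("city", "Paris")
```

Because every node exposes the same `query` call, the same components could be rearranged, for example with a single `IndexQPU` placed above the `UnionQPU` instead of one per site. In this sketch the index is never refreshed, so a record added to a site after construction is visible to a scan but not to the index, which is a crude stand-in for the freshness trade-off the abstract mentions.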
Contributor : Dimitrios Vasilas
Submitted on : Monday, September 13, 2021 - 6:27:25 PM
Last modification on : Friday, October 22, 2021 - 4:55:20 AM




  • HAL Id : tel-03272208, version 1


Dimitrios Vasilas. A flexible and decentralised approach to query processing for geo-distributed data systems. Computer science. Sorbonne Université, 2021. English. ⟨tel-03272208v1⟩


