Parachute Queries in the Presence of Unavailable Data Sources

Philippe Bonnet, Anthony Tomasic 1
1 RODIN - Database Systems
Inria Paris-Rocquencourt
Abstract: Mediator systems are used today in a wide variety of unreliable environments. When processing a query, a mediator may try to access a data source that is unavailable. In this situation, existing systems either silently ignore the unavailable data sources or generate an error. In either case, obtaining the complete answer requires reprocessing the query from scratch. This behavior is inefficient in environments where the probability that a data source is unavailable is non-negligible (e.g., the Internet). When some data sources are unavailable, the complete answer to a query cannot be obtained; however, useful work can still be done with the available data sources. In this paper, we describe a novel approach to mediator query processing in which, in the presence of unavailable data sources, the answer to a query is a partial answer. The partial answer represents the state of the mediator at the end of query processing, i.e., materialized data. This state is used to construct an incremental query. The answer to the incremental query is the same as the complete answer, but the incremental query is more efficient to evaluate than the original query. In addition, information can be extracted from the mediator state through secondary queries, called parachute queries. We describe an intuitive class of parachute queries and an algorithm that generates it. We define two new architectures for partial answers, incremental queries, and parachute queries, and for these architectures we analytically model the probability of obtaining the answer to a query in the presence of unavailable data sources. The analysis shows that complete answers are more likely in our two architectures than in a classical system. We measure the performance of our architectures via simulations and show that, when all data sources are available, the performance penalty of our approach is negligible. In addition, we show that there is a trade-off between the cost of query execution and the probability of obtaining a complete or parachute query answer.
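
The relationship between a partial answer, an incremental query, and a parachute query can be illustrated with a minimal Python sketch. It is not the algorithm or architecture of the paper: it assumes, for illustration only, that the query is a plain union over named sources, and every name in it (SourceUnavailable, PartialAnswer, evaluate, parachute) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class SourceUnavailable(Exception):
    """Raised by a (hypothetical) source wrapper that cannot be reached."""


Fetch = Callable[[], List[str]]


@dataclass
class PartialAnswer:
    """Mediator state at the end of query processing."""
    materialized: Dict[str, List[str]] = field(default_factory=dict)
    pending: List[str] = field(default_factory=list)  # sources still missing

    def is_complete(self) -> bool:
        return not self.pending

    def rows(self) -> List[str]:
        # Assumption: the query is a union of the rows of all sources.
        return sorted(r for rows in self.materialized.values() for r in rows)


def evaluate(sources: Dict[str, Fetch],
             state: Optional[PartialAnswer] = None) -> PartialAnswer:
    """Evaluate the query, or resume it from a previous partial answer.

    Called with a state, this plays the role of the incremental query:
    only the sources that were unavailable before are contacted again.
    """
    if state is None:
        state = PartialAnswer(pending=list(sources))
    result = PartialAnswer(materialized=dict(state.materialized))
    for name in state.pending:
        try:
            result.materialized[name] = sources[name]()
        except SourceUnavailable:
            result.pending.append(name)
    return result


def parachute(state: PartialAnswer, needed: List[str]) -> Optional[List[str]]:
    """Answer a secondary query from materialized data only, if possible."""
    if all(name in state.materialized for name in needed):
        return sorted(r for name in needed for r in state.materialized[name])
    return None


if __name__ == "__main__":
    def down() -> List[str]:
        raise SourceUnavailable("b")

    first = evaluate({"a": lambda: ["a1", "a2"], "b": down})
    print(first.is_complete(), first.pending, parachute(first, ["a"]))
```

In this sketch, calling evaluate again with the returned state and a reachable source "b" contacts only "b" and reuses the rows already materialized from "a", which is the sense in which the incremental query is cheaper than reprocessing the original query from scratch.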
