from Keyword Queries

We consider the problem of a client performing privacy-preserving range queries to a server's database. We propose a cryptographic model for the study of such protocols, by expanding previous well-studied models of keyword search and private information retrieval to the range query type and to incorporate a multiple-occurrence attribute column in the database table. Our first two results are 2-party privacy-preserving range query protocols, where either (a) the value domain is linear in the number of database records and the database size is only increased by a small constant factor; or (b) the value domain is exponential (thus, essentially of arbitrarily large size) in the number of database records and the database size is increased by a factor logarithmic in the value domain size. Like all previous work in private information retrieval and keyword search, these protocols still have server time complexity linear in the number of database payloads. We discuss how to adapt these results to a 3-party model where encrypted data is outsourced to a third party (i.e., a cloud server). The result is a private database retrieval protocol satisfying a highly desirable tradeoff of privacy and efficiency properties; most notably: (1) no unintended information is leaked to clients or servers, and the information leaked to the third party is characterized as 'access pattern' on encrypted data; (2) for each query, all parties run in time only logarithmic in the number of database records and linear in the answer size; (3) the protocol's query runtime is practical for real-life applications.


Introduction
The recent computing trend of outsourcing big data to the cloud for simplified and efficient application deployment is being embraced in government, as well as in other areas, including finance and information technology. In government, large databases are needed in many contexts (e.g., no-fly lists, metadata of communication records, etc.). In finance, banks and other financial institutions need to store huge data volumes and compute over them on a daily basis. In information technology, web and social networks collect huge amounts of data from computer users, which are then made available for different uses and computations. In all of these applications, databases are very useful data management tools, and cloud storage and computing provide tremendous efficiency and utility for users, as exemplified by the increasingly successful database-as-a-service application paradigm (see, e.g., [13]). On the other hand, cloud storage and computing paradigms are also accompanied by privacy risks (see, e.g., [21]). To mitigate these risks, database-management systems can use privacy-preserving database retrieval protocols that allow users to submit queries and receive results in such a way that clients learn nothing about the contents of a database except the results of their queries, and servers do not learn which queries are submitted. The research literature has attempted to address these issues by studying private database retrieval protocols in limited database and query models and with limited efficiency properties. In this paper we partially address some of these limitations, by using a practical database model, and by proposing protocols in both a client-server model and a 3-party model, where servers can outsource data to a third party (in encrypted form). In these models, practical and privacy-preserving database retrieval protocols for basic query types, such as keyword queries, have recently been shown to be possible.
In this paper, we attempt to show that practical and privacy-preserving database retrieval protocols are possible for a more complex query type: range queries.
Previous Work. The security and cryptography literature contains a significant amount of research in the private information retrieval (PIR) [6,18,20] and keyword search (KS) [3,5,10] areas. Both areas consider rather theoretical data models, as we now discuss. In PIR, a database is modeled as a string of n bits, and the query value is an index i ∈ {1, . . . , n}. In KS, early data models were also somewhat restrictive; for instance, [10] only admitted a single matching record per query. The inefficiency of the server runtime in PIR and KS protocols has been well documented (see, e.g., [24]). Some results attempted to use a third party and make the PIR query subprotocol more efficient but require a practically inefficient preprocessing phase [8]. Recently, however, some results on provably privacy-preserving and practical keyword queries in a practical database model and in an outsourced-data scenario were concurrently shown by [7,17], where significant efficiency is achieved by provably limiting the privacy loss to encrypted data "access-pattern" information, only leaked to the cloud server.
The literature also contains a significant amount of work on range queries or range computations on encrypted data. Some papers (starting with [4,22]) focus on encrypting messages, on which one can later perform range query computations. These approaches offer interesting provable security properties but make heavy use of asymmetric cryptography techniques and seem hard to translate into practical protocols for databases. Promising approaches to achieve at least some limited amount of privacy (with tradeoffs against efficiency) on range queries in an outsourced database setting have also been shown (see, e.g., [14] and follow-up work), typically based on variants of "bucketization" approaches. The primitive of order-preserving encryption gives rise to elegant and efficient range query protocols in the "database-as-a-service" model (see, e.g., [1] and follow-up work), but constructions of order-preserving encryption are still not very efficient and, in particular, come with static leakage on the encrypted data to the server holding it [2]. Overall, the question of designing provably privacy-preserving range queries in a practical data model, even in the outsourced-data scenario, seems to still deserve more attention from the security community.
Our Contribution. We study range queries in a more practical (outsourced or not) database model, capturing record payloads, possibly equal attribute values across different database records, and multiple answers to a given query. In this model, we define suitable correctness, privacy and efficiency requirements.
We then design two range query protocols in the 2-party model, which satisfy desired privacy properties (i.e., the server learns no information about the query range other than the number of matching records, and the client learns no information about the database other than matching database records) in our data model. Our first protocol works for linear-size value domains by only increasing database size by a small constant, and our second protocol works for exponential-size (thus, essentially arbitrary-size) value domains while increasing database size by a factor logarithmic in the value domain size. These protocols are constructed directly from any KS protocol and, like previous PIR and KS protocols, have server time complexity linear in the database size, a drawback dealt with in our next result.
Our third protocol transforms any of our 2-party range query protocols into a 3-party protocol, where the third party can be a cloud server, based on any 3-party KS protocol (like the one in [7], only based on any pseudo-random function, implemented as a block cipher). In this protocol, both server and third party run queries in logarithmic time and the following privacy properties provably hold: the server learns nothing about the query range, the client learns nothing about the database in addition to the matching database records, and the third party learns nothing about the query range or the database content, other than the repetition of queries from the client and repeated access to the encrypted data structures received from the server at initialization. This solves the problem of achieving provable privacy (against a semi-honest adversary) and efficient server runtime at the cost of a third-party server and some leakage to the third party characterized as 'access-pattern' to encrypted data. We stress that this protocol has efficient running time not only in an asymptotic sense, but in a sense that makes it ready for real-life applications (where such a form of leakage to the third party is tolerable). In our implementation of a computationally similar protocol, we reached our main performance goal of a response time less than 1 order of magnitude slower than commercial non-private protocols like MySQL. Our protocol solves a number of technical challenges using simple and practical techniques, including a reduction step via an intermediate rank database and a 'lazy' database value shifting approach. The privacy loss traded for such a practicality property was already studied in [15,16], who also proposed simple techniques to mitigate leakage to the cloud server in the form of 'access-pattern' to encrypted data, at least in the case of keyword queries. (Here, note that in the presence of such leakage, neither the client nor the server learns anything new, and the cloud server does not statically learn anything about the plain database content.) We believe that one appropriate mitigation technique needed for such solutions could be based on Oblivious RAM (an active area started in [12]), and it is plausible that dedicated Oblivious RAM techniques in the 3-party model may nullify or mitigate any such leakage based on 'access-pattern' over encrypted data. This is indeed a promising direction as, while years ago Oblivious RAM was considered inefficient, recent advances (see, e.g., [23]) have made it significantly less inefficient. In all our protocols, we only consider privacy against a semi-honest adversary corrupting at most one party (i.e., an adversary that follows the protocol and then attempts to violate the privacy of one of the parties).

Models and Requirements
Data and Query Models. We model a database as an $n$-row, 2-column matrix $D = (A_1, A_2)$, where each column is associated with an attribute, denoted as $A_j$, for $j = 1, 2$, and each entry is denoted as $A_j(i)$. The first column $A_1$ is a value attribute, where entries are values in a domain $Dom$ with a total order $\le$, and the second column $A_2$ is a payload attribute, where entries can be arbitrary binary strings. The database schema, assumed to be publicly known to all parties, includes the parameter $n$, the security parameter, and the description of the attribute value domain. A database row is also called a record, and all records are assumed to have the same length $r$ (if data is not already in this form, techniques from [9] are used to efficiently achieve this property), where $r$ is constant with respect to $n$.
A query $q$ is modeled to contain one or more query values from the relevant attribute domains. We mainly consider range queries, specified by two query values $v_0, v_1$. A valid response (to a range query) consists of all payloads $A_2(i)$, for $i \in [1, n]$, such that $A_1(i) \in [v_0, v_1]$, and we say that these payloads (or records) match the query. We also discuss KS queries, specified by a single query value $v$. A valid response (to a keyword query) consists of all payloads $A_2(i)$, for $i \in [1, n]$, such that $A_1(i) = v$.

Participant Models. We consider the following efficient (i.e., running in probabilistic polynomial time in a common security parameter $1^\sigma$) participants. The client is the party, denoted as C, that is interested in retrieving data from the database. The server is the party, denoted as S, holding the database (in the clear), and is interested in allowing clients to retrieve data. The third party, denoted as TP, helps the client to carry out the database retrieval functionality and the server to satisfy efficiency requirements during the associated protocol.

Range Query Protocols. In the above data, query, and participant models, we consider a (static-data) range query (briefly, RQ) protocol that extends the KS protocol, as defined in [10] (in turn, an evolution of the PIR protocol, as defined in [18]), in that it considers range queries instead of keyword queries, and it allows the attribute column to have multiple occurrences of the same value. (We can also extend the model so as to incorporate databases that contain multiple attributes.) Specifically, we define an RQ protocol as a pair (Init, Query) of subprotocols, as follows. The initialization subprotocol Init is used to set up data structures and cryptographic keys before C's queries are executed. The query subprotocol Query allows C to make a single query to retrieve (possibly multiple) matching database records. We also define an RQ protocol execution as a sequence of executions of subprotocols $(Init, Query_1, \ldots, Query_q)$, for some $q$ polynomial in the security parameter, where all subprotocols are run on inputs provided by the involved parties (i.e., a database from S and query values from C). We would like to build RQ protocols that satisfy the following (informal) list of requirements:

1. Correctness: the RQ protocol allows a client to obtain all payloads from the current database associated with records that match its issued query; more specifically, for any RQ protocol execution, and any inputs provided by the participants, in any execution of a Query subprotocol, the probability that C obtains all records in the current database that match C's query value input to this subprotocol is 1.

2. Privacy: informally speaking, the RQ protocol preserves privacy of database content and query values, ideally only revealing what is leaked by system parameters known to all parties and by the intended functionality output (i.e., all payloads in matching records to C); more specifically, we require the subprotocols in an RQ protocol execution to not leak information beyond the following:
• Init: all system parameters, including the database schema and a security parameter, will be known to all participants; in the 3-party model, an additional string $eds$ (for encrypted data structures) will be known to TP, will be encrypted under one or more keys unknown to TP, and its length is known from quantities in the database schema;
• Query, based on query range $qr = [v_0, v_1]$ and the database $D$: the matching payloads $A_2(i(1)), \ldots, A_2(i(m(qr)))$ will be obtained by C, as a consequence of the correctness requirement; in the 2-party model, the value $m(qr)$ will be known to S; in the 3-party model, the value $m(qr)$, all bits in $eds$ read by TP according to the instructions in the Query protocol, and which previous executions of Query used the same query value $v$, will be known to TP.

3. Efficiency: the protocol should have low time, communication and round complexity, as a function of system parameters, including the number $n$ of database records.
Given the characterization of intended leakage in the above privacy definition, a formal privacy definition can be derived using known definition techniques from simulation-based security and composable security frameworks often used in the cryptography literature. Similarly as noted for keyword queries in [7], we observe that the communication exchanged in each execution of any subprotocol Query has to leak an upper bound on the value $m(qr)$, i.e., the number of matching records, to S in the 2-party model, and to the coalition of TP and S in the 3-party model. Accordingly, we target the design of protocols that may leak $m(qr)$ to S in the 2-party model. In the 3-party model, different RQ protocols could leak $m(qr)$ only to S, or only to TP, or somehow split this leakage between S and TP. Having to choose between one of these options, we made the practical consideration that privacy against S (i.e., the data owner) is typically of greater interest than privacy against TP (i.e., the cloud server helping C retrieve data from S) in many applications, and therefore we focused in this paper on seeking protocols that leak $m(qr)$ to TP and nothing at all to S. Moreover, in the 3-party model, we made a definitional choice of leaking patterns of repeated access to encrypted data to TP; this is not due to a theoretical limitation, but seems a well-characterized privacy leakage, which, depending on the application at hand, either is a small price to pay towards achieving very efficient time-complexity requirements on S and TP, or can be reduced by using separate techniques.
With respect to efficiency, although we design protocols with low time, round and communication complexity, we focus our discussions on the communication complexity of the query subprotocols, and on the running time of S in the 2-party model and of S and TP in the 3-party model.
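As a point of reference for the data and query models above, the following minimal sketch computes the intended (non-private) output of a range query and of a keyword query on a two-column database, together with the number of matching records m(qr). It only illustrates the functionality that an RQ protocol must compute; it offers no privacy, and the function names (`range_query`, `keyword_query`) are hypothetical, chosen here for illustration.

```python
from typing import List, Tuple

# A record is a pair (value attribute A1(i), payload attribute A2(i)).
Record = Tuple[int, bytes]


def range_query(db: List[Record], v0: int, v1: int) -> List[bytes]:
    """Return all payloads A2(i) such that A1(i) lies in [v0, v1]."""
    return [payload for value, payload in db if v0 <= value <= v1]


def keyword_query(db: List[Record], v: int) -> List[bytes]:
    """Return all payloads A2(i) such that A1(i) equals v."""
    return [payload for value, payload in db if value == v]


# The value 5 occurs twice: the model allows repeated attribute values.
db = [(5, b"e"), (2, b"b"), (5, b"f"), (9, b"z")]
matches = range_query(db, 3, 9)
m_qr = len(matches)  # number of matching records, m(qr)
```

In the 2-party model, m(qr) is the only quantity about the query that S is allowed to learn, and the list of matching payloads is the only information about the database that C is allowed to learn.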

Background: Keyword Search Protocols.
A random function $R$ is a function that is chosen with distribution uniform across all possible functions with some pre-defined input and output domains. A keyed function $F(k, \cdot)$ is a pseudorandom function (PRF, first defined in [11]) if, after key $k$ is randomly chosen, no efficient algorithm allowed to query an oracle function $O$ can distinguish whether $O$ is $F(k, \cdot)$ or $O$ is a random function $R$ (over the same input and output domain), with probability greater than 1/2 plus a negligible quantity. A KS protocol is a protocol between two parties: A, having as input a keyword $v \in \{1, \ldots, n\}$, and B, having as input a 2-column database represented as $D = (A_1, A_2)$. The protocol consists of a private retrieval of the value(s) $A_2(i)$ such that $A_1(i) = v$, returned to A (thus, without revealing any information about $i$ to B or about $A_2(1), \ldots, A_2(i-1), A_2(i+1), \ldots, A_2(n)$ to A). Several KS protocols have been presented in the cryptographic literature, starting with [18], using number-theoretic hardness assumptions (see also [5,7,10]).

Range Queries in the Two-Party Model
We describe two RQ protocols for range queries in this model: the first protocol, presented in Sect. 3.1, works for ranges with elements in any linear-size domain; the second protocol, presented in Sect. 3.2, works for ranges with elements in any exponential-size (in practice, arbitrarily large) domain.

A Range Query Protocol for Linear-Size Domains
Our first 2-party RQ protocol considers range values in linear-size domains (that is, where the domain size is at most the number of database records). This protocol follows the general structure outlined in Fig. 1 and satisfies the properties stated in Theorem 1. We prove Theorem 1 by describing RQ protocol $\pi_1$ and its properties.

The RQ Protocol $\pi_1$: Basic Definitions. Let $Dom$ be a value domain with a total order $\le$ defined on it. We say that $Dom$ is a linear-size domain if it holds that $|Dom| \le n$. Given a list $U$ of (not necessarily distinct) values $u_1, \ldots, u_n \in Dom$, we say that a value $v \in Dom$ has lower $U$-rank $r$, also denoted as $Lrank(U, v) = r$, if there are $r$ values in $U$ strictly smaller than $v$. We say that a value $v \in Dom$ has upper $U$-rank $r$, also denoted as $Urank(U, v) = r$, if there are $n - r$ values in $U$ strictly larger than $v$. Let $sU = (u_{h(0)}, \ldots, u_{h(n-1)})$ denote the list obtained from $U$ by sorting its $n$ elements. These definitions directly imply Fact 1, which is used in the correctness proof of $\pi_1$.

The RQ Protocol $\pi_1$: An Informal Description. A first approach in our protocol goes as follows. At initialization, S splits database $D$ into two databases: a rank database $rD$ and a payload database $pD$. At query time, C asks S for the lower rank of $v_0$ and the upper rank of $v_1$, where $[v_0, v_1]$ denotes the range queried by C. Because in this protocol we consider only linear-size value domains, S can store at initialization the lower rank and the upper rank of each value in the domain in $rD$; thus, it suffices for C to perform keyword queries to $rD$ to retrieve the lower and upper rank values. Given these retrieved values, C can compute how many attribute values (if any) are in $[v_0, v_1]$ (i.e., the upper rank minus the lower rank), and then perform as many keyword queries to $pD$ to retrieve the records matching the queried range. As written so far, the protocol satisfies our desired correctness and efficiency properties, but not the privacy property, as C learns the two rank values associated with the queried range's endpoints. We fix this problem by requiring S to randomize the rank values by a random shift of the attribute values, a variation of an idea first used in [8] to improve the efficiency of keyword queries in a 3-party model. Thus, the ranks received by C will be randomly distributed, conditioned on the fact that the difference between them remains the same, and C is entitled to know this difference because of the correctness requirement.
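The data flow just described can be sketched in a few lines. This is a plaintext illustration only, under several assumptions: the keyword queries to the rank database and to the payload database are replaced by ordinary dictionary lookups (so nothing is hidden from S), the function names are hypothetical, and ranks are shifted modulo n + 1 rather than n, a choice made here so that a query matching all n records is not confused with one matching none; the protocol figure itself is not reproduced in this text.

```python
import bisect
import secrets


def init_rank_and_payload(db, domain):
    """S's initialization: build rank database rD and payload database pD,
    both shifted by one random value s."""
    n = len(db)
    mod = n + 1                      # assumption: shift modulo n + 1
    s = secrets.randbelow(mod)       # random shift, kept by S
    sorted_db = sorted(db, key=lambda rec: rec[0])
    values = [value for value, _ in sorted_db]
    rD = {}
    for v in domain:
        lrank = bisect.bisect_left(values, v)    # values strictly smaller than v
        urank = bisect.bisect_right(values, v)   # n minus values strictly larger
        rD[v] = ((lrank + s) % mod, (urank + s) % mod)
    # Payloads are keyed by their shifted position in sorted order.
    pD = {(j + s) % mod: payload for j, (_, payload) in enumerate(sorted_db)}
    return rD, pD, mod


def query_range(rD, pD, mod, v0, v1):
    """C's query: two rank lookups, then one payload lookup per match."""
    if v0 > v1:
        return []
    shifted_lower = rD[v0][0]        # stands in for a keyword query to rD
    shifted_upper = rD[v1][1]        # stands in for a keyword query to rD
    m = (shifted_upper - shifted_lower) % mod   # number of matches, m(qr)
    # Stands in for m keyword queries to pD, cycling past the modulus.
    return [pD[(shifted_lower + t) % mod] for t in range(m)]
```

Because both ranks are shifted by the same s, their difference is preserved, which is exactly the quantity C is entitled to learn.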
Properties of $\pi_1$. We now show that $\pi_1$ satisfies the correctness, privacy and efficiency properties defined in the 2-party model.

Correctness. First of all, note that by the test in step 1, we can assume that $v_0 \le v_1$, which implies that $Lrank(U, i(0)) \le Urank(U, i(1))$.
By the correctness property of the KS protocol $\pi_0$, at the end of step 5 of $Query_1$, C can compute the shifted lower rank $Lr(U, i(0))$ of $v_0$ and the shifted upper rank $Ur(U, i(1))$ of $v_1$. As both values are obtained as a shift, by the same random number $s$, of $Lrank(U, i(0))$ and $Urank(U, i(1))$, respectively, it holds that $Lr(U, i(0)) = Ur(U, i(1))$ if and only if $Lrank(U, i(0)) = Urank(U, i(1))$. Using item 1 of Fact 1, this implies that if $U \cap [v_0, v_1] = \emptyset$, it will hold that $Lrank(U, i(0)) = Urank(U, i(1))$ and thus $Lr(U, i(0)) = Ur(U, i(1))$, and then C will halt in step 6 of $Query_1$, without receiving any payload from S. On the other hand, if $U \cap [v_0, v_1] \ne \emptyset$, at the end of step 9 of $Query_1$, by the correctness property of the KS protocol $\pi_0$, C computes the payload $pA_2(i(j))$ such that $pA_1(i(j)) = j$, for all $j = Lr(U, i(0)), \ldots, Ur(U, i(1)) - 1$, possibly cycling from $n - 1$ to 0. Using item 2 of Fact 1, this implies that C receives all payloads corresponding to values in $U \cap [v_0, v_1]$.

Privacy. We show that $\pi_1$ satisfies our privacy requirement when the adversary corrupts any one among S or C.
When the adversary corrupts S, privacy (i.e., corrupting S does not provide the adversary any new information about C's range query $[v_0, v_1]$ other than system parameters and the number of matching payloads) can be proved by using the analogous privacy property of the KS protocol $\pi_0$. First of all, we observe that $Query_1$ in protocol $\pi_1$ consists of 1 execution of $Query_0$ followed by either no further execution of $Query_0$ (resulting in no payload received by C) or by $m(qr) = Urank(U, v_1) - Lrank(U, v_0)$ additional executions of $Query_0$ (resulting in $m(qr) > 0$ payloads received by C). Thus, given the number $m(qr) \ge 0$ of payloads received by C, an efficient simulator for the view obtained by S is obtained by suitably calling the efficient simulator for the view of S in the KS protocol $\pi_0$.
When the adversary corrupts C, privacy (i.e., corrupting C does not provide the adversary with any information about S's database $D$ other than system parameters and what is intended by the correctness requirement) can be proved by using the analogous privacy property of protocol $\pi_0$. Here, the proof is similar to the previous case: given the number $m(qr) \ge 0$ of payloads received by C, a simulator for C's view is obtained by suitably calling the simulator for C's view in the KS protocol $\pi_0$.
Efficiency. As $Query_1$ essentially consists of running $Query_0$ $m(qr) + 1$ times, the communication complexity of $Query_1$ is $O(m(qr) + 1)$ times the communication complexity of $Query_0$. Thus, the communication complexity is desirably linear in the number of matching records (and can be sub-linear in the number $n$ of total database records). Analogously, the S-time complexity of $Query_1$ is $O(m(qr) + 1)$ times the S-time complexity of $Query_0$, plus $O(n)$. Here, note that the S-time complexity of $Query_1$ is linear in $n$ even for small values of $m(qr)$, because the S-time complexity of $Query_0$ is itself linear in $n$. This inefficiency is a major and known drawback of all 2-party model solutions for protocols like PIR and KS, and therefore of protocols $\pi_0$ and $\pi_1$. Indeed, this motivated our study of RQ protocols in the 3-party model in Sect. 4.

A Range Query Protocol for Exponential-Size Domains
Our second 2-party RQ protocol considers range values in exponential-size (which means, practically speaking, arbitrarily large) domains. This protocol follows the general structure outlined in Fig. 1 and satisfies the properties stated in Theorem 2. We prove Theorem 2 by describing RQ protocol $\pi_2$ and its properties.

The RQ Protocol $\pi_2$: Basic Definitions. Let $Dom$ be a value domain with a total order $\le$ defined on it. We say that $Dom$ is an exponential-size domain if it holds that $|Dom| \le 2^d \le 2^{p(n)}$, for some polynomial $p$. For simplicity, we restrict to the case where $Dom$ is the $d$-dimensional hypercube, i.e., $Dom = [0, 2^d - 1]$, but note that our results can be extended to any exponential-size domain. We define the set $cI(Dom)$ of canonical intervals for $Dom$ by the following recursion: first, add $Dom$ into $cI(Dom)$; then, split $Dom$ into $Dom_0$, containing the first half of its elements, and $Dom_1$, containing the second half; then, for $i = 0, 1$, generate $cI(Dom_i)$, the set of canonical intervals for $Dom_i$; finally, add $cI(Dom_0)$ and $cI(Dom_1)$ to $cI(Dom)$.
An interval $[a, b] \subseteq Dom$ is a border interval in $Dom$ if there exists an interval $I \in cI(Dom)$ such that either $a$ is the first element in $I$ or $b$ is the last element in $I$. Fact 3, used below, follows directly from the above definitions of border and canonical intervals. We note that results similar to Fact 3 have already been studied in other papers (see, e.g., [19]), but we could not find range query protocols based on them with provable privacy properties.
The RQ Protocol $\pi_2$. We would like to construct $\pi_2 = (Init_2, Query_2)$ as an extension of $\pi_1 = (Init_1, Query_1)$, based on the above notions of canonical intervals and interval covering.
At initialization, S again splits database $D$ into two databases: a rank database $rD$ and a payload database $pD$. This time, however, since we consider exponential-size value domains (as opposed to the linear-size value domains used for $\pi_1$), S cannot store at initialization the lower rank and the upper rank of each domain value in $rD$. Then, instead of storing all domain elements in $rD$, we store all attribute values $u_1, \ldots, u_n$ in $D$ and, for each interval $[u_{i-1} + 1, u_i - 1]$, we consider the set of canonical intervals covering it, as guaranteed by Fact 3, and store each one of these intervals in $rD$. Note that in each of these latter intervals, each domain value has the same lower and upper ranks, so we only need to store a single copy of these two values in $rD$ as well. Thus, in $rD = (rD_1, rD_2)$, the column $rD_1$ contains the attribute values and these canonical intervals. After this modification, the remaining computations in $Init_2$, including those of the lower/upper ranks, continue as in $\pi_1$. Because of Fact 3, this modified initialization increases the size of $rD$ by a multiplicative factor of at most $2(d - 1)$.
At query time, denoting as $[v_0, v_1]$ the range queried by C, the computation of the shifted lower/upper ranks continues as in $Query_1$. However, C not only asks S for the lower rank of $v_0$ and the upper rank of $v_1$ by 2 KS queries as in $\pi_1$, but also makes KS queries on input all canonical intervals that contain $v_0$ and all canonical intervals that contain $v_1$. Here, note that if $v_0$ (resp., $v_1$) is different from all attribute values, then exactly one of the canonical intervals containing $v_0$ (resp., $v_1$) was included in $rD$ during initialization. Thus, only one of the KS queries associated with $v_0$ and only one of the KS queries associated with $v_1$ will be successfully completed, returning to C ranks for either an attribute value $u_i$ or a canonical interval containing the query range value. From now on, protocol $Query_2$ continues exactly as $Query_1$. That is, C can use the obtained ranks to generate the $m(qr)$ keyword queries to database $pD$, and obtain $m(qr)$ matching records.
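The two interval computations used by $\pi_2$ can be illustrated with a short sketch, assuming $Dom = [0, 2^d - 1]$ and the usual recursive halving (the same decomposition used by segment trees). The function names are hypothetical. `canonical_cover` returns disjoint canonical intervals whose union is a given gap between consecutive attribute values, as S would store in rD; `canonical_containing` returns the d + 1 canonical intervals containing a query endpoint, as C would use for its KS queries.

```python
def canonical_cover(a, b, lo, hi):
    """Disjoint canonical intervals of [lo, hi] whose union is [a, b].

    Assumes lo <= a, b <= hi, and that hi - lo + 1 is a power of two.
    """
    if a > b:
        return []
    if a == lo and b == hi:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    left = canonical_cover(a, min(b, mid), lo, mid)
    right = canonical_cover(max(a, mid + 1), b, mid + 1, hi)
    return left + right


def canonical_containing(v, d):
    """All d + 1 canonical intervals of [0, 2**d - 1] that contain v."""
    lo, hi = 0, 2 ** d - 1
    intervals = [(lo, hi)]
    while lo < hi:
        mid = (lo + hi) // 2
        if v <= mid:
            hi = mid
        else:
            lo = mid + 1
        intervals.append((lo, hi))
    return intervals


# Example with d = 4: attribute values 3 and 12 leave the gap [4, 11].
d = 4
gap_cover = canonical_cover(4, 11, 0, 2 ** d - 1)
# A query endpoint inside the gap hits exactly one stored interval.
hits = [iv for iv in canonical_containing(9, d) if iv in gap_cover]
```

In this example `gap_cover` is `[(4, 7), (8, 11)]` and `hits` is `[(8, 11)]`, matching the observation that exactly one KS query per endpoint completes successfully when the endpoint is not an attribute value.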
Properties of $\pi_2$. The proofs that $\pi_2$ satisfies the correctness, privacy and efficiency properties defined in the 2-party model are obtained by extending the analogous proofs for $\pi_1$, using the properties of the KS protocol $\pi_0$. In particular, the correctness property of $\pi_2$ is shown by additionally using Facts 2 and 3. The privacy and the communication complexity properties are not significantly affected by the modifications in $\pi_2$ with respect to $\pi_1$. The S-time complexity changes because $rD$ is larger in $\pi_2$ by a multiplicative factor of $O(d)$.

Range Queries in the Three-Party Model
We show a 3-party RQ protocol by extending the 2-party protocol in Sect. 3.1. Our protocol follows the general structure outlined in Fig. 2 and satisfies the following theorem.

Theorem 3. Assuming the existence of a pseudo-random function, there exists (constructively) a 3-party privacy-preserving RQ protocol $\pi_3 = (Init_3, Query_3)$ for databases with linear-size value domains, satisfying:
1. correctness;
2. privacy against C (i.e., it only leaks the matching records to C);
3. privacy against S (i.e., it does not leak anything to S);
4. privacy against TP (i.e., it only leaks the number of matching records, the repetition of query values and the repeated access to initialization encrypted data structures);
5. the communication complexity of $Query_3$ on a queried range $qr$ is $O(m(qr))$;
6. the TP-time complexity in $Query_3$ on a queried range $qr$ is $O(m(qr) \log n)$.

Remark: on Exponential-Size Domains. We stated Theorem 3 for linear-size value domains, and established it by transforming the 2-party protocol $\pi_1$ into the 3-party model. By a very similar transformation, we can adapt the 2-party protocol $\pi_2$ into the 3-party model, and obtain a similar result for exponential-size value domains.

Our RQ Protocol in the 3-Party Model: An Informal Description.
Briefly speaking, our protocol $\pi_3$ is obtained by performing the following two main modifications in the 3-party model to protocol $\pi_1$ (which was designed in the 2-party model): (1) the KS protocol in the 2-party model is replaced by a KS protocol in the 3-party model [7], which was constructed starting from any pseudorandom function; and (2) the shifts performed on the entire databases $rD_2$ and $pD_1$ in protocol $\pi_1$ are now replaced by a 'lazy shifting' technique, according to which shifts are performed only on database entries that are used in the protocol. We note that the first modification replaces the use of asymmetric cryptography protocols with only symmetric cryptography techniques, and the second modification eliminates linear-time computations from S during the query subprotocol.
In fact, we can use the following simplified version of the KS protocol in the 3-party model from [7], by assuming that each keyword query will have at most 1 matching record (which was shown to be the case in $\pi_1$). First of all, S encrypts both the attribute column and the payload column in its database, where the attribute column is encrypted using deterministic encryption, via a pseudo-random permutation (which can be built from any pseudo-random function). As a result, the encrypted attribute column is searchable by TP using a conventional search data structure (e.g., a binary search tree). Later, S sends the encrypted database to TP, and C sends its query values encrypted using the same pseudo-random permutation used by S (with key unknown to TP). Finally, TP can search for such a value in the search data structure over the encrypted attribute values and return the matching record to C.
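A minimal sketch of this simplified 3-party KS flow follows, under several assumptions that are not taken from [7]: HMAC-SHA256 stands in for the pseudo-random permutation (only its determinism and unpredictability without the key are used), a Python dictionary stands in for the binary search tree held by TP, payloads are limited to 32 bytes and encrypted by XOR with a PRF output on a fresh random nonce, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed function standing in for the PRF/PRP (assumption: HMAC-SHA256)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def encrypt_payload(key: bytes, payload: bytes) -> bytes:
    """Probabilistic encryption: fresh nonce, then XOR with prf(key, nonce)."""
    assert len(payload) <= 32, "sketch handles short payloads only"
    nonce = secrets.token_bytes(16)
    pad = prf(key, nonce)
    return nonce + bytes(p ^ q for p, q in zip(payload, pad))


def decrypt_payload(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    pad = prf(key, nonce)
    return bytes(p ^ q for p, q in zip(body, pad))


def server_init(db, k_attr: bytes, k_pay: bytes) -> dict:
    """S: deterministic tag for each attribute value, encrypted payload.
    Assumes at most one record per attribute value."""
    return {prf(k_attr, value.to_bytes(8, "big")): encrypt_payload(k_pay, payload)
            for value, payload in db}


def client_token(k_attr: bytes, v: int) -> bytes:
    """C: encrypt the query value with the same keyed function as S."""
    return prf(k_attr, v.to_bytes(8, "big"))


def third_party_lookup(eds: dict, token: bytes):
    """TP: search the encrypted attribute column; TP holds no key."""
    return eds.get(token)
```

A query for value v then consists of C sending `client_token(k_attr, v)` to TP and decrypting the returned ciphertext with `decrypt_payload`. TP sees only tags and ciphertexts, but it can tell when the same token is sent twice, which is the 'access-pattern' leakage discussed earlier.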
Given the above 3-party KS protocol, our 3-party RQ protocol π 3 works as follows. The following high-level structure of π 1 remains in π 3 : specifically, S constructs a rank database rD and a payload database pD, and C will perform keyword queries first based on rD and later based on pD. In π 3 , however, S sends encrypted versions of rD and pD to T P , and from then on, C only performs keyword queries to T P . Specifically, while the payload columns of rD and pD are encrypted using conventional probabilistic encryption, the attribute columns of rD and pD are encrypted using deterministic encryption, based on a pseudorandom permutation, which makes attribute column values searchable by T P . To encrypt an attribute value v ∈ Dom, S randomly chooses v 0 and an initial shift s such that v 0 + is = vmodn, and returns ciphertext (f k (v 0 ), is), where f is the pseudo-random permutation, and k is a key known to C and S but not to T P . An interesting property of such ciphertexts is that T P can compute a 'lazy shift' of v over its encryption and by any random next shift ns, by returning (f k (v 0 ), cs), where the current shift cs is = is+nsmodn. Such ciphertexts will be used by S to encrypt lower and upper ranks in rD before sending them to T P . Then, after C's keyword query to (the encrypted version of) database rD held by T P , such encrypted ranks will not be directly returned to C (or otherwise this may leak some information to C across multiple queries). Instead, T P and C will run a 2-party secure function evaluation (using Yao's protocol [25]), where C provides key k as input, T P provides the encrypted ranks and the current shift cs as input and the output returned to T P will be the encrypted queries for (the encrypted version of) database pD. 
Then, by using the current shift as input to the secure function evaluation protocol, T P obtains encrypted keyword queries, each of them being used to search across the first ciphertext component of all encrypted attribute values in pD, exactly as done in the above 3-party KS protocol.
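The 'lazy shift' property of the ciphertexts (f_k(v_0), is) can be illustrated with a minimal sketch. The assumptions here are ours, not the paper's: the domain is small enough that the keyed permutation f_k can be tabulated (built by ranking domain elements by their HMAC-SHA256 tags), and the function names are hypothetical. The sketch shows only the shift arithmetic; the secure function evaluation step is not modeled.

```python
import hashlib
import hmac
import secrets


def build_prp(key, n):
    """Toy keyed permutation on {0, ..., n-1}: order elements by their HMAC tag.

    Returns (forward, inverse) lookup tables standing in for f_k and its inverse.
    Only suitable for small n; used here purely for illustration.
    """
    order = sorted(
        range(n),
        key=lambda x: hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest(),
    )
    forward = {x: i for i, x in enumerate(order)}
    inverse = dict(enumerate(order))
    return forward, inverse


def encrypt_value(forward, v, n):
    """S: pick a random initial shift, set v_0 = v - shift mod n, output (f_k(v_0), shift)."""
    initial_shift = secrets.randbelow(n)
    v0 = (v - initial_shift) % n
    return forward[v0], initial_shift


def lazy_shift(ciphertext, next_shift, n):
    """T P: shift the underlying value without the key; only the shift component changes."""
    c, shift = ciphertext
    return c, (shift + next_shift) % n


def decrypt_value(inverse, ciphertext, n):
    """Key holder: recover v_0 + current shift mod n."""
    c, shift = ciphertext
    return (inverse[c] + shift) % n
```

With these definitions, decrypting `lazy_shift(ct, ns, n)` yields (v + ns) mod n, while the first ciphertext component, the only part that requires the key k, is left untouched.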
Practical Performance of Our 3-Party Protocol. In our implementation, the S and T P processes and an instance of MySQL server version 5.5.28 were running on a Dell PowerEdge R710 server with two Intel Xeon X5650 2.66 GHz processors, 48 GB of memory, 64-bit Ubuntu 12.04.1 operating system, and connected to a Dell PowerVault MD1200 disk array with twelve 2 TB 7.2K RPM SAS drives in RAID6 configuration. The C process was running on a Dell PowerEdge R810 server with two Intel Xeon E7-4870 2.40 GHz processors, 64 GB of memory, 64-bit Red Hat Enterprise Linux Server release 6.3 operating system, and connected to the Dell PowerEdge R710 server via switched Gigabit Ethernet.
The 3-party protocol that we implemented was somewhat different from the ones discussed in this paper, because it was developed under more complex and specific project requirements. However, our protocol analysis indicates that these differences are not expected to significantly affect the practical performance of the protocols. Accordingly, we briefly report on the performance of our implemented protocols, as a useful indication of the performance of the protocols described here.
In our implementation, we observed practical efficiency and scalability of our 3-party protocols, and were able to achieve query latency no more than 1 order of magnitude slower than a comparable non-private protocol for the same task (specifically, a MySQL protocol for range queries over same-size value domains and database size). This result was achieved, with minor differences, over both linear-size and exponential-size value domains. A similar performance result was presented in [7] in the same implementation environment for keyword queries. In achieving such a result for range queries, our approach of constructing range query protocols from keyword query protocols was critical. This is especially the case when considering that the dominating performance factor in all our range query protocols is given by the performance of one keyword query for each of the records matching the range query. Performance numbers (where time is measured in milliseconds and communication in bytes) for range queries matching 1% of the database records are captured in Figs. 3 and 4.
The most challenging aspect of our performance analysis was the scalability of the initialization procedure, where we observed the following results: the initialization of the 3-party protocol for linear-size value domains, based on a transformation of the 2-party protocol π 1 , does achieve satisfactory scalability properties; however, the initialization phase of the 3-party protocol for exponential-size value domains, based on a transformation of the 2-party protocol π 2 , does not achieve satisfactory scalability properties, especially as the logarithm of the domain size grows. Although the initialization procedure is typically a one-time procedure, we still consider the following an interesting open problem: designing a 3-party privacy-preserving range query protocol that achieves scalable performance on both query latency and initialization.