Breaking and Fixing the Security Proof of Garbled Bloom Filters

We identify a flaw in the proof of security of Garbled Bloom Filters (GBFs), a recent hash structure introduced by Dong et al. (ACM CCS 2013) that is used to design Private Set Intersection (PSI) protocols, an important family of protocols for secure cloud computing. We give counter-examples invalidating a claim that is central to the original proof, and we show that variants of the GBF construction have the same issue in their security analysis. We then give a new proof of security showing that Garbled Bloom Filters are secure nonetheless.


Introduction
Private Set Intersection (PSI) protocols form one of the most important families of protocols for secure computation, and they play a central role in cloud computing (see Section 1 of [4]). Garbled Bloom Filters (GBFs) are a recent hash structure introduced by Dong et al. in [4] (ACM CCS 2013) that are useful in the design of PSI protocols. The idea of GBFs is to combine a Bloom Filter (BF) with XOR-based secret sharing to enable efficient membership testing with respect to a set, while hiding the presence of the elements of this set that were not searched for.
The proof of Dong et al. can be summarized as follows. In a first part, they use a property of Bloom Filters to show that some event happens with negligible probability; in a second part, they assume that this event does not occur and invoke the security of XOR-based secret sharing to conclude the proof. This invocation of the security of XOR-based secret sharing is done in a very direct way, neglecting the fact that a GBF, although heavily inspired by the XOR-based secret sharing scheme, is not strictly speaking an instance of this scheme. The same remarks hold for a PSI protocol suggested by Pinkas et al. in [11], based on the original GBF construction of Dong et al., whose proof of security is very short and follows a reasoning similar to that of Dong et al. In this paper we show that a simple invocation of the security of XOR-based secret sharing is in fact not sufficient to show that GBFs are secure. We do so by providing a counter-example, and further a larger class of counter-examples, that invalidate the claims made in both of these proofs.
We show however that GBFs do satisfy their claimed security properties by providing a new, more rigorous proof.

Organization of the Paper
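In Section 2 we describe Bloom Filters, Garbled Bloom Filters as introduced by Dong et al. in [4], and the original proof of Dong et al. for the security of Garbled Bloom Filters. In Section 3 we give a counter-example (and a class of counter-examples) that invalidates the proof of Dong et al. In Section 4 we describe the impact of our results on other GBF constructions that were inspired by the one of Dong et al. In Section 5 we give a new proof of security for the GBF construction of Dong et al. Finally, in Section 6 we compare this work with related work.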

Notations
We make use of the following usual conventions. With $X$ a set, we denote by $x \xleftarrow{\$} X$ the fact that $x$ is sampled uniformly from $X$. With $i$ a positive integer, we denote by $[i]$ the sequence $(1, 2, \ldots, i)$. A function $\mu(\cdot)$ is negligible if for every positive polynomial $p(\cdot)$ and all sufficiently large $n$, it holds that $\mu(n) \le 1/p(n)$. Throughout the paper, $\lambda$ denotes the security parameter, and two probability ensembles $X = \{X_\lambda\}_{\lambda \in \mathbb{N}}$ and $Y = \{Y_\lambda\}_{\lambda \in \mathbb{N}}$ are said to be computationally indistinguishable [6, Definition 7.30], denoted $X \stackrel{c}{\equiv} Y$, if for every probabilistic polynomial-time distinguisher $D$ there exists a negligible function $\mu$ such that:
$$\left| \Pr_{x \leftarrow X_\lambda}[D(1^\lambda, x) = 1] - \Pr_{y \leftarrow Y_\lambda}[D(1^\lambda, y) = 1] \right| \le \mu(\lambda)$$

Bloom Filters
Bloom Filters (BFs), introduced by Bloom in [1] and further studied by Broder and Mitzenmacher in [2], are a hash structure that aims at efficiently testing membership in a set. A BF is an array $B$ of $M$ bits associated with $k$ random hash functions $h_1, \ldots, h_k : \{0,1\}^* \to [M]$. $B$ is initialized by setting all the array values to zero, and one inserts an element $x \in S$ in $B$ by setting $B[h_i(x)]$ to 1 for all $i \in [k]$. Finally, one checks the presence of $x$ in the set $S$ encoded by $B$ by testing whether $B[h_i(x)]$ is equal to 1 for all $i \in [k]$; if it is not the case then $x$ cannot be in $S$, otherwise $x$ is in $S$ with high probability. Following [12], we use the notation $h^*$ to denote the set of all positions corresponding to an element $x$ or a set $S$:
$$h^*(x) = \{h_i(x) : i \in [k]\}, \qquad h^*(S) = \bigcup_{x \in S} h^*(x).$$
We will denote by $BF_S$ the Bloom Filter encoding set $S$ when there is no ambiguity about which parameters $M$ and $(h_i)_{i \in [k]}$ were used.
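A minimal Python sketch of a Bloom Filter and of the position set $h^*$ may help fix ideas. It assumes, hypothetically, that each $h_i$ is simulated by SHA-256 applied to the string `"i|x"` and reduced modulo $M$, with positions counted from 0 instead of 1. It only illustrates insertion and lookup; it is neither efficient nor meant as a reference implementation.

```python
import hashlib


def positions(x, k, M):
    """h*(x): the set {h_1(x), ..., h_k(x)}, with 0-based positions in range(M).

    Hypothetical stand-in for the random hash functions: h_i(x) is
    SHA-256("i|x") read as an integer and reduced modulo M.
    """
    return {
        int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % M
        for i in range(1, k + 1)
    }


def positions_of_set(S, k, M):
    """h*(S): the union of h*(x) over all x in S."""
    out = set()
    for x in S:
        out |= positions(x, k, M)
    return out


class BloomFilter:
    def __init__(self, M, k):
        self.M, self.k = M, k
        self.bits = [0] * M  # all array values start at zero

    def insert(self, x):
        # set B[h_i(x)] = 1 for every i
        for j in positions(x, self.k, self.M):
            self.bits[j] = 1

    def contains(self, x):
        # False: x is certainly not in S; True: x is in S with high probability
        return all(self.bits[j] == 1 for j in positions(x, self.k, self.M))


bf = BloomFilter(M=1024, k=8)
S = ["alice", "bob", "carol"]
for s in S:
    bf.insert(s)
print(bf.contains("alice"))  # True: inserted elements are always found
```

The set of bits equal to 1 after insertion is exactly $h^*(S)$, which is the object the later definitions reason about.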
The event that $x$ appears to be encoded in $B$ while it is actually not in $S$ is called a false positive. Dong et al. [4] show that the probability for $x \notin S$ to cause a false positive is negligible in the number of hash functions $k$. As a consequence, setting the number of hash functions to be greater than or equal to the security parameter (which is what Dong et al. do) results in a false positive probability negligible in the security parameter. Broder and Mitzenmacher [2] show that the optimal value for $k$, which minimizes the false positive probability for a given $M$ and set size $N$, is:
$$k = \frac{M}{N} \ln 2.$$
They also show that with this value of $k$, about half of the bits are set after the insertion of all the elements of $S$.
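The two facts above can be checked numerically. This sketch assumes the usual approximation $(1 - e^{-kN/M})^k$ of the false positive probability (an approximation, not the exact expression) and uses the parameter choice $k = \lambda$, $M = Nk/\ln 2$ of Dong et al., for which $kN/M = \ln 2$: about half of the bits are set and the false positive rate is about $2^{-k}$. The values of $\lambda$ and $N$ are purely illustrative.

```python
import math


def optimal_k(M, N):
    """Number of hash functions minimizing the false positive rate: k = (M / N) ln 2."""
    return (M / N) * math.log(2)


def fraction_set(M, N, k):
    """Approximate expected fraction of bits set after inserting N elements."""
    return 1 - math.exp(-k * N / M)


def false_positive_rate(M, N, k):
    """Usual approximation of the false positive probability of a BF."""
    return fraction_set(M, N, k) ** k


lam, N = 80, 1000        # illustrative security parameter and set size
k = lam                  # Dong et al. take k equal to the security parameter
M = N * k / math.log(2)  # M = N k / ln 2 (left unrounded in this sketch)

print(optimal_k(M, N))              # k is optimal for this M, up to rounding error
print(fraction_set(M, N, k))        # about one half of the bits are set
print(false_positive_rate(M, N, k)) # about 2**-80, negligible in lambda
```

In practice $M$ must be an integer, so a real instantiation would round it up, which only lowers the false positive rate.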

Garbled Bloom Filters
Garbled Bloom Filters (GBFs) were introduced by Dong et al. in [4] (ACM CCS 2013). A GBF is a variant of a BF with security properties that make it suitable for the design of Private Set Intersection (PSI) protocols (see [11] for a description of PSI and a review of the most recent schemes, including the one of [4]).
Like a BF, a GBF is an array of length $M$ associated with $k$ random hash functions $h_1, \ldots, h_k : \{0,1\}^* \to [M]$. However, the components of a GBF are not bits but bit strings of length $\lambda$. One inserts $x$ in a GBF $B$ by ensuring that $\bigoplus_{i=1}^{k} B[h_i(x)] = x$, and checks the presence of $x$ in $B$ by testing the same equality.
During insertion, each share is picked uniformly at random as in an XOR secret sharing scheme, except for the shares that were already set by the insertion of a previous element, which are left unchanged, and for one free share, which is set so that the XOR of all the shares of the element equals the element. Components that were never written during the insertion of the whole set are filled with random values. Algorithm 1 gives a more formal description of how a GBF is built. As with normal Bloom Filters, we will denote by $GBF_S$ a GBF encoding set $S$ when there is no ambiguity as to the parameters used.
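The build procedure described above can be sketched in Python. This is not a transcription of Algorithm 1, only an illustration under stated assumptions: elements are $\lambda$-bit integers, positions come from a hypothetical SHA-256 stand-in for the random oracles, the XOR is taken over the set $h^*(x)$ (so a position repeated for one element counts once), and the free share that receives the final value is the last empty position of $x$ in sorted order, which is an arbitrary choice.

```python
import hashlib
import secrets


def positions(x, k, M):
    """h*(x) as a set of 0-based positions (hypothetical SHA-256 stand-in for h_1..h_k)."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % M
        for i in range(1, k + 1)
    }


def gbf_build(S, M, k, lam):
    """Build a GBF for S; every element must be a lam-bit integer."""
    gbf = [None] * M  # None marks an empty component
    for x in S:
        pos = sorted(positions(x, k, M))
        empty = [j for j in pos if gbf[j] is None]
        if not empty:
            # every position of x is already set: building halts (fails)
            raise RuntimeError("GBF.Build failed for an element")
        last = empty[-1]  # reserved slot that will receive the final share
        share = x
        for j in pos:
            if j == last:
                continue
            if gbf[j] is None:
                gbf[j] = secrets.randbits(lam)  # fresh uniformly random share
            share ^= gbf[j]  # shares set by earlier elements are reused unchanged
        gbf[last] = share  # now the XOR of all components of x equals x
    # components never written during insertion are filled with random values
    return [secrets.randbits(lam) if v is None else v for v in gbf]


def gbf_query(gbf, x, k, M):
    """Membership test: XOR the components at h*(x) and compare with x."""
    acc = 0
    for j in positions(x, k, M):
        acc ^= gbf[j]
    return acc == x


lam, k, M = 64, 8, 1024
S = [secrets.randbits(lam) for _ in range(20)]
G = gbf_build(S, M, k, lam)
print(all(gbf_query(G, x, k, M) for x in S))  # True
```

Note that shares reused from earlier elements are exactly what ties components of different elements together, which is the source of the difficulty discussed in Section 3.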
The security property of GBFs, which will be given formally in Section 2.5, can be informally described as follows:
Definition 1 (Security of GBF, informal). Let $S$ and $C$ be two sets. Given only $\{GBF_S[i] : i \in h^*(C)\}$, one cannot get any information about $S - C$.

Private Set Intersection Based on GBF
We give a quick overview of how GBFs are used in the design of Private Set Intersection (PSI) protocols with one-sided output. Informally, a PSI protocol is a protocol between two parties, each holding a set, who want to compute the intersection of their respective sets without revealing any information beyond this intersection. In the one-sided output setting, only one party, called the receiver, learns the intersection, while the other party, called the sender, learns nothing.

Algorithm 1: An algorithm for building a GBF representing set $S$ with parameters $M$, $(h_i)_{i=1\ldots k}$, $\lambda$
Algorithm: GBF.Build
[...]
Fill any remaining empty component with fresh random values;
In the PSI protocol of Dong et al. [4], the sender holds a set $S$ and computes $GBF_S$, while the receiver holds a set $C$ and computes $BF_C$. Both parties use the same (G)BF parameters.
The two parties then run an Oblivious Transfer (OT) protocol. OT allows one party (called the receiver, who is also the receiver of the PSI protocol) to retrieve a record from a database held by another party (called the sender, who is also the sender of the PSI protocol), without revealing to the sender which record was retrieved and without revealing to the receiver the other records in the database.
In the PSI protocol of [4], the receiver uses the OT protocol to retrieve the components of $GBF_S$ at the positions that are set to 1 in $BF_C$. For any element of $S \cap C$, all its corresponding components were retrieved, so the receiver is able to assert its presence in $S \cap C$. At the same time, the security property of GBFs guarantees that the receiver gets no information about any element of $S - C$. As for the sender, the privacy properties of the OT protocol suffice to prevent him from learning anything about the receiver's set $C$.
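The data flow of this PSI protocol can be simulated in a few lines. This is a sketch under strong assumptions: the OT phase is replaced by direct indexing, which is not oblivious and provides no security at all; a hypothetical SHA-256 function stands in for the random oracles; and elements are $\lambda$-bit integers. It only shows which components the receiver ends up holding and how she decides membership from them.

```python
import hashlib
import secrets


def positions(x, k, M):
    """h*(x): hypothetical SHA-256 stand-in for h_1..h_k, 0-based positions."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % M
        for i in range(1, k + 1)
    }


def gbf_build(S, M, k, lam):
    """Sender side: GBF_S, built as described in the Garbled Bloom Filters section."""
    gbf = [None] * M
    for x in S:
        pos = sorted(positions(x, k, M))
        empty = [j for j in pos if gbf[j] is None]
        if not empty:
            raise RuntimeError("GBF.Build failed")
        last, share = empty[-1], x
        for j in pos:
            if j != last:
                if gbf[j] is None:
                    gbf[j] = secrets.randbits(lam)
                share ^= gbf[j]
        gbf[last] = share
    return [secrets.randbits(lam) if v is None else v for v in gbf]


def receive_components(gbf_S, C, k, M):
    """Stand-in for the OT phase: the receiver gets GBF_S[j] exactly for j in h*(C).

    NOT an oblivious transfer: it only models which components she ends up with.
    """
    wanted = set()
    for y in C:
        wanted |= positions(y, k, M)  # the positions set to 1 in BF_C
    return {j: gbf_S[j] for j in wanted}


def receiver_output(received, C, k, M):
    """Keep the elements y of C whose retrieved components XOR to y."""
    out = set()
    for y in C:
        acc = 0
        for j in positions(y, k, M):
            acc ^= received[j]
        if acc == y:
            out.add(y)
    return out


lam, k, M = 64, 8, 2048
S = [secrets.randbits(lam) for _ in range(30)]           # sender's set
C = S[:10] + [secrets.randbits(lam) for _ in range(20)]  # receiver's set
received = receive_components(gbf_build(S, M, k, lam), C, k, M)
result = receiver_output(received, C, k, M)
print(result == set(S) & set(C))  # equal to S ∩ C except with negligible probability
```

The dictionary `received` is precisely the view $\{GBF_S[i] : i \in h^*(C)\}$ that Definition 1 refers to.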

Original Proof of Security by Dong et al. [4]
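The security of GBFs is expressed by Theorem 4 in [4], which we reformulate in an equivalent way as Theorem 1 of this paper. This theorem requires the definition of the intersection between a GBF and a BF sharing the same parameters (see Section 4.2 of [4]).

Definition 2 (Intersection between a GBF and a BF). Let $M$, $(h_i)_{i=1\ldots k}$ and $\lambda$ be some GBF parameters. Let $S$ and $C$ be two sets, and let $GBF_S$ and $BF_C$ be built with parameters $M$, $(h_i)_{i=1\ldots k}$ (and $\lambda$ for the GBF). The intersection of $GBF_S$ and $BF_C$, noted $GBF_S \cap BF_C$, is defined as:
$$(GBF_S \cap BF_C)[i] = \begin{cases} GBF_S[i] & \text{if } BF_C[i] = 1 \\ \text{a random value} & \text{otherwise} \end{cases}$$

Dong et al. show that $GBF_S \cap BF_C$ is a correct GBF encoding $S \cap C$. We also define the notion of extraction of a GBF with a BF, which is equivalent to the notion of intersection but makes our proof in Section 5 simpler. We use the notion of intersection mostly in Section 3, in order to stay as close as possible to the notation of Dong et al., and the notion of extraction mostly in Section 5. With extraction, non-selected components are simply dropped, or equivalently set to a special empty value, instead of being replaced by a random value. One obtains as much information from a uniform, independent random value as from a fixed value, so the two notions are equivalent.

Definition 3 (Extraction of a GBF with a BF). Let $M$, $(h_i)_{i=1\ldots k}$ and $\lambda$ be some GBF parameters. Let $S$ and $C$ be two sets, and let $GBF_S$ and $BF_C$ be built with parameters $M$, $(h_i)_{i=1\ldots k}$ (and $\lambda$ for the GBF). The extraction of $GBF_S$ using $BF_C$, noted $\mathrm{Extract}(BF_C, GBF_S)$, is defined as:
$$\mathrm{Extract}(BF_C, GBF_S)[i] = \begin{cases} GBF_S[i] & \text{if } BF_C[i] = 1 \\ \text{empty} & \text{otherwise} \end{cases}$$

We now give Theorem 4 of [4] in a slightly reformulated but equivalent form:

Theorem 1 (Security of GBF (Theorem 4 of [4])). Let $\lambda, N \in \mathbb{N}$, let $k = \lambda$ and $M = Nk/\ln 2$, and let $(h_i)_{i \in [k]}$ be a sequence of random oracles $\{0,1\}^* \to [M]$. Then, for all sets $S$ and $C$ with at most $N$ elements, we have
$$(S, C, GBF_S \cap BF_C) \stackrel{c}{\equiv} (S, C, GBF_{S \cap C} \cap BF_C).$$
Equivalently, with our extraction notation:
$$(S, C, \mathrm{Extract}(BF_C, GBF_S)) \stackrel{c}{\equiv} (S, C, \mathrm{Extract}(BF_C, GBF_{S \cap C})).$$

The proof Dong et al. give for Theorem 1 is reproduced below, with only minor modifications to make it match our notation. Namely, the three variants of $GBF_{C \cap S}$ used in the original text are written respectively $GBF_S \cap BF_C$, $GBF_{S \cap C}$ and $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ in ours. We give a quick overview of their proof: in their case 1, they show that the probability that some element of $S - C$ has all its positions in $h^*(C)$ is negligible; then in case 2 they argue that if no element of $S - C$ has all its positions in $h^*(C)$, the distribution of $GBF_S \cap BF_C$ is identical to that of $GBF_{S \cap C}$. They invoke the security of the XOR-based secret sharing scheme to argue that an element of $S - C$ one of whose shares was re-randomized during intersection cannot leave any trace in the resulting GBF (this is the argument we will go against in Section 3).

(Proof of Theorem 1 as it appears in [4]) Given $GBF_S \cap BF_C$, we modify it to get $(GBF_S \cap BF_C) \cap BF_{S \cap C}$. We scan $GBF_S \cap BF_C$ from the beginning to the end and for each location $i$, we modify $(GBF_S \cap BF_C)[i]$ using the following procedure: [...] fall into one of these three cases, so there is no unhandled case. Now we argue that the distribution of $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ is identical to $GBF_{S \cap C}$. To see that, let's compare each location in $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ and $GBF_{S \cap C}$. From Algorithm 1 and the above procedure, we can see that $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ and $GBF_{S \cap C}$ contain only shares of elements in $S \cap C$ and random strings. Because $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ and $GBF_{S \cap C}$ use the same set of hash functions, for each [...] a random string. The distribution of a share depends only on the element, and the random strings are uniformly distributed. So the distributions of every location in $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ and $GBF_{S \cap C}$ are identical, and therefore the distributions of $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ and $GBF_{S \cap C}$ are identical. Then we argue that the distribution of $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ is identical to $GBF_S \cap BF_C$ except for a negligible probability $\eta$.

Case 1: $GBF_S \cap BF_C$ encodes at least one element of $S - (C \cap S)$. In this case the distribution of $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ differs from the distribution of $GBF_S \cap BF_C$. From Theorem 3, the probability of each element of $S - (C \cap S)$ being encoded in $GBF_S \cap BF_C$ is $\epsilon$. Since there are $d = |S| - |C \cap S|$ elements in $S - (C \cap S)$, the probability that at least one element is falsely contained in $GBF_S \cap BF_C$ is: $\eta = [\text{skipped}\ldots] \le 2d\epsilon$. As we can see, $\eta$ is negligible if $\epsilon$ is negligible.

Case 2: $GBF_S \cap BF_C$ encodes only elements from $C \cap S$. In this case, each element of $S - (C \cap S)$ may leave up to $k - 1$ shares in $GBF_S \cap BF_C$. The only difference between $GBF_S \cap BF_C$ and $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ is that in $(GBF_S \cap BF_C) \cap BF_{S \cap C}$, all residue shares of elements of $S - (C \cap S)$ are replaced by random strings. From the security of the XOR-based secret sharing scheme, the residue shares should be uniformly random (otherwise they leak information about the elements). Thus the procedure does not change the distribution when modifying $GBF_S \cap BF_C$ into $(GBF_S \cap BF_C) \cap BF_{S \cap C}$. So the distributions of $GBF_S \cap BF_C$ and $(GBF_S \cap BF_C) \cap BF_{S \cap C}$ are identical. The probability of this case is at least $1 - \eta$. Since $(GBF_S \cap BF_C) \cap BF_{S \cap C} \equiv GBF_{S \cap C}$ always holds and $GBF_S \cap BF_C \equiv (GBF_S \cap BF_C) \cap BF_{S \cap C}$ in case 2, we can conclude that $\Pr[GBF_S \cap BF_C$ [...]

Invalidation of the Proof in [4]
The end of the proof contains the following assertion: "$(GBF_S \cap BF_C) \cap BF_{S \cap C} \equiv GBF_{S \cap C}$ always holds and $GBF_S \cap BF_C \equiv (GBF_S \cap BF_C) \cap BF_{S \cap C}$ holds in case 2". This should result in $GBF_{S \cap C} \equiv GBF_S \cap BF_C$ in case 2. We invalidate this claim by giving a counter-example. Let the number of hash functions be $k = 3$, and let $x$ and $y$ be two elements of $S - C$ such that $h_1(x) = h_1(y) \notin h^*(C)$ and, for all $i \neq 1$, $h_i(x) \in h^*(C)$ and $h_i(y) \in h^*(C)$. This example is illustrated in Figure 1. Note that this example falls in case 2 of the proof of [4], as it does not require any element of $S$ to have all its positions in $h^*(C)$.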
We have that $GBF_S$ must satisfy the following equations, where we write $x_i$ for $GBF_S[h_i(x)]$ (and similarly with $y$):
$$x_1 \oplus x_2 \oplus x_3 = x \tag{2}$$
$$y_1 \oplus y_2 \oplus y_3 = y \tag{3}$$
$$x_1 = y_1 \tag{4}$$
Combining (2), (3) and (4) gives:
$$x_2 \oplus x_3 \oplus y_2 \oplus y_3 = x \oplus y$$
If we rewrite the latter equation without our shorthand notation, we have that $GBF_S$ satisfies the following:
$$GBF_S[h_2(x)] \oplus GBF_S[h_3(x)] \oplus GBF_S[h_2(y)] \oplus GBF_S[h_3(y)] = x \oplus y \tag{5}$$
Regarding $GBF_S \cap BF_C$, it does not satisfy equations (2) and (3) anymore, because the component $GBF_S[h_1(x)]$ was replaced by a fresh random value during the intersection operation; but it still satisfies equation (5), as (5) only involves components that were not re-randomized during intersection, thanks to the fact that $h_2(x)$, $h_3(x)$, $h_2(y)$ and $h_3(y)$ are in $h^*(C)$.
On the other hand, $GBF_{S \cap C}$, which was built without the knowledge of $x$ and $y$, does not satisfy (5) (except with negligible probability). As a result, a GBF where relation (5) does not hold is a valid outcome for the distribution of $GBF_{S \cap C}$ but not for the distribution of $GBF_S \cap BF_C$. Those distributions cannot be identical, and the proof of Theorem 1 given in [4] is wrong. The same counter-example can also be used to invalidate the two other claims of the proof, namely that $(GBF_S \cap BF_C) \cap BF_{S \cap C} \equiv GBF_{S \cap C}$ and that $GBF_S \cap BF_C \equiv (GBF_S \cap BF_C) \cap BF_{S \cap C}$. This is not just a typo in [4], but truly a flaw in the proof. Recall that the proof uses the fact that any $x \in S - C$ has, with overwhelming probability, one of its positions, say $h_1(x)$, outside of $h^*(C)$. As a result, this component is overwritten during intersection (or never retrieved in a PSI scenario). Dong et al. then invoke the security of the XOR-based secret sharing scheme to argue that no information can be obtained about $x_1 \oplus x_2 \oplus x_3$. But the GBF construction is not an instance of an XOR secret sharing scheme, and the argument does not hold. More precisely, in a GBF the component $GBF_S[h_1(x)]$ (or $x_1$) may not be independent from the other components of the GBF; in particular, its value can be tied to the values of other components that may be in $h^*(C)$ and are thus visible, which is the case with components $y_2$ and $y_3$ in our example.
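The counter-example can be checked mechanically on a toy model. This Python sketch makes explicit assumptions: positions are hand-picked lists (hypothetical, chosen to force $h_1(x) = h_1(y)$) instead of hash outputs, $h^*(C)$ is given directly as a set of positions, and $S \cap C$ is empty, so $GBF_{S \cap C}$ is a GBF of the empty set. It shows that relation (5) survives the intersection for $GBF_S \cap BF_C$ but not for $GBF_{S \cap C} \cap BF_C$. It also checks the XOR relation over the positions mapped an odd number of times on a hypothetical three-element configuration with $k = 4$, in the spirit of the generalization that follows.

```python
import secrets

LAM = 64


def build(elements, pos, M):
    """GBF build with explicit (hypothetical) position lists pos[x] instead of hashes."""
    gbf = [None] * M  # None marks an empty component
    for x in elements:
        empty = [j for j in pos[x] if gbf[j] is None]
        if not empty:
            raise RuntimeError("build failed")
        last, share = empty[-1], x
        for j in pos[x]:
            if j != last:
                if gbf[j] is None:
                    gbf[j] = secrets.randbits(LAM)
                share ^= gbf[j]  # components set by earlier elements are reused
        gbf[last] = share
    return [secrets.randbits(LAM) if v is None else v for v in gbf]


def intersect(gbf, hC):
    """GBF ∩ BF_C: keep components in h*(C), re-randomize all the others."""
    return [v if j in hC else secrets.randbits(LAM) for j, v in enumerate(gbf)]


def odd_positions(elements, pos):
    """Positions appearing an odd number of times among the h*(x)."""
    P = set()
    for x in elements:
        P ^= set(pos[x])  # symmetric difference toggles each position of x
    return P


def xor_over(gbf, P):
    acc = 0
    for j in P:
        acc ^= gbf[j]
    return acc


# Figure 1 setting (k = 3): h_1(x) = h_1(y) = 0 lies outside h*(C) = {1, 2, 3, 4}.
x, y = secrets.randbits(LAM), secrets.randbits(LAM)
pos = {x: [0, 1, 2], y: [0, 3, 4]}
hC, M = {1, 2, 3, 4}, 5
P = odd_positions([x, y], pos)               # {1, 2, 3, 4}: the components in (5)
real = intersect(build([x, y], pos, M), hC)  # GBF_S ∩ BF_C
ideal = intersect(build([], pos, M), hC)     # GBF_{S∩C} ∩ BF_C with S ∩ C empty
print(xor_over(real, P) == x ^ y)   # relation (5) survives the intersection
print(xor_over(ideal, P) == x ^ y)  # holds only with probability 2**-64

# Three elements, k = 4 (hypothetical positions): 0, 1 and 4 are each mapped twice
# and lie outside h*(C); every position mapped an odd number of times is in h*(C).
z = secrets.randbits(LAM)
pos3 = {x: [0, 1, 2, 3], y: [0, 4, 5, 6], z: [1, 4, 7, 8]}
P3 = odd_positions([x, y, z], pos3)               # {2, 3, 5, 6, 7, 8}
real3 = intersect(build([x, y, z], pos3, 9), P3)  # here h*(C) is taken to be P3
print(xor_over(real3, P3) == x ^ y ^ z)           # the XOR relation still holds
```

A distinguisher that simply tests relation (5) on the visible components therefore separates the two distributions with overwhelming advantage.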

Generalization of the Counter-Example
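We give a larger class of situations where the same claims prove wrong. Let $P(S, C)$ (or just $P$ if there is no ambiguity about the inputs) be the set of positions that appear an odd number of times in $(h^*(x))_{x \in S - C}$:
$$P(S, C) = \left\{ j \in [M] : \left|\{x \in S - C : j \in h^*(x)\}\right| \text{ is odd} \right\}$$
Then $GBF_S$ satisfies the following relation, of which (5) is a special case, and which is obtained the same way as (5) was obtained (by XORing the equations $\bigoplus_{j \in h^*(x)} GBF_S[j] = x$ over all $x \in S - C$, the positions mapped an even number of times cancelling out):
$$\bigoplus_{j \in P} GBF_S[j] = \bigoplus_{x \in S - C} x$$
If moreover $P \subset h^*(C)$, none of the concerned components are re-randomized during intersection, so $GBF_S \cap BF_C$ satisfies the same relation, that is:
$$\bigoplus_{j \in P} (GBF_S \cap BF_C)[j] = \bigoplus_{x \in S - C} x$$

Figure 2 illustrates such a more general case with 4 hash functions, involving 3 elements $x$, $y$ and $z$ of $S - C$, where $GBF_S \cap BF_C$ would satisfy the following relation (but $GBF_{S \cap C}$ would not):
$$\bigoplus_{j \in P} (GBF_S \cap BF_C)[j] = x \oplus y \oplus z$$

Other GBF Constructions
We describe the consequences of our findings on the other GBF constructions that were inspired by the one of Dong et al., namely the ones of Pinkas et al. [11, Section 4.3] (USENIX Security 2014) and Rindal and Rosulek [12] (EUROCRYPT 2017).

Pinkas et al. [11]: Same Situation as Dong et al. [4]
The construction of Pinkas et al. presents many optimizations over the one of Dong et al., for instance through the use of random OT instead of classical OT, but it also has a more essential difference with the construction of Dong et al.: the sum of the components associated to an element need not be equal to the element itself. Instead, all component values are chosen uniformly at random, and the sender sends, for each element $x$ in her set, a summary value that is the sum of the components corresponding to this element, that is:
$$\bigoplus_{j \in h^*(x)} GBF_S[j]$$
The receiver retrieves the components corresponding to her own elements via OT and computes similar sums for these elements. Finally, the receiver compares the sums she computed with the sums she received to learn which elements are in both sets.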
We did not find in the paper of Rindal and Rosulek the issue we identified in [4] and [11].
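Note, however, that the constructions of Rindal and Rosulek and of Pinkas et al. cannot always be used as a drop-in replacement for the original construction of Dong et al. One example is a Searchable Encryption protocol [13] that uses Garbled Bloom Filters, but where the receiver looks up several GBFs and must be unable to know which response (in the form of retrieved components) comes from which filter. This requires that the receiver be able to decide on the result of a lookup (present or absent) using only the retrieved components, without remembering which component was being looked for. The authors modify the GBF construction of Dong et al. so that, for every element $x$, the components corresponding to $x$ have their sum equal to a fixed value instead of the value of the element itself:
$$\bigoplus_{i \in h^*(x)} GBF_S[i] = 0$$
Such a property could not be reached in a trivial way using the construction of Rindal and Rosulek (or even the one of Pinkas et al.), because the summary values sent by the sender require that the receiver knows what to compare these values with, which in turn requires that the receiver knows which GBF the values correspond to. This shows why the study of the security proofs of constructions other than the one of Rindal and Rosulek is still relevant.

New Proof of Security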

New Case Distinction
Our proof follows the idea of the proof of Dong et al. [4]: we consider two cases, one that occurs with negligible probability and one in which the two distributions are actually identical, and this results in the two distributions being indistinguishable. What differs between our proof and the one of [4] is the case distinction: as we saw, the assumption of case 2 of [4], that no element of $S - C$ has all its positions in $h^*(C)$, does not suffice to have $GBF_S \cap BF_C \equiv GBF_{S \cap C}$.
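Instead, we make the following remark: it is very unlikely that there is some non-empty subset $X$ of $S - C$ such that all the positions of $h^*(X)$ that are mapped by a single element of $X$ happen to be in $h^*(C)$. Said differently, for any non-empty subset $X \subset S - C$, there is at least one position in $h^*(X)$ that is both outside of $h^*(C)$ and mapped by a single element of $X$. Note that this covers the situation of the generalized counter-example to the proof of [4] (and thus our counter-examples in Figures 1 and 2 too): if all the positions in $h^*(S - C)$ mapped an odd number of times are in $h^*(C)$, then all the positions outside of $h^*(C)$ are mapped at least 2 times. Formally, we define the mapped-once positions of $X$, noted $m(X)$, and the never-mapped positions of $X$, noted $n(X)$, as:
$$m(X) = \left\{ j \in [M] : \left|\{x \in X : j \in h^*(x)\}\right| = 1 \right\}, \qquad n(X) = [M] - h^*(X)$$

We consider two cases, as is done by Dong et al. [4]. The first case is where there is a non-empty $X \subset S - C$ such that $m(X) \subset h^*(C)$. From Theorem 2, this case happens with negligible probability. The second case is thus where there is no such $X$, and we show that in this case the distributions are identical by showing that any outcome of one distribution is a valid outcome of the other. Let $B$ be an outcome of the right-hand distribution, that is, the one with $GBF_{S \cap C}$; we show how to build a GBF $B'$ that is a valid outcome of $GBF_S$ such that $\mathrm{Extract}(BF_C, B') = B$.

We build $B'$ the following way. We start from $B$ which, recall, is a Garbled Bloom Filter with all its components outside of $h^*(C)$ being empty. We insert each element of $S - C$ in $B$, keeping the components that were already set untouched; insertion happens just as in the GBF.Build algorithm. When all elements have been inserted, the remaining components are filled with random values, just as at the end of GBF.Build. If the algorithm did not halt, the resulting $B'$ encodes every element of $S \cap C$ (from the initial values of $B$) and every element of $S - C$ (that we just inserted). As a result, $B'$ is a valid $GBF_S$, and since the components in $h^*(C)$ were left untouched, $\mathrm{Extract}(BF_C, B') = B$ is a valid outcome for $\mathrm{Extract}(BF_C, GBF_S)$.

We now show that the algorithm does not halt. Recall that the building algorithm halts when an element that must be inserted only maps to positions that are not empty. Since we are in the case where no non-empty $X \subset S - C$ satisfies $m(X) \subset h^*(C)$, there must be a position in $h^*(S - C)$ that is not in $h^*(C)$ and that is mapped by a single element $y \in S - C$. This position stays empty as long as $y$ has not been inserted, so if $(S - C) - \{y\}$ was inserted without halting, then $y$ can be inserted last without halting as well. The same reasoning applied to $(S - C) - \{y\}$ shows that it can be inserted without halting, and recursively $S - C$ can be inserted entirely without halting.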
Finally, given an outcome $B$ of $\mathrm{Extract}(BF_C, GBF_S)$, one can trivially build a valid GBF $B'$ encoding $S \cap C$ such that $\mathrm{Extract}(BF_C, B') = B$: it suffices to fill all empty components of $B$ with random values. As a result, we have $\mathrm{Extract}(BF_C, GBF_S) \equiv \mathrm{Extract}(BF_C, GBF_{S \cap C})$ in our second case, and this ends the proof of Theorem 1.
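Both directions of the second case can be exercised on small instances. This Python sketch rests on several assumptions: a hypothetical SHA-256 stand-in for the random oracles, $\lambda$-bit integer elements, a brute-force test of the first case that is only usable for a very small $S - C$, and an explicit insertion order for $S - C$ obtained by repeatedly choosing an element that owns a mapped-once position outside $h^*(C)$ and scheduling it last, as in the recursive argument above. It checks that $\mathrm{Extract}(BF_C, B') = B$, that $B'$ encodes every element of $S$, and that filling the empty components of an extraction yields a GBF encoding $S \cap C$.

```python
import hashlib
import itertools
import secrets

LAM = 32

def positions(x, k, M):
    """h*(x): hypothetical SHA-256 stand-in for h_1..h_k, 0-based positions."""
    return {int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % M
            for i in range(1, k + 1)}

def insert(gbf, x, k, M):
    """Insert x as in GBF.Build, leaving components that are already set untouched."""
    pos = sorted(positions(x, k, M))
    empty = [j for j in pos if gbf[j] is None]
    if not empty:
        raise RuntimeError("insertion halts: every position of x is already set")
    last, share = empty[-1], x
    for j in pos:
        if j != last:
            if gbf[j] is None:
                gbf[j] = secrets.randbits(LAM)
            share ^= gbf[j]
    gbf[last] = share

def fill(gbf):
    """Final step of GBF.Build: random values in every empty component."""
    return [secrets.randbits(LAM) if v is None else v for v in gbf]

def extract(hC, gbf):
    """Extract(BF_C, GBF): keep components in h*(C), mark the others empty (None)."""
    return [v if j in hC else None for j, v in enumerate(gbf)]

def encodes(gbf, x, k, M):
    acc = 0
    for j in positions(x, k, M):
        acc ^= gbf[j]
    return acc == x

def mapped_once(X, k, M):
    """m(X): positions mapped by exactly one element of X."""
    count = {}
    for x in X:
        for j in positions(x, k, M):
            count[j] = count.get(j, 0) + 1
    return {j for j, c in count.items() if c == 1}

def first_case(D, hC, k, M):
    """True iff some non-empty X ⊆ D has m(X) ⊆ h*(C). Brute force: tiny D only."""
    return any(mapped_once(X, k, M) <= hC
               for r in range(1, len(D) + 1)
               for X in itertools.combinations(D, r))

def insertion_order(D, hC, k, M):
    """Second case only: pick y owning a mapped-once position outside h*(C),
    schedule it after all remaining elements, and recurse on the rest."""
    remaining, order = list(D), []
    while remaining:
        free = mapped_once(remaining, k, M) - hC
        y = next(z for z in remaining if positions(z, k, M) & free)
        order.insert(0, y)
        remaining.remove(y)
    return order

def lift(B, D, hC, k, M):
    """Build B' from B by inserting every element of D = S - C, then filling."""
    Bp = list(B)
    for y in insertion_order(D, hC, k, M):
        insert(Bp, y, k, M)
    return fill(Bp)

k, M = 4, 1024
common = [secrets.randbits(LAM) for _ in range(3)]  # S ∩ C
S_only = [secrets.randbits(LAM) for _ in range(4)]  # S - C
C = common + [secrets.randbits(LAM) for _ in range(3)]
hC = set().union(*(positions(c, k, M) for c in C))  # positions set to 1 in BF_C

G = [None] * M
for x in common:
    insert(G, x, k, M)
B = extract(hC, fill(G))  # an outcome of Extract(BF_C, GBF_{S∩C})

print(first_case(S_only, hC, k, M))  # False except with small probability
Bp = lift(B, S_only, hC, k, M)       # a valid GBF_S with the same extraction
print(extract(hC, Bp) == B)          # components in h*(C) are untouched
B2 = fill(extract(hC, Bp))           # converse direction: a GBF encoding S ∩ C
```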
Note that this proof would also apply to the construction of Pinkas et al. [11].

Related Work
Security issues in the paper of Dong et al. [4] were identified by Rindal and Rosulek [12] and by Lambaek [8], but none of these issues apply to the protocol that we study in this paper. Indeed, [4] describes two protocols: one that aims at providing security against honest-but-curious adversaries, which is the one studied in this paper, and one that aims at providing security against malicious adversaries. The issues identified in [12] and [8] only concern the malicious-security protocol and do not apply to the honest-but-curious-security protocol (both works present the honest-but-curious-security protocol as satisfying its claimed properties).
By contrast, the issues we identify concern the security of the GBF construction itself. This property is invoked in the security proofs of both the honest-but-curious-security protocol and the malicious-security one, so both protocols are affected. The issue we identify is thus different from, and more general than, the ones identified in [12] and [8].

Conclusion
Garbled Bloom Filters are a hash structure which, although still recent, has already had a significant impact on the design of secure protocols. We showed that the security analysis of Garbled Bloom Filters contains a subtle difficulty, as the intuition that GBF security derives almost immediately from the security of XOR-based secret sharing is actually false. Nevertheless, we showed that all existing GBF constructions actually satisfy their claimed security property by providing a new, more rigorous proof. This should strengthen the confidence we can have in the GBF construction and promote its wide use in the domain of secure protocol design.


Fig. 2: An example of a more general counter-example involving 3 elements of S − C.
