SDKSE-KGA: A Secure Dynamic Keyword Searchable Encryption Scheme Against Keyword Guessing Attacks

A number of searchable encryption schemes have been proposed to solve the search problem in the ciphertext domain. However, most existing searchable encryption schemes are vulnerable to keyword guessing attacks. In a keyword guessing attack, an adversary, with the help of the cloud, learns which keyword a given trapdoor is searching for, which discloses users' private information. To address this issue, we propose SDKSE-KGA: a secure dynamic keyword searchable encryption scheme which resists keyword guessing attacks. SDKSE-KGA has constant-size indexes and trapdoors and supports functionalities such as dynamic updating of keywords and files. Formal proofs show that it is Trapdoor-IND-CKA and Index-IND-CKA secure in the standard model.


Introduction
Searchable encryption is an effective way to solve the search problem in the ciphertext domain. It not only protects users' privacy but also completes the search task. In searchable encryption, users encrypt data before uploading it and then use trapdoors of keywords to execute search tasks, so the cloud cannot obtain exact information about the data or the keywords.
Song et al. first proposed a searchable encryption scheme for mail systems in [1]. Since then, the concept of searchable encryption (SE) has attracted attention and prompted a series of studies [2][3][4][5][6]. According to the encryption method, SE is divided into searchable symmetric encryption (SSE) and public-key encryption with keyword search (PEKS). SSE has the advantages of high efficiency and practicality, so research on it tends to focus on functionality: [7,8] realized multi-keyword search, [9,10] realized fuzzy search, and [11] realized ranking of search results. PEKS has the advantage of strong security, so research on it tends to improve search expressiveness and security: [12,13] implement access control over users' search privileges, [14] provides traceability of malicious users, [15] implements revocation of malicious users' privileges, and [16] implements verification of search results.
Consider a dynamic mail system. User Alice has many friends and business partners in real life, so her inbox may receive all kinds of mails every day. The inbox stores these mails on cloud servers. Since the cloud is not fully trusted, all information should be encrypted. When Alice checks her mails, she generally filters them and searches for some of them. The search keywords are determined by Alice herself, and she is likely to change them as her circumstances change. This application scenario requires our searchable encryption scheme to support dynamic keywords.
Kamara et al. first proposed a dynamic searchable symmetric encryption scheme in [17]. They gave the definition of dynamic CKA2 security and constructed the algorithm using reverse indexes, but this scheme suffers from information leakage. They offered an improved scheme in [18], which uses a red-black tree as the index tree to protect information, at the cost of reduced search efficiency. Hahn et al. presented a new scheme in [19]. It leaks nothing except the access pattern, but requires a scenario with a huge amount of data and only a few keywords. Xia et al. proposed a multi-keyword ranked search scheme in [20], which also supports a dynamic environment. It uses a balanced binary tree as the index tree and sorts search results for users, but it lacks trapdoor indistinguishability. Later, they presented a new scheme in [21]. For mail systems, it significantly reduces IO cost, but its index is large, which puts pressure on communication overhead.
Meanwhile, Byun et al. first introduced the concept of the keyword guessing attack in [25]. In keyword guessing attacks, adversaries take advantage of the fact that the keywords a user is likely to use are very limited, so they guess the keyword corresponding to a trapdoor. With the help of the cloud, they are able to verify whether a guess is correct, and they will shortly know which keyword the trapdoor is searching for. This is a critical attack that violates the goal of searchable encryption.
The concept of offline keyword guessing attacks was proposed in [25]. Then Yau et al. presented the concept of online keyword guessing attacks in [26]. Tang et al. proposed a public-key encryption scheme supporting registered keyword search in [27], but it requires the sender and the receiver to negotiate registered keywords before the system is established. Compared with it, our scheme relaxes the restrictions on communication between senders and receivers. Chen et al. proposed a dual-server searchable encryption scheme in [28], in which there are multiple interactions between a front server and a back server. It prevents each independent server from obtaining complete information and thereby withstands attacks. Compared with it, the cloud server in our scheme bears less computational and communication pressure.
For the above mail system, we construct a dynamic searchable encryption scheme which resists keyword guessing attacks. Our contributions are summarized as follows:
1. Our SDKSE-KGA supports dynamic management of both keywords and files. In the mail system, senders may send messages at any time and receivers may delete messages too. The receiver may add or delete keywords through the binary tree. Compared with other papers, the cost of updating keywords and files is negligible. In addition, the update operation is executed entirely by the receiver, so there is no risk of leaking private data.
2. Our SDKSE-KGA has Index-IND-CKA (Index Indistinguishability against Chosen Keyword Attack) security and Trapdoor-IND-CKA (Trapdoor Indistinguishability against Chosen Keyword Attack) security. We demonstrate security in the standard model. Moreover, the indexes and trapdoors are of constant size, which helps to reduce transmission overhead significantly.
3. Our SDKSE-KGA resists keyword guessing attacks, so it has a higher security level than other searchable encryption schemes. In this scheme, the search task is split between the cloud server and the receiver: the cloud server performs the fuzzy search while the receiver performs the exact search. The cloud server is not able to obtain specific information about keywords, so it cannot launch keyword guessing attacks.
The rest of our paper is organized as follows. In Section 2, we introduce the system model and security model, and describe some symbols used in our construction. In Section 3, we introduce the keyword tree and the fuzzy mapping function in detail. Section 4 describes the SDKSE-KGA scheme in detail. Section 5 and Section 6 present the security analysis and performance analysis of SDKSE-KGA. In the last section we summarize this paper.

System Model
There are three roles in our application scenario: mail senders, mail receivers and the cloud server. The sender is responsible for adding keywords to files, encrypting the files, generating exact indexes and fuzzy indexes for the keywords, and uploading them to the cloud server. The receiver is responsible for managing all the keywords by constructing a binary tree, and for generating fuzzy and exact trapdoors. After receiving a fuzzy trapdoor, the cloud server conducts a fuzzy search over the fuzzy indexes and sends the fuzzy results to the receiver. The receiver then performs an exact search on the fuzzy results using the exact trapdoor to obtain the final results. Since third-party cloud servers cannot be fully trusted, we want the cloud server to learn as little information as possible. Moreover, in a keyword guessing attack (KGA), an adversary, with the help of the cloud, learns which keyword a given trapdoor is searching for, which discloses users' private information. In our model, the cloud server is only allowed to perform fuzzy search. Even if it has access to all the fuzzy indexes of keywords and some legal fuzzy trapdoors, it is still unable to obtain the exact information of the search. This model therefore not only protects the security of keywords, but also resists keyword guessing attacks.
Query Phase 2. Repeat Query Phase 1. The adversary A continues to issue keywords except the target keywords $w_0^*$ and $w_1^*$. Guess. The adversary gives $\beta'$ as the guess of $\beta$; if $\beta' = \beta$, then the adversary wins.
The advantage of A in this game is denoted by $Adv_A$. We say that SDKSE-KGA is Index-Indistinguishable secure if $Adv_A$ is negligible for any polynomial-time attacker A.
The advantage of B in Game 2 is denoted by $Adv_B$. Definition 2. We say that SDKSE-KGA is Trapdoor-Indistinguishable secure if $Adv_B$ is negligible for any polynomial-time attacker B.
The advantage of C in Game 3 is denoted by $Adv_C$. We say that SDKSE-KGA is Adaptive KGA secure if $Adv_C$ is negligible for any polynomial-time attacker C.

Notations
In this part we describe the symbols used in the scheme. To manage all the keywords, we build a binary tree denoted by T and use L to indicate the height of T. The height L depends on N, the number of keywords. The fuzzy keyword mapped from the keyword w is denoted by $w_f$.
- IndexDelete(w): Notify related files to update the keyword list and delete the existing index of w.

Keyword Tree
The receiver is responsible for constructing the binary tree T .The tree T has two tasks: managing keywords dynamically and running fuzzy mapping function.
Construct the tree T based on the number of keywords N, with height $L = \log_2 N + 2$.
Each leaf node may be bound to one keyword. We call a leaf node that has not yet been bound to a keyword an available node. The number of available nodes is denoted by avlS. To ensure the growth of the tree, we require $avlS \ge minS$, where $minS = 2^{L-2}$.
Deleting a keyword is easy: we just set the state of the leaf node bound to this keyword to 0, and if this keyword is used again later, we change the state of the leaf node back to 1. Adding a keyword falls into two cases. If avlS > minS, select an appropriate available leaf node, bind it to the keyword, and set its state value to 1 (occupied). If avlS = minS, first generate child nodes of all available leaf nodes to double the number of available leaf nodes. The growth process is shown in Fig 2.
Now we design the fuzzy mapping function, which maps each keyword to a fuzzy keyword. The position of the fuzzy keyword in the tree is used to generate the pair of fuzzy index and fuzzy trapdoor. The cloud server searches with the fuzzy index-trapdoor pair while the receiver searches with the exact index-trapdoor pair. The fuzzy mapping function works as follows. For a leaf node in the binary tree, trace it up n levels, where n is a parameter defined by users; the node reached is its corresponding fuzzy node. If two leaf nodes have the same ancestor node after tracing up the same number of levels, then they share a fuzzy node.
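The keyword tree and the fuzzy mapping function can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `KeywordTree`, the bit-string encoding of leaf paths, the ceiling applied to $\log_2 N$, the choice of the lexicographically first available leaf as the "appropriate" one, and keeping minS fixed at its initial value after the tree grows are all assumptions made for illustration.

```python
import math

DISABLED, OCCUPIED, AVAILABLE = 0, 1, 2  # leaf states 0, 1, 2 as in the text


class KeywordTree:
    """Toy keyword tree; a leaf is identified by its root-to-leaf bit path."""

    def __init__(self, keywords):
        n = max(len(keywords), 1)
        # Height L = log2(N) + 2; the ceiling is an assumption for
        # keyword counts that are not powers of two.
        self.height = math.ceil(math.log2(n)) + 2
        depth = self.height - 1  # h = L - 1 bits per leaf path
        # Assumption: minS keeps its initial value 2^(L-2) as the tree grows.
        self.min_avail = 2 ** (self.height - 2)
        self.leaves = {format(i, f"0{depth}b"): AVAILABLE
                       for i in range(2 ** depth)}
        self.position = {}  # keyword -> leaf path
        for w in keywords:
            self._bind(w)

    def available(self):
        return [p for p, s in self.leaves.items() if s == AVAILABLE]

    def _bind(self, w):
        path = min(self.available())  # assumed selection rule
        self.leaves[path] = OCCUPIED
        self.position[w] = path

    def _grow(self):
        # Replace every available leaf by its two children, which doubles
        # the number of available leaves.
        for path in self.available():
            del self.leaves[path]
            self.leaves[path + "0"] = AVAILABLE
            self.leaves[path + "1"] = AVAILABLE

    def insert(self, w):
        if w in self.position:  # re-enable a previously deleted keyword
            self.leaves[self.position[w]] = OCCUPIED
            return
        if len(self.available()) <= self.min_avail:
            self._grow()
        self._bind(w)

    def delete(self, w):
        self.leaves[self.position[w]] = DISABLED

    def fuzzy_node(self, w, levels):
        """Path of the ancestor reached by tracing the leaf up `levels` levels."""
        path = self.position[w]
        return path[: max(len(path) - levels, 0)]


tree = KeywordTree(["a", "b", "c", "d"])
tree.insert("e")  # avlS equals minS here, so the tree grows first
print(tree.position, tree.fuzzy_node("a", 1), tree.fuzzy_node("b", 1))
```

In this sketch, keywords whose leaves share an ancestor `levels` steps up obtain the same fuzzy node, which is the property the cloud-side fuzzy search relies on.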

Bilinear Map
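In our scheme, we apply a bilinear map in the FuzzySearch and ExactSearch algorithms. The principle is as follows. Let $\mathbb{G}$ be a group of composite order $n = p_1 p_2 p_3 p_4$, where $p_1$, $p_2$, $p_3$ and $p_4$ are distinct primes. Assume $G$ is a generator of $\mathbb{G}$; then the subgroups $\mathbb{G}_{p_1}$, $\mathbb{G}_{p_2}$, $\mathbb{G}_{p_3}$ and $\mathbb{G}_{p_4}$ have generators $G_1$, $G_2$, $G_3$ and $G_4$ respectively, where one may take $G_i = G^{n/p_i}$. We infer that for distinct i and j and all $R_i \in \mathbb{G}_{p_i}$, $R_j \in \mathbb{G}_{p_j}$, $e(R_i, R_j) = 1$ holds, because $e(G_i, G_j) = e(G, G)^{n^2/(p_i p_j)}$ and $n$ divides $n^2/(p_i p_j)$.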
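The orthogonality of the prime-order subgroups can be checked numerically in a toy model. The sketch below is only an illustration under stated assumptions: it uses no pairing library, the primes 5, 7, 11, 13 are tiny stand-ins, group elements are represented by their exponents with respect to a generator, and the "pairing" is simply multiplication of exponents modulo n. Such a model is insecure and serves only to show why $e(R_i, R_j) = 1$ for $i \ne j$.

```python
from itertools import combinations

# Toy model (NOT secure, NOT a real pairing): an element G^a of the cyclic
# group of order n is represented by its exponent a mod n, and
# e(G^a, G^b) = e(G, G)^(a*b) is represented by the exponent a*b mod n.
# The identity of the target group therefore corresponds to exponent 0.
PRIMES = (5, 7, 11, 13)   # small stand-ins for p1..p4
N = 5 * 7 * 11 * 13       # group order n = p1*p2*p3*p4


def subgroup_generator(p):
    """Exponent of G_i = G^(n/p), which generates the subgroup of order p."""
    return N // p


def pairing(a, b):
    """Exponent of e(G^a, G^b) in the target group."""
    return (a * b) % N


def orthogonal(p_i, p_j):
    """Check e(R_i, R_j) = 1 for every R_i in G_{p_i} and R_j in G_{p_j}."""
    g_i, g_j = subgroup_generator(p_i), subgroup_generator(p_j)
    return all(
        pairing(g_i * x % N, g_j * y % N) == 0
        for x in range(p_i)
        for y in range(p_j)
    )


# Elements of different subgroups always pair to the identity ...
print(all(orthogonal(p, q) for p, q in combinations(PRIMES, 2)))
# ... while a subgroup generator paired with itself does not.
print(pairing(subgroup_generator(5), subgroup_generator(5)) != 0)
```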

Complexity Assumptions
The security of our scheme is based on six complexity assumptions [22].The hardness of these assumptions relies on the theorems proposed by [24].
In Assumption 1, a group generator $\mathcal{G}$ takes the security parameter $\lambda$ as input and generates primes $p_1, p_2, p_3, p_4$, two groups $\mathbb{G}$, $\mathbb{G}_T$, and the bilinear map $e$. Set the integer $n = p_1 p_2 p_3 p_4$. Select a random element $x$ from $\mathbb{G}_{p_1}$, and similarly select [...]
The following assumptions are very similar to Assumption 1, so we only introduce their differences. In Assumption 2, [...]
For Assumptions 1 ∼ 6, we have the following definition. Definition 4: If $Adv\text{-}N_{\mathcal{G},B}(\lambda)$ is a negligible function of $\lambda$ for any polynomial-time algorithm B, then we say that the group generator $\mathcal{G}$ satisfies Assumption N, $N \in \{1, 2, 3, 4, 5, 6\}$.

Construction
In this section we will introduce SDKSE-KGA in detail.
Setup(λ, N): First, the receiver builds the keyword tree T to manage the initial keywords. A keyword w is encoded as $[I_1, \ldots, I_h]$ according to its position in the binary tree, where $h = L - 1$. Next, the receiver runs the group generator $\mathcal{G}$ and obtains $(p_1, p_2, p_3, p_4, \mathbb{G}, \mathbb{G}_T, e)$. Then it selects random elements $x, y, u_1, \ldots$ [...] the generator of $\mathbb{G}_{p_3}$ and $G_4$ is the generator of $\mathbb{G}_{p_4}$. So a random element of $\mathbb{G}_{p_4}$ can be chosen by raising $G_4$ to a random exponent from [...] The master private key is $MSK = [x, y, u_1, \ldots, u_h, \omega]$. The receiver publishes params and retains MSK for generating trapdoors later.
Encrypt(params, w): w represents the keyword to be encrypted; parse it as $[I_1, \ldots, I_h]$. The sender selects a random integer $s \leftarrow \mathbb{Z}_n$ and random elements $R_4, R_4' \leftarrow \mathbb{G}_{p_4}$, and picks a random message M. Next, set [...] Then, according to the fuzzy mapping function, the keyword w is mapped to $w_f$; parse it as $[I_1, \ldots, I_{h_f}]$. The sender selects a random integer [...]
TDGen(MSK, w): w is the keyword to be retrieved. Parse w as $[I_1, \ldots, I_h]$. The receiver selects random integers $r_1, r_2 \leftarrow \mathbb{Z}_n$ and random elements [...] to obtain the exact trapdoor ETd of the keyword w.
Map the keyword w to $w_f$ and parse it as $[I_1, \ldots, I_{h_f}]$. The receiver selects random integers $r_{f,1}, r_{f,2} \leftarrow \mathbb{Z}_n$ and random elements [...] to obtain the fuzzy trapdoor FTd of the keyword w.
FuzzySearch(FuzzyIndex, FTd): [...] add all files containing exact keywords which map to $w_f$ into the fuzzy result FuzzyCipher. Then FuzzyCipher is sent to the receiver.
ExactSearch(ExactIndex, ETd): [...] If $M' = M$, then output the file set C which contains the keyword w.
KWInsert(w): Select an appropriate leaf node in the binary tree and bind the new keyword w to it.
IndexInsert(w): Generate the index based on the location in the tree and add it to the index list.
KWDelete(w): Disable the keyword w in the binary tree.
IndexDelete(w): Delete the index of w from the index list.
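The division of work between FuzzySearch on the cloud and ExactSearch on the receiver can be sketched as a plain data flow. This is an illustration only, under loud assumptions: no encryption is performed, indexes and trapdoors are modelled as bare position strings, and the keyword positions, file names and function names are hypothetical. In the actual scheme the exact index is a ciphertext that the cloud cannot read.

```python
# Toy data flow only: "indexes" and "trapdoors" are plain position strings,
# so nothing here is encrypted or secure.
FUZZY_LEVELS = 1  # the user-defined parameter n of the fuzzy mapping

# Hypothetical leaf positions [I_1..I_h] of three keywords in the keyword tree.
POSITION = {"invoice": "000", "meeting": "001", "travel": "010"}


def fuzzy_of(position):
    """Position of the fuzzy node: the ancestor FUZZY_LEVELS levels up."""
    return position[: len(position) - FUZZY_LEVELS]


def encrypt(keyword, file_id):
    """Sender: build the (fuzzy index, exact index) pair for one file."""
    pos = POSITION[keyword]
    return {"fuzzy_index": fuzzy_of(pos), "exact_index": pos, "file": file_id}


def td_gen(keyword):
    """Receiver: build the (fuzzy trapdoor, exact trapdoor) pair."""
    pos = POSITION[keyword]
    return fuzzy_of(pos), pos


def fuzzy_search(cloud_store, fuzzy_td):
    """Cloud: return all records under the same fuzzy node, or None for ⊥."""
    hits = [rec for rec in cloud_store if rec["fuzzy_index"] == fuzzy_td]
    return hits or None


def exact_search(fuzzy_cipher, exact_td):
    """Receiver: filter the fuzzy result down to the exact keyword."""
    files = [rec["file"] for rec in (fuzzy_cipher or [])
             if rec["exact_index"] == exact_td]
    return files or None


cloud_store = [encrypt("invoice", "mail-1"),
               encrypt("meeting", "mail-2"),
               encrypt("travel", "mail-3")]
fuzzy_td, exact_td = td_gen("invoice")
fuzzy_cipher = fuzzy_search(cloud_store, fuzzy_td)  # cloud is given only the fuzzy trapdoor
result = exact_search(fuzzy_cipher, exact_td)       # receiver narrows it down
print([rec["file"] for rec in fuzzy_cipher], result)
```

The fuzzy result contains the mails for both "invoice" and "meeting" because their leaves share a fuzzy node, so from the fuzzy trapdoor alone the cloud cannot tell which of the two keywords was searched.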

Security Proof
In this section, we prove the security of SDKSE-KGA. Each keyword owns an exact trapdoor-index pair and a fuzzy trapdoor-index pair. The sender generates fuzzy indexes and exact indexes and sends them to the cloud. The receiver generates fuzzy trapdoors and exact trapdoors and sends the fuzzy trapdoors to the cloud. Notice that in both the FuzzySearch and ExactSearch algorithms, the match operation succeeds only if the location strings corresponding to the trapdoor and the index are identical. Since fuzzy trapdoors and fuzzy indexes are generated from the position of the fuzzy node, which maps one-to-one to its location string, the match operation succeeds only when the fuzzy trapdoor and the fuzzy index are generated from the same fuzzy node. On the other hand, fuzzy nodes and exact nodes are different from each other, so a match operation on a fuzzy trapdoor and an exact index always outputs ⊥. Therefore, even if the cloud obtains exact indexes, the privacy of users is not compromised. Now we will prove our [...] [22].
Proof. We give the definitions of semi-functional indexes and semi-functional trapdoors for ExactIndex and ETd, and present a series of games. Semi-functional indexes are composed of $CT_0$, $CT_1$, $CT_2$: [...] where $CT_0$, $CT_1$ and $CT_2$ are the components of CT generated in the Encrypt algorithm, $x_2 \in \mathbb{G}_{p_2}$ and $r, z_c \xleftarrow{R} \mathbb{Z}_N$. Semi-functional trapdoors are as follows: [...], where $Td \leftarrow \mathbb{Z}_N$. In addition, we need to construct a series of games.
$Game_{Real}$: This is Game 1.
$Game_{Restricted}$: It is similar to $Game_{Real}$ except that the adversary cannot query keywords which are prefixes of the challenge keyword modulo $p_2$.
$Game_k$: Here $0 \le k \le q$, where q is the number of queries made by the adversary. $Game_k$ differs from $Game_{Restricted}$ in the query results: in $Game_k$ the challenge index is a semi-functional index and the first k trapdoor responses are semi-functional trapdoors.
$Game_{Mhiding}$: It selects a random element from $\mathbb{G}$ and uses it to construct $CT_0$ of the challenge index.
$Game_{Random}$: The second and third components of the challenge index are independent random elements of $\mathbb{G}_{p_1 p_2 p_4}$ in this game.
In $Game_{Random}$, the adversary learns nothing about the keyword from the challenge index. So we need to prove that $Game_{Real}$ and $Game_{Random}$ are indistinguishable. First step: suppose the adversary selects keywords $w_0$ and $w_1$ with $w_0 \not\equiv w_1 \bmod n$ and $w_0 \equiv w_1 \bmod p_2$. Then the simulator can factor n by computing $\gcd(w_0 - w_1, n)$. But Assumptions 1, 2 and 3 imply that n cannot be factored. As a result, $Game_{Real}$ and $Game_{Restricted}$ are indistinguishable. Second step: we prove that $Game_{Restricted}$ and $Game_0$ are indistinguishable. According to Assumption 1, construct a new game. In this game, if $T = T_0$, the index generated by the challenger is a semi-functional index, and the game is equal to $Game_0$. If $T = T_1$, the index generated by the challenger is a normal index and the game is equal to $Game_{Restricted}$. Since the adversary cannot distinguish $T_0$ from $T_1$, $Game_{Restricted}$ and $Game_0$ are indistinguishable. Third step: we prove that the successive games $Game_k$ ($0 \le k \le q$) are indistinguishable from one another. Construct a new game in the same way according to Assumption 5. The trapdoors sent by the challenger are semi-functional trapdoors. If $T = T_0$, the game is equal to $Game_q$. If $T = T_1$, the game is equal to $Game_{Mhiding}$. So $Game_q$ and $Game_{Mhiding}$ are indistinguishable. Continuing in the same way, by constructing a new game according to Assumption 6, we conclude that $Game_{Mhiding}$ and $Game_{Random}$ are indistinguishable. Finally, $Game_{Real}$ and $Game_{Random}$ are indistinguishable. The proof is completed.
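The chain of games in this proof can be summarized by a triangle inequality. The notation $\mathrm{Adv}_X$ for the adversary's advantage in game $X$ is introduced here only for this sketch and does not appear in the original text.

```latex
\begin{align*}
\left|\mathrm{Adv}_{\mathrm{Real}} - \mathrm{Adv}_{\mathrm{Random}}\right|
 \le{}& \left|\mathrm{Adv}_{\mathrm{Real}} - \mathrm{Adv}_{\mathrm{Restricted}}\right|
      + \left|\mathrm{Adv}_{\mathrm{Restricted}} - \mathrm{Adv}_{0}\right| \\
 &+ \sum_{k=1}^{q} \left|\mathrm{Adv}_{k-1} - \mathrm{Adv}_{k}\right|
      + \left|\mathrm{Adv}_{q} - \mathrm{Adv}_{\mathrm{Mhiding}}\right|
      + \left|\mathrm{Adv}_{\mathrm{Mhiding}} - \mathrm{Adv}_{\mathrm{Random}}\right|
\end{align*}
```

Each term on the right is negligible by one of the steps of the proof, and the number of queries q is polynomial, so the sum is negligible. Since the challenge index in $Game_{Random}$ carries no information about the keyword, the advantage in $Game_{Real}$ is negligible as well.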
x, y, w, $u_1, \ldots, u_h$ belong to the public parameters, $R_i$ [...] $m_0$ and $m_1$ are elements of $\mathbb{G}_{p_1}$. The distributions of $m_0^r$ and $m_1^r$ are statistically identical, where r is a random element of $\mathbb{Z}_n$. So the adversary is not able to guess the value of $\beta$ from $m_0$ and $m_1$. In other words, the adversary is not able to distinguish the trapdoors of $w_0^*$ and $w_1^*$. The proof is completed.
Proof. Case 1: If the two challenge keywords map to different fuzzy keywords, they generate different fuzzy trapdoors. So the KGA security game is exactly the same as the Trapdoor-IND security game, and the advantage of the adversary in winning the game is negligible. Case 2: If the two challenge keywords map to the same fuzzy keyword, they generate the same fuzzy trapdoor. The challenge keywords $w_0^*$ and $w_1^*$ then have statistically identical distributions, so the adversary cannot determine $\beta$ from the fuzzy trapdoor. In other words, he cannot distinguish between $w_0^*$ and $w_1^*$. In both cases, the advantage of the adversary in winning the game is negligible.

Performance
This section gives the performance analysis of SDKSE-KGA. The Setup algorithm requires h + 2 multiplications and one pairing. Generating one exact trapdoor takes 2(h + 3) multiplications and 6 modular exponentiations, where h denotes the height of the keyword tree in the scheme. Generating one index takes h + 2 multiplications and 3 modular exponentiations. The Search algorithm requires 2 pairings and 2 multiplications. The computational overhead of KWInsert and KWDelete is negligible. Our SDKSE-KGA scheme supports keyword and file updating at the same time. To add a document, [20] and [17] need to iterate through keyword arrays and [18] needs to traverse a KRB tree, so their updating cost is very high. In addition, the indexes and trapdoors of our scheme are of constant size, which reduces transmission overhead significantly. Table 1 shows the efficiency comparison between [20], [17], [18] and SDKSE-KGA.
Compared with other searchable encryption schemes which resist keyword guessing attacks, the size of the index and trapdoor in the SDKSE-KGA scheme is not affected by the number of files, which keeps the communication overhead low. Table 2 compares this scheme with the others. In this table, G represents a member of the group and Pairing means a bilinear pairing operation.
Fig 1 shows the system model.

Game 2 :
(Trapdoor-IND-CKA security) Setup. The challenger runs the Setup algorithm to obtain the public parameters and the master secret key. He retains the master secret key and gives the public parameters to the adversary B. Query Phase 1. The adversary B adaptively selects a keyword w to query. The challenger generates ETd for w and sends it to B. Challenge. The adversary B selects target keywords $w_0^*$ and $w_1^*$, neither of which has been queried before. Then the challenger flips a coin $\beta \in \{0, 1\}$, generates the ETd for $w_\beta^*$ and sends it to B. Query Phase 2. Repeat Query Phase 1. The adversary continues to issue keywords except the target keywords $w_0^*$ and $w_1^*$. Guess. The adversary gives $\beta'$ as the guess of $\beta$; if $\beta' = \beta$, then the adversary wins.
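The message flow of Game 2 can be sketched as a small test harness. Everything named here is hypothetical: `play_game2`, `estimate_advantage`, the adversary interface (`choose`, `guess`) and the example `LengthAdversary` are introduced only for illustration, and the advantage is taken to be $|\Pr[\beta' = \beta] - 1/2|$, the conventional definition, which is an assumption here. The two trapdoor generators used at the end are deliberately trivial stand-ins, not the TDGen algorithm of the scheme.

```python
import random


def play_game2(td_gen, adversary, rng):
    """One run of Game 2; returns True when the adversary's guess equals beta."""
    queried = set()

    def oracle(w, forbidden=()):
        if w in forbidden:
            raise ValueError("target keywords may not be queried")
        queried.add(w)
        return td_gen(w)

    # Query Phase 1 and Challenge: the adversary picks unqueried targets.
    w0, w1 = adversary.choose(oracle)
    if w0 in queried or w1 in queried:
        raise ValueError("target keywords were queried before the challenge")
    beta = rng.randrange(2)  # the challenger's coin flip
    challenge = td_gen((w0, w1)[beta])
    # Query Phase 2 and Guess: the targets stay off limits.
    guess = adversary.guess(challenge, lambda w: oracle(w, (w0, w1)))
    return guess == beta


def estimate_advantage(td_gen, adversary, trials, seed=0):
    """Empirical |Pr[beta' = beta] - 1/2| (assumed definition of advantage)."""
    rng = random.Random(seed)
    wins = sum(play_game2(td_gen, adversary, rng) for _ in range(trials))
    return abs(wins / trials - 0.5)


class LengthAdversary:
    """Hypothetical attacker that guesses from the trapdoor value alone."""

    def choose(self, oracle):
        oracle("hello")   # an ordinary Phase 1 query
        return "a", "bb"  # targets with different lengths

    def guess(self, challenge, oracle):
        return 0 if challenge == 1 else 1


# A "trapdoor" that leaks the keyword length is broken completely ...
print(estimate_advantage(len, LengthAdversary(), 200))
# ... while one that is independent of the keyword gives the attacker nothing.
print(estimate_advantage(lambda w: 0, LengthAdversary(), 2000))
```

A secure trapdoor generator should behave like the second case for every efficient adversary, which is what Trapdoor-IND-CKA security demands.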

Game 3 :
(Adaptive KGA security) Setup. The challenger runs the Setup algorithm to obtain the public parameters and the master secret key. Then he retains the master secret key and gives the public parameters to the adversary C. Query Phase 1. The adversary C queries the fuzzy trapdoor and fuzzy index of any keyword. Challenge. The adversary selects the keywords $w_0^*$ and $w_1^*$ as challenge keywords, neither of which has been queried before. Then the challenger randomly selects the keyword $w_\beta^*$ ($\beta \in \{0, 1\}$), generates the fuzzy trapdoor $FTd_{w_\beta^*}$ for it, and sends the trapdoor to C. Query Phase 2. Repeat Query Phase 1. The adversary C continues to query the fuzzy trapdoor and fuzzy index of keywords except the target keywords $w_0^*$ and $w_1^*$. Guess. The adversary gives $\beta'$ as the guess of $\beta$; if $\beta' = \beta$, then the adversary wins.
$[I_1, \ldots, I_h]$ represents the location of a keyword in the tree. The exact index and fuzzy index of a keyword are denoted by ExactIndex and FuzzyIndex respectively. The exact trapdoor and fuzzy trapdoor of a keyword are denoted by ETd and FTd respectively.
Definition 3. (SDKSE-KGA) A secure dynamic keyword searchable encryption scheme which resists keyword guessing attacks is a tuple of nine polynomial-time algorithms SDKSE = (Setup, Encrypt, TDGen, FuzzySearch, ExactSearch, KWInsert, IndexInsert, KWDelete, IndexDelete) such that
- Setup(λ, N) → (params, MSK): Input the security parameter λ and the number of keywords N, generate the keyword tree T, and output the public parameters params of the scheme and the master secret key MSK.
- Encrypt(params, w) → (FuzzyIndex, ExactIndex): Input params and a keyword w, and generate the fuzzy index FuzzyIndex and the exact index ExactIndex for w.
- TDGen(MSK, w) → (FTd, ETd): Generate the fuzzy trapdoor FTd and the exact trapdoor ETd for the keyword w using MSK.
- FuzzySearch(FuzzyIndex, FTd) → (FuzzyCipher or ⊥): Input the fuzzy index FuzzyIndex and the fuzzy trapdoor FTd to match. If the match operation succeeds, add the files associated with FuzzyIndex to the fuzzy ciphertext set FuzzyCipher. If it fails, output ⊥.
- ExactSearch(ExactIndex, ETd) → (C or ⊥): Input the exact index ExactIndex and the exact trapdoor ETd to match. If the operation succeeds, output the file set C which contains the keyword w. If it fails, output ⊥.
- KWInsert(w): Insert the new keyword w into the tree T.
- IndexInsert(w): Notify related files to update the keyword list and generate the encrypted keyword C for the new keyword w.
- KWDelete(w): Disable the node bound to the keyword w in the tree T.

Fig 2 .
Now avlS > minS, so we continue to add keywords.

Theorem 2 .
Our SDKSE-KGA scheme is Trapdoor-IND-CKA secure.
Proof. In Game 2, the adversary selects target keywords $w_0^*$ and $w_1^*$, then receives $ETd_{w_\beta^*}$ from the challenger. As we know, $ETd_w = [Td_1, Td_2, Td_3, Td_4, Td_5, Td_6]$, where $Td_1$ [...] $R_3^1 \sim R_3^6$ are random elements selected from $\mathbb{G}_{p_3}$. So the adversary can only infer the value of $\beta$ from $Td_2$ or $Td_5$. According to the property of the bilinear pairing, $R_3^2$ in $Td_2$ can be removed by pairing with elements of $\mathbb{G}_{p_i}$, $i \in \{1, 2, 4\}$. The location strings $[I_1, \ldots, I_h]$ of $w_0^*$ and $w_1^*$ are known to the adversary, so he is able to compute $m_0 = y \prod_{i=1}^{h} u$ [...]

Fig. 3 :
Fig. 3: Index Size. Fig 3 shows the comparison of the index sizes of different schemes.
Here $minS = 2^{L-2}$. Each leaf node has three states: disabled, occupied and available, represented by 0, 1 and 2 respectively. The disabled state means this leaf node is bound to a disabled keyword. The occupied state means this leaf node is bound to a keyword. The available state means this leaf node has not been bound.

Table 1 :
Comparisons with dynamic searchable schemes
In Table 2, Exp means a power operation, while Mul means a multiplication operation; n is the number of all files.

Table 2 :
Comparisons with schemes resisting KGA

Conclusion
In this paper, we proposed SDKSE-KGA, a secure dynamic searchable encryption scheme for mail systems which resists keyword guessing attacks. The index and the trapdoor of SDKSE-KGA are both of constant size. SDKSE-KGA is capable of supporting dynamic management of mails and keywords and of resisting keyword guessing attacks. In addition, it is both Index-IND-CKA and Trapdoor-IND-CKA secure.
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61632012, 61672239, 61602180, and U1509219), in part by the Natural Science Foundation of Shanghai (Grant No. 16ZR1409200), and in part by "the Fundamental Research Funds for the Central Universities". Zhenfu Cao and Jiachen Shen are the corresponding authors.