An Efficient Construction of a Compression Function for Cryptographic Hash

A cryptographic hash (CH) is an algorithm that takes a message from an arbitrary domain and returns an output of fixed size. Cryptographic hashes have numerous applications, such as message integrity, password verification, and pseudorandom generation. Furthermore, the CH is an efficient security primitive for IoT-end devices, constrained devices, and RFID. The construction of the CH depends on a compression function, which is built either from scratch or from a blockcipher. Generally, the blockcipher based cryptographic hash is more applicable than the scratch based hash, because an existing blockcipher implementation can be used directly instead of a dedicated encryption function. Although there are many $(n, 2n)$ blockcipher based compression functions, most of the prominent schemes, such as MR, Weimar, Hirose, Tandem, Abreast, Nandi, and ISA09, focus on a rigorous security bound rather than on efficiency. Therefore, a more efficient blockcipher based compression function is proposed, which provides a higher efficiency-rate together with a satisfactory collision security bound. The efficiency-rate ($r$) of the proposed scheme is $r \approx 1$. Furthermore, the collision security is bounded by $q = 2^{125.84}$ ($q$ = number of queries). Moreover, the proposed construction requires two blockcipher calls per iteration. Additionally, it uses double key scheduling and its operational mode is parallel.


Introduction
primitive of security solution for IoT-end devices, RFID, and resource constrained devices [35][36][37][38][39][44]. Usually, the internal construction of a CH depends on a compression function [16,17]. The compression function is built from scratch or from a blockcipher [6,8,16,17,31]. A blockcipher based compression function is a combination of component functions (Fig. 1, cf. [2,6,8,34]). The component functions are taken from the 16 modes of the PGV construction [8,16,17]. Additionally, the classical Merkle-Damgård structure is used to process the message of the cryptographic hash if the message is longer than the blocksize [1][2][3]. According to Fig. 1, the message ($M$) is a multiple of the block-length. Hence, the message is partitioned as $M = m_1 \| \ldots \| m_l$. Thereafter, the partitioned message blocks are injected as input together with the initial value ($IV$). The function $F_i$ is called the compression function, which is built from a blockcipher or from scratch. Usually, one of the PGV modes is selected as the component function of the compression function [8,16,17]. Moreover, the generic blockcipher based compression function is more suitable than the scratch based one for constrained devices and IoT-end devices, because an existing blockcipher implementation can be used instead of a dedicated encryption function [6,13,14].
Usually, blockcipher based compression functions are classified as single block-length (SBL) and double block-length (DBL). Due to its short output, the application of SBL is limited now [2,9,33]. On the other hand, the DBL is a more reliable construction due to its better resistance against the birthday attack [2,13,16,18,21]. Moreover, DBL constructions are categorized by the underlying blockcipher as $(n, n)$ or $(n, 2n)$, where the second parameter is the key size. The $(n, 2n)$ blockcipher is preferable because its larger key space allows a higher security bound [6,8,13,20,23].
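The Merkle-Damgård iteration described above (Fig. 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction of the paper: `toy_compress` is a hypothetical stand-in for $F_i$ built from truncated SHA-256, `BLOCK_BYTES` is an arbitrary block size, and length padding is omitted because the text assumes that the message is a multiple of the block-length.

```python
import hashlib

BLOCK_BYTES = 16  # hypothetical block size chosen for this sketch


def toy_compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in for the compression function F (assumption, not the paper's scheme).

    Hashes chaining value and message block together and truncates the
    digest to the size of the chaining value.
    """
    return hashlib.sha256(chaining + block).digest()[:len(chaining)]


def md_iterate(message: bytes, iv: bytes, compress=toy_compress) -> bytes:
    """Iterate a compression function over M = m_1 || ... || m_l, starting from IV."""
    if len(message) % BLOCK_BYTES != 0:
        raise ValueError("message length must be a multiple of the block length")
    h = iv
    for offset in range(0, len(message), BLOCK_BYTES):
        # Each message block is absorbed together with the current chaining value.
        h = compress(h, message[offset:offset + BLOCK_BYTES])
    return h
```

Any function with the same `(chaining, block) -> chaining` shape can be passed as `compress`, which is where a blockcipher based compression function would be plugged in.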
Generally, the following parameters indicate the strength of a blockcipher based compression function:
- security bound ($CR$: collision resistance and $PR$: preimage resistance)
- efficiency-rate ($r$)
- number of blockcipher calls ($\#E$)
- key scheduling ($KS$)
- operational mode ($OM$)
The $CR$ is defined as a game in which an adversary tries to find the same output for two different inputs, while the advantage of the adversary remains very limited [6,13,21]. Under $PR$, it is infeasible for the adversary to find any $m$ (message) such that $y = F(m)$, where $y$ is fixed in advance by the adversary [2,6,16]. The number of blockcipher calls ($\#E$) is counted per message-block encryption. The $KS$ gives the number of keys required for a single message-block encryption [16]. Furthermore, the $OM$ stands for operational mode (parallel or serial) [17,18]. In addition, the efficiency-rate [6,15] is defined as:
$$r = \frac{\text{size of the message block per iteration}}{(\text{number of blockcipher calls}) \times \text{block-length}}$$
Motivation. The parameters $CR$, $PR$, $r$, $\#E$, $OM$, and $KS$ are vital for any satisfactory blockcipher based compression function [1, 6-8, 13, 21]. First, certain gaps in the current familiar schemes are identified with respect to these parameters. The importance of these findings for efficient and secure communication is then shown. For example, the key scheduling cost is analysed with respect to the construction of the compression function. Usually, 176 bytes are needed for a single key scheduling [27]. Hence, minimizing key scheduling is a common practice. Additionally, the operational mode is crucial for resource limited devices, where the parallel mode can provide maximum support with respect to the memory system [29,30]. Moreover, the efficiency-rate should reach the landmark $r = 1$ [6,13,15,21]. There are some well-known blockcipher compression function schemes such as MR, Weimar, Hirose, Tandem, Abreast, Nandi, and ISA09 (Table 1).
For example, the $CR$ of the MR scheme is bounded by $q = 2^{126.70}$, but its $r$ is 1/2 ($q$: number of queries). The Weimar-DM scheme provides a tight security bound of $q = 2^{126.23}$ [6]. Moreover, it uses double key scheduling with an efficiency-rate of 1/2. The Hirose scheme delivers a marginal security bound of $q = 2^{124.55}$, but it needs only a single key scheduling. However, the $CR$ and $PR$ bounds of Tandem-DM and Abreast-DM are not as satisfactory as those of MR, Weimar, and Hirose [23]. Moreover, the efficiency-rate of Tandem-DM and Abreast-DM is 1/2, like that of MR, Weimar, and Hirose [6,11,12]. Although the Nandi scheme is bounded by $q = O(2^{2n/3})$, it provides a higher efficiency-rate ($r = 2/3$) [20]. Additionally, the ISA09 construction provides a better efficiency-rate ($r = 2/3$) [21]. According to the above discussion and Table 1, most of the existing schemes have a rigorous security margin. However, the efficiency is low for the constructions of MR, Weimar, Hirose, Tandem, and Abreast. On the other hand, the Nandi and ISA09 schemes achieve a higher efficiency-rate, but they require $KS = 3$ and $\#E = 3$ [20,21], and their $OM$ is serial. Thus, the overall efficiency of the ISA09 and Nandi schemes is not adequate.
Nowadays, an efficient blockcipher compression function is of great importance [6,8,13,33,34,40,41,44]. The blockcipher is one of the important cryptographic primitives for security solutions in the IoT environment according to standards such as ISO/IEC 29192-1, ISO/IEC 29192-2, ISO/IEC 29192-3, and ISO/IEC 29192-4 [42][43][44]. Generally, IoT-end devices, RFID, and constrained devices are used in the IoT environment [39][40][41][42]. Furthermore, these devices need to operate fast, but their major drawbacks are limited memory, power, and processing capability [37,38,42,43,44]. Therefore, the cryptographic solution should offer high efficiency.
In summary, the targets for an efficient blockcipher compression function are as follows:
- higher efficiency-rate
- reasonable key scheduling
- fewer blockcipher calls ($\#E$)
- operational mode
- satisfactory security bound
In addition, a comparative study of the proposed construction and the current familiar schemes is given in Table 2.
Outline. The basic preliminaries are provided in Section 2. The technical details of the proposed scheme are given in Section 3. Section 4 presents the analysis of the security bound. Furthermore, the result analysis, including the performance analysis, is given in Section 5. Finally, the conclusions and future work are provided in Section 6.

Table 2. Comparison: the proposed scheme and existing familiar schemes [6,14,15,20,21,23] (OM: P = parallel)
| Scheme | CR | r | KS | #E | OM |
| MR | $2^{126.70}$ | 0.5 | 1 | 2 | P |
| Weimar | $2^{126.23}$ | 0.5 | 2 | 2 | P |
| Hirose | $2^{124.55}$ | 0.5 | 1 | 2 | P |
| Tandem | $2^{120.87}$ | 0.5 | 2 | 2 | P |
| Abreast | $2^{124.42}$ | 0.5 | 2 | 2 | P |
| Proposed scheme | $2^{125.84}$ | 0.996 | 2 | 2 | P |

In the ideal cipher model (ICM), a blockcipher is defined as $B(n, k)$, where $n$ is the block-length and $k$ is the key-length. For each key $K \in \{0, 1\}^k$, the replies to forward ($E$) and backward ($E^{-1}$) queries follow a random and independent permutation. Let $BLOCK^k_n$ be the set of all blockciphers $B(n, k)$. Under the ideal cipher model, $E$ is chosen randomly from $BLOCK^k_n$. $E$ takes a key and a plaintext as input and returns a ciphertext as output. Conversely, the inputs of $E^{-1}$ are a key and a ciphertext, and the output is the plaintext. Usually, the queries and responses through $E$ and $E^{-1}$ are stored as triplets $(k_i, x_i, y_i)$. Moreover, the adversary is not allowed to make any duplicate query [17,22].
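The oracle of the ideal cipher model can be sketched by lazy sampling: for every key, a random permutation is filled in only at the points that are actually queried. The class and method names below are hypothetical and serve only as an illustration; the sketch keeps the query history as triplets $(k_i, x_i, y_i)$ and rejects duplicate queries, as described above.

```python
import secrets


class IdealCipherOracle:
    """Lazily sampled ideal cipher with n-bit blocks (illustrative sketch)."""

    def __init__(self, n_bits: int):
        self.n_bits = n_bits
        self.forward = {}    # key -> {plaintext: ciphertext}
        self.backward = {}   # key -> {ciphertext: plaintext}
        self.history = []    # stored triplets (k_i, x_i, y_i)

    def _fresh(self, used: dict) -> int:
        # Draw a uniform n-bit value that has not been used under this key yet.
        if len(used) == 2 ** self.n_bits:
            raise ValueError("permutation for this key is fully defined")
        while True:
            value = secrets.randbits(self.n_bits)
            if value not in used:
                return value

    def encrypt(self, key, x: int) -> int:
        """Forward query E(key, x) = y."""
        table = self.forward.setdefault(key, {})
        inverse = self.backward.setdefault(key, {})
        if x in table:
            raise ValueError("duplicate query")
        y = self._fresh(inverse)
        table[x], inverse[y] = y, x
        self.history.append((key, x, y))
        return y

    def decrypt(self, key, y: int) -> int:
        """Backward query E^{-1}(key, y) = x."""
        table = self.forward.setdefault(key, {})
        inverse = self.backward.setdefault(key, {})
        if y in inverse:
            raise ValueError("duplicate query")
        x = self._fresh(table)
        table[x], inverse[y] = y, x
        self.history.append((key, x, y))
        return x
```

Because used values are excluded under each key, the answers under one key always form a partial permutation, while different keys are sampled independently.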

Security Definition
There are certain properties that are used to analyse the security of a blockcipher compression function. For example, collision resistance ($CR$), preimage resistance ($PR$), the padding oracle attack, and the initial value ($CV$) attack are the most familiar ones [6,13,23,24]. In this section, the collision and preimage resistance of the blockcipher compression function are briefly discussed [16][17][18][19].
Collision resistance of compression function. The adversary $A$ is given access to the blockcipher oracle $E \in BLOCK^k_n$. Let $(\alpha_1, \beta_1, m_1)$ and $(\alpha_2, \beta_2, m_2)$ be two inputs of the compression function. Furthermore, an experiment is defined as $\mathrm{Exp\text{-}coll}_{f^E}(A)$. The output of the experiment is 1 iff the following condition is satisfied:
$$(\alpha_1, \beta_1, m_1) \neq (\alpha_2, \beta_2, m_2) \;\wedge\; f^E(\alpha_1, \beta_1, m_1) = f^E(\alpha_2, \beta_2, m_2),$$
where $f^E$ is a blockcipher compression function, $\alpha, \beta$ are chaining values, and $m$ is the message. The advantage of the adversary in finding a collision under $f^E$ is
$$\mathrm{Adv}^{coll}_{f^E}(A) = \Pr\left[\mathrm{Exp\text{-}coll}_{f^E}(A) = 1\right],$$
where coll stands for collision. The advantage of adversary $A$ is quantified by the number of queries that it is allowed to ask the blockcipher oracle. Therefore,
$$\mathrm{Adv}^{coll}_{f^E}(q) = \max_{A} \left\{ \mathrm{Adv}^{coll}_{f^E}(A) \right\},$$
where the maximum is taken over all adversaries that ask at most $q$ oracle queries [16,19].
Preimage resistance of compression function. The adversary $A$ has access to the blockcipher oracle $E \in BLOCK^k_n$. Furthermore, $A$ selects the values $\alpha, \beta$ randomly before making any query to the blockcipher oracle. Let the oracle feedback for the adversarial queries yield $\alpha$ and $\beta$. In addition, assume an experiment $\mathrm{Exp\text{-}pre}_{f^E}(A)$, where pre stands for preimage. The output of this experiment is 1 iff
$$f^E(\alpha_1, \beta_1, m) = (\alpha, \beta),$$
where $f^E$ is a blockcipher compression function, $\alpha_1, \beta_1$ are chaining values, and $m$ is the message. The advantage of the adversary in finding a preimage under $f^E$ is
$$\mathrm{Adv}^{pre}_{f^E}(A) = \Pr\left[\mathrm{Exp\text{-}pre}_{f^E}(A) = 1\right].$$
Moreover, the advantage of $A$ is evaluated through the total number of queries. Therefore,
$$\mathrm{Adv}^{pre}_{f^E}(q) = \max_{A} \left\{ \mathrm{Adv}^{pre}_{f^E}(A) \right\},$$
where the maximum is taken over all adversaries that ask $q$ oracle queries [16,19].

Proposed Scheme
Usually, the efficiency-rate can be increased by using three blockcipher calls. This method is used in Nandi and ISA09 [20,21]. Another useful method is to feed a pair of chaining values, together with the message, into two blockciphers. This method is used in MDC-2 and later in MDC-4 [4,9,45]. The proposed construction is inspired by, and follows, the construction of MDC-2 and MDC-4 [4,9,45]. However, with respect to security there is a drawback in constructions of the MDC-2 and MDC-4 kind. In MDC-2, two chaining values are used as input, and the message is common to the two blockciphers. There is no dependency between the two chaining values as input. In other words, the computations of the two blockciphers used in the compression function are completely isolated. For example, given the input and output $(x_1, y_1) \to (x_2, y_2)$, if the input is swapped then the new output is the swapped old output, $(y_1, x_1) \to (y_2, x_2)$. The construction thus suffers from a symmetry property. Therefore, certain changes are made in the proposed construction (Fig. 2). For example, a constant bit (0 for one blockcipher and 1 for the other) is used as part of the key of each blockcipher in the proposed scheme (a common practice in cryptography [14]). Hence, the attacker cannot predict the output chaining values, even under the assumption that the attacker can freely alter the input chaining values and the message. This breaks the symmetry property in the proposed scheme, where $x\|y$ and $y\|x$ are treated as two different values. Moreover, the scheme is secure against generic attacks because of the ideal cipher model [26]. Additionally, MDC-2 and MDC-4 are $(n, n)$-bit DBL hash functions with efficiency-rates 1/2 and 1/4 [24], whereas the proposed scheme is based on an $(n, 2n)$ blockcipher. Furthermore, a component function different from that of MDC-2 and MDC-4 is used.
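The symmetry property, and the effect of a constant in the key, can be illustrated with a toy example. Everything in this sketch is an assumption made for illustration: `toy_perm` is an insecure 8-bit keyed permutation, `f_symmetric` is a simplified construction with two isolated blockcipher calls in the spirit of the discussion above (not the real MDC-2), and `f_tweaked` appends a constant byte to the key in place of the single constant bit of the proposed scheme.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def toy_perm(key: bytes) -> tuple:
    """Toy 8-bit 'blockcipher': a key-seeded shuffle of 0..255 (not secure)."""
    table = list(range(256))
    random.Random(key).shuffle(table)
    return tuple(table)


def E(key: bytes, x: int) -> int:
    return toy_perm(key)[x]


def f_symmetric(x: int, y: int, m: bytes) -> tuple:
    # Both halves use the same key m, so the two computations are isolated
    # and identical in form.
    return (E(m, x) ^ x, E(m, y) ^ y)


def f_tweaked(x: int, y: int, m: bytes) -> tuple:
    # A constant (one byte here, standing in for the constant bit) separates
    # the keys of the left and right blockcipher.
    return (E(m + b"\x00", x) ^ x, E(m + b"\x01", y) ^ y)


def swap_gives_swapped_output(f, x: int, y: int, m: bytes) -> bool:
    """True if (y, x) -> (y2, x2) whenever (x, y) -> (x2, y2)."""
    a, b = f(x, y, m)
    return f(y, x, m) == (b, a)
```

For `f_symmetric` the check holds for every input pair by construction, whereas for `f_tweaked` the left and right halves use different permutations, so swapping the inputs no longer simply swaps the outputs.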
The proposed scheme compresses $4n$ bits into $2n$ bits, whereas MDC-2 and MDC-4 compress $3n$ bits into $2n$ bits. Furthermore, the proposed scheme satisfies type-1 (from Stam's conjecture), where the two blockciphers $E_l$, $E_r$ are distinct and independent under the ICM [8,16]. In general, the proposed scheme is defined as a variant of MDC-2 and MDC-4.
Definition 1. Let $E \in BLOCK^k_n$ be a block cipher taking a set of $k$-bit key and $n$-bit block-length such that $E_{l, 2n}$ is defined as a double block length (dbl) cipher and parallel calling of two independent blockciphers of $E_{l, r}$ such that, where parameters are defined as . Therefore, $f^E$ consists of ideal blockcipher ($E$) such as:

Security Analysis
The security proof of the proposed scheme follows the ICM [16,17], where $A$ is not allowed to make any duplicate query. For example, the adversary does not issue the query $E(k, x) = y$ if the query $E^{-1}(k, y) = x$ is already in the query storage ($Q$). The adversary $A$ searches for a collision under a pair of different inputs (queries) through the blockcipher oracle. Additionally, $A$ tries to find an output of the compression function that collides with the initial chaining value. Moreover, the preimage attack means that adversary $A$ selects $\alpha', \beta'$ randomly and tries to find $f(\alpha, \beta, m) = (\alpha', \beta')$. In addition, the advantage of $A$ in achieving the above is very limited.

Collision Security Analysis
An adversary $A$ has access to a blockcipher oracle for finding a collision. The query is $Q_i$ and the corresponding response is a triplet ($m$: message, $k$: key, $c$: ciphertext). For any $i$-th iteration ($i \le q$), the query is either a forward query, $Q_i \in \{(m, k) = c\}$, or a backward query, $Q_i \in \{(c, k) = m\}$. Each $Q_i$ is stored in $Q \in (Q_1, Q_2, \ldots, Q_i)$ for each iteration $i$, where $Q$ is the query storage. Under this circumstance, adversary $A$ aims to find, According to the definition of the proposed scheme, 1 is re-defined as:
Theorem 1. Let $f^E$ be a double block-length compression function (Def. 1, 2). An adversary $A$ aims to find a collision (coll) under $f^E$ after $q$ pairs of queries. Hence, the advantage of $A$ is bounded by,
Proof. An adversary $A$ makes relevant queries to the blockcipher oracle, where the number of queries is limited by $q$. For any $i$-th query, the replies $x_i$ and $y_i$ are drawn at random from the blockcipher oracle. The main difficulty is to determine the size of the set from which these fresh values come. There are three possible incidents that can cause a collision-hit in any $i$-th iteration. First, the three incidents are organized into two targets ($TAR1$, $TAR2$). The first incident, covered by $TAR1$, is a collision between two distinct queries ($j < i$). $TAR2$ covers the second and third incidents: $A$ aims to find a collision through a single query, and $A$ also looks for a collision against the initial chaining values.
Algorithm 1 ($TAR1$ and $TAR2$), excerpt: if $(q_{i,1}, q_{i,2}) = (q_{j,1}, q_{j,2})$, where $j < i$, then ... end if; end for.
The adversary $A$ is allowed to ask queries to the blockcipher oracle in the QUERY phase. The corresponding feedback is assigned in the RESPONSE phase. In addition, a collision is checked in the CHECK phase.
Collision probability based on the first incident ($TAR1$). In iteration $i$, a pair of queries is executed that returns two distinct outputs. According to Algorithm 1, there is a chance of a collision between two different query-pairs after any $i$-th iteration ($j < i < q$). For example, a query pair of the $j$-th iteration are: Moreover, the query responses in the $i$-th iteration ($j < i$) are $a_i \leftarrow E_{l,\,m\|c}(a_{i-1} \oplus l(m_i)) \oplus (a_{i-1} \oplus l(m_i)) \oplus c$ and $b_i \leftarrow E_{r,\,m\|c}(a_{i-1} \oplus l(m_i)) \oplus (a_{i-1} \oplus l(m_i)) \oplus c$. Let $TAR1C_i$ be the event that the adversary finds a collision between two different iterations ($j < i \le q$). Thus, equation 2 is re-defined as: From 3 ∧ 4, the probability of a collision hit under the event $TAR1C_i$ is 2(i−1) (when $j < i \le q$). Therefore, the probability of a single event under $TAR1$ is: Let $TAR1C$ be the event covering all colliding pairs under $f^E$ for $q$ pairs of queries. Hence,
Collision probability based on the second and third incidents ($TAR2$). Let $a_i, b_i$ be the output of the compression function ($i < q$), where Hence, a collision can occur when $a_i = b_i$. Let $TAR2C_i$ be the collision event for this condition in the CHECK phase of $i < q$. Furthermore, a collision can also occur with the initial chaining values. For example, the output pair $(a_i, b_i)$ of the proposed scheme collides with the initial chaining values $(a_0, b_0)$ at some phase of the query process. Therefore, the conditions of a collision-hit under the initial key attack are
Algorithm 2, excerpt: ... create the event $TAR2C_i$ ∧ terminate; "AND"; if $(q_{i,1}, q_{i,2}) = (q_{0,1}, q_{0,2})$ then create the event $TAR2C_i$ ∧ terminate ...
Hence, the probability of a collision under the two incidents is at most $\frac{1}{2^n - i} \times 2 \times \frac{2}{2^n - i}$. Finally, the probability of these two incidents under the event $TAR2C$ for $q$ pairs of queries is: Adding the values of 5 and 6, Theorem 1 follows.

Preimage Security Analysis
A standard proof technique of Armknecht et al. is used for the preimage security proof of the proposed scheme [14]. The $PR$ security bounds of MR, Weimar, Hirose, Tandem, and Abreast are also based on [14]. Two important concepts are adopted from [6,14]: the distinction between super and normal queries, and the adjacent query-pair. Let $A$ randomly pick the output value of the compression function $(a', b')$. Now $A$ aims to obtain a preimage-hit through the condition $f_p^E(a_i, b_i, m) = (a', b')$, where $a_i, b_i, m$ are the input of the compression function and $a_i = b_i$.
Theorem 2. Let $f^E$ be a double block-length compression function. An adversary $A$ aims to find a preimage-hit under $f^E$ after $q$ pairs of queries. Hence, the advantage of $A$ is bounded by,
Proof. An adversary $A$ keeps a query database in the form of, In such a fashion, when the oracle size reaches $N/2$ ($N$: oracle size, $2^n$), the rest of the queries under the key-set are given to the adversary as free queries [6,14,25]. This set of free queries forms the domain called the super query database ($SQD$).
On the other hand, the first $N/2$ queries are defined as the normal query database ($NQD$) [14]. Additionally, the free queries are asked by the adversary non-adaptively in the super query database ($SQD$). Therefore the successful conditions of a preimage-hit are:
Algorithm 3, excerpt: ... for $N/2 < i < N$ do (super query): QUERY ∧ CHECK; end for; end procedure.
Probability of $NQW$. The adversary $A$ makes each relevant query independently and receives $a_i, b_i$. Furthermore, $A$ continues until the oracle set size reaches $N/2$ [6,14]. According to the above mentioned conditions (7, 8), the hitting probability is $2 \times \frac{2}{2^n - q}$. If $A$ makes a query $E_{l,\,m_i\|c}(a_{i-1} \oplus l(m_i))$ (left block), then the answer of the right block is given to $A$ as a free query because of the adjacent query pair [6,14]. Thereafter, the set size is $(2^n - q)/2$, which gives a probability of $\frac{2}{2^n - q}$. Thus, the probability of the normal query is:
Probability of $SQW$. The concept of a super query oracle is very simple [6,14]. If the query oracle reaches the point of $N/2$, then the rest of the queries are given to the adversary for free [6,14]. Later, these queries are asked by the adversary non-adaptively [14] to find a preimage-hit (Algorithm 3). The preimage-hit either occurs in this domain ($SQD$) or it does not. Thus, the probability is either $2/N$ or 0 for any output value of $a_i$/$b_i$. Now a pair of conditions under $SQW$ are: According to 10, the answer of $a_i$ may come from a set of size $N/2$. Hence, the probability is $2/N$. Recall the concept of an adjacent query pair (free query) [6,14], where the answer of the other block (right block) also comes from a set of size $N/2$. As a result, the probability of 10 is in total $4/N^2$. In a similar way, the probability of 11 is $4/N^2$. Now, the final probability of the $SQW$ is evaluated based on the number of points for a $SQW$, the cost of $SQW$, and the probability of obtaining a preimage-hit such as: Adding the values of 9 and 12, Theorem 2 follows.

Collision Resistance Analysis
Theorem 1 gives the probability of a collision hit for the given adversary $A$. The number of queries ($q$) determines the upper bound of the collision security. Hence, the value of $q$ at which the adversarial advantage reaches 1/2 (birthday attack) is investigated. Let $N = 2^n$ and $\mathrm{Adv}^{coll}_{f^E}(A)$ be as in Theorem 1, where $n = 128$. According to the birthday attack [1,6,13,20,21], $\mathrm{Adv}^{coll}_{f^E}(A) = \frac{1}{2}$. Thus, the number of queries is $q = 2^{125.84}$.
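The way such a threshold is obtained can be sketched numerically: given an advantage bound that increases with $q$, one searches for the $q$ at which the bound reaches 1/2. The sketch below is only an illustration of this procedure. The function `example_bound` is a hypothetical birthday-type stand-in and is not the bound of Theorem 1, so the value it yields differs from $2^{125.84}$; the function names are assumptions as well.

```python
import math


def example_bound(q: float, N: float) -> float:
    """Hypothetical birthday-type bound q(q-1)/(N-q)^2 (NOT the bound of Theorem 1)."""
    return q * (q - 1.0) / (N - q) ** 2


def log2_query_threshold(bound, n_bits: int, target: float = 0.5) -> float:
    """Return log2(q) at which bound(q, 2^n) reaches the target advantage.

    Uses bisection on log2(q); the bound must be increasing in q.
    """
    N = 2.0 ** n_bits
    lo, hi = 1.0, n_bits - 0.001  # search interval for log2(q)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if bound(2.0 ** mid, N) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# For the stand-in bound, q/(N-q) = 1/sqrt(2) gives q = N / (1 + sqrt(2)).
threshold = log2_query_threshold(example_bound, 128)
```

Replacing `example_bound` with the exact expression of a theorem gives the corresponding query threshold in the same way.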

Efficiency-rate
The efficiency-rate of a blockcipher based compression function is defined as $r = \frac{|m|}{n \times \#E}$, where $|m|$ is the length of the message, $n$ is the block-length, and $\#E$ is the number of blockcipher calls. According to the definitions (Def. 1, Def. 2) of the proposed scheme, the efficiency-rate is $r = 0.996 \Rightarrow r \approx 1$. In Fig. 3, the proposed scheme is compared with the existing schemes with respect to the efficiency-rate.
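The formula can be checked with a few lines of code. The message lengths used below are inferred from the rates and call counts stated in the text and are assumptions of this sketch: $|m| = n$ for the schemes with $r = 1/2$ and $\#E = 2$, $|m| = 2n$ for Nandi and ISA09 with $r = 2/3$ and $\#E = 3$, and $|m| = 2n - 1$ for the proposed scheme, which is the value that yields $r = 0.996$ for $n = 128$ and $\#E = 2$ (for example, if one key bit is reserved for the constant).

```python
from fractions import Fraction


def efficiency_rate(message_bits: int, block_bits: int, cipher_calls: int) -> Fraction:
    """r = |m| / (n * #E), returned as an exact fraction."""
    return Fraction(message_bits, block_bits * cipher_calls)


n = 128  # block-length in bits

rates = {
    # |m| values are inferred from the stated rates (assumption of this sketch).
    "MR, Weimar, Hirose, Tandem, Abreast": efficiency_rate(n, n, 2),
    "Nandi, ISA09": efficiency_rate(2 * n, n, 3),
    "Proposed (assuming |m| = 2n - 1)": efficiency_rate(2 * n - 1, n, 2),
}

for name, r in rates.items():
    print(f"{name}: r = {r} = {float(r):.3f}")
```

Exact fractions are used so that $255/256$ is not confused with 1 through rounding.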

Performance analysis
In this section, the proposed scheme is compared with the existing schemes with respect to memory resources. It is known that 176 bytes of memory are required for a single key scheduling [27]. For example, a message of $2n$ bits is taken for encryption. Tables 3 and 4 are based on the characteristics of the current familiar schemes and the proposed scheme. For any DBL compression function, the output is $2n$ bits. Therefore, assume that at least $2n \to \gamma$ bits are required to store the output value (denoted as $V$) of the $i$-th iteration. In Table 4, the message size is $2n$ bits as an example. Hence, the proposed scheme does not need memory to store the output. Next, the above cost (Table 4) is generalized, including the number of iterations ($l$), for a $tn$-bit message ($t > 2$) in Table 5. Additionally, the proposed scheme is faster than MR, Weimar, Tandem, and Abreast (if $m > 2n$) in certain cases.

Table 5. Required memory for key scheduling, when $m = tn$
| Name | l | V | B + V |
| Proposed scheme | $l = tn/2n$ | $\gamma$ | $a + \gamma$ |
| MR | $l = tn/n$ | $\gamma$ | $b + \gamma$ |
| Weimar | $l = tn/n$ | $\gamma$ | $c + \gamma$ |
| Hirose | $l = tn/n$ | $\gamma$ | $d + \gamma$ |
| Tandem | $l = tn/n$ | $\gamma$ | $e + \gamma$ |
| Abreast | $l = tn/n$ | $\gamma$ | $f + \gamma$ |
a, b, c, d, e, f: these values come from the
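The quantities of Table 5 can be sketched in code. This is a rough illustration under stated assumptions and does not reproduce the values $a, \ldots, f$ of the paper: the number of iterations follows $l = tn/2n$ for the proposed scheme and $l = tn/n$ for the other schemes, the $KS$ values are taken from Table 2, and the key-schedule memory is simply assumed to be $KS \times 176$ bytes per iteration.

```python
import math

KEY_SCHEDULE_BYTES = 176  # memory for a single key scheduling, as stated in the text [27]

# name: (KS from Table 2, n-bit message blocks absorbed per iteration)
SCHEMES = {
    "Proposed scheme": (2, 2),
    "MR": (1, 1),
    "Weimar": (2, 1),
    "Hirose": (1, 1),
    "Tandem": (2, 1),
    "Abreast": (2, 1),
}


def iterations(name: str, t: int) -> int:
    """Number of iterations l for a message of t blocks of n bits."""
    _, blocks_per_iteration = SCHEMES[name]
    return math.ceil(t / blocks_per_iteration)


def key_schedule_memory(name: str) -> int:
    """Assumed key-schedule memory per iteration in bytes: KS * 176."""
    key_schedules, _ = SCHEMES[name]
    return key_schedules * KEY_SCHEDULE_BYTES


for scheme in SCHEMES:
    print(scheme, iterations(scheme, 4), key_schedule_memory(scheme))
```

With $t = 4$, the proposed scheme needs half as many iterations as the schemes that absorb one $n$-bit block per iteration.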

Conclusion
This paper studied the gap between the security bound and the efficiency of compression functions for the cryptographic hash. The study indicates that the blockcipher based compression function is more suitable than the scratch based construction as a security solution for IoT-end devices, RFID, and constrained devices. Thus, a more efficient blockcipher based compression function is proposed in this paper. The proposed scheme provides an improved efficiency-rate, fewer blockcipher calls, and a reasonable security bound. It uses two blockcipher calls with $2n$-bit keys, where the two blockciphers are independent. The proof technique of this scheme relies on the ICM. The proposed scheme supports the encryption of fixed size messages. This property opens a window for new applications, where a message of variable length can be encrypted without padding. Finally, the proposed scheme is secure under one of the PGV modes, and it can be extended to be secure under all modes of the PGV [17][18][19].