Anomalies and Vector Space Search: Tools for S-Box Analysis (Full Version)

S-boxes are functions with an input so small that the simplest way to specify them is their lookup table (LUT). How can we quantify the distance between the behavior of a given S-box and that of an S-box picked uniformly at random? To answer this question, we introduce various “anomalies”. These real numbers are such that a property with an anomaly equal to a should be found roughly once in a set of 2^a random S-boxes. First, we present statistical anomalies based on the distribution of the coefficients in the difference distribution table, linear approximation table, and, for the first time, the boomerang connectivity table. We then count the number of S-boxes that have block-cipher-like structures to estimate the anomaly associated with those. In order to recover these structures, we show that the most general tool for decomposing S-boxes is an algorithm efficiently listing all the vector spaces of a given dimension contained in a given set, and we present such an algorithm. Finally, we propose general methods to formally quantify the complexity of any S-box. They rely on the production of the smallest program evaluating it and on combinatorial arguments. Combining these approaches, we conclude that permutations that are actually picked uniformly at random essentially always have the same cryptographic properties and the same lack of structure. These conclusions show that multiple claims made by the designers of the latest Russian standards are factually incorrect.


Introduction
S-boxes are small functions with an input small enough that they can be specified by their lookup tables. If F is an S-box with an n-bit input then it is feasible to describe it using only the sequence F(0), F(1), ..., F(2^n − 1) since, in the vast majority of cases, 3 ≤ n ≤ 8. S-boxes can therefore correspond to arbitrarily complex functions. In practice, such components are the only source of non-linearity of many symmetric primitives. Most prominently, the AES [AES01] uses an 8-bit bijective S-box.
However, because they can be specified using only their lookup tables, it is not necessary for algorithm designers to disclose their design process. They can build an S-box using a secret structure and then hide this structure by only disclosing the lookup table. Such an action is considered bad practice as it cannot foster the trust necessary to use the algorithm so specified. And yet, there are several instances of standardized algorithms that use secretly designed S-boxes: the DES [DES77], Skipjack [U.S98], and the pair consisting of the hash function Streebog [Fed12] and the block cipher Kuznyechik [Fed15]. The DES and Skipjack were American standards while Streebog and Kuznyechik are Russian ones. Streebog is part of the standard ISO/IEC 10118-3 while Kuznyechik is being considered for inclusion in ISO/IEC 18033-3.
The generation method used by the designers of the S-box shared by Streebog and Kuznyechik then had to be recovered by external cryptographers who, after several attempts [BPU16,PU16], succeeded in [Per19]. The structure presented in this last paper is extremely rare: the probability that a random permutation has a similar one is under 2^{−1601}. Yet, in an internal memo sent to ISO [SM18] before the publication of [Per19], the designers of Kuznyechik stated the following.
Through thorough search current S-box was obtained [...] No secret structure was enforced during construction of the S-box. At the same time, it is obvious that for any transformation a lot of representations are possible (see, for example, a lot of AES S-box representations).
In this paper, we prove that none of these statements are correct: it would be necessary to generate an infeasibly large set of S-boxes to obtain one with similar differential, linear, and boomerang properties. At the same time, the structure found had to be inserted deliberately by its designers because the presence of any structure this simple in a random permutation is extremely unlikely.
So far, S-box reverse-engineering has dealt with two broad questions. Let S_{2^n} be the set of all n-bit permutations and let F ∈ S_{2^n}.
1. What is the probability that an S-box picked uniformly in S_{2^n} has differential/linear properties at least as good as those of F?
2. How can we recover the structure of F, if it has any?
Answering the first question can also help us better understand the properties of random permutations and thus better estimate the advantage of an adversary trying to distinguish a (round-reduced) block cipher from a random permutation. On the other hand, the second one is related to so-called white-box cryptography, i.e. to implementation techniques that will hide a secret from an attacker with total access to the implementation of the algorithm. In practice, in order to try and hide for instance an AES key, the attacker will only be given access to an implementation relying on big lookup tables that hide the details of the computations. Recovering the original structure of these tables can be seen as a particular case of S-box reverse-engineering.
Overall, this second question is more subtle than it may seem: a given function can have multiple different decompositions, as evidenced by the multiple results on the Russian S-box [BPU16,PU16,Per19]. We then ask a third natural question whose answer will allow us to both disprove a claim of [SM18], and to estimate if a random permutation can be hoped to be efficiently implemented.
3. Can we expect a random permutation to have a simple description?

Our Contributions
A Key Concept: Anomalies. We answer the questions asked above using different variants of a unique approach based on what we call anomalies. Intuitively, an anomaly is a real number that quantifies how unlikely a property is. For example, there are very few differentially-6-uniform 8-bit permutations, meaning that the anomaly of this property should be high. However, we could argue that what matters in this case is not just the number of differentially-6-uniform permutations but the number of permutations with a differential uniformity at most equal to 6. In light of this, we define anomalies as follows.
Definition 1 (Anomaly). Let F ∈ S_{2^n} and let P be a function mapping S_{2^n} to a partially ordered set. The anomaly of P(F) is defined as A(P(F)) = −log2(Pr[P(π) ≤ P(F)]), where the probability is taken over π ∈ S_{2^n} picked uniformly at random. We can equivalently write A(P(F)) = log2(|S_{2^n}|) − log2(#{π ∈ S_{2^n} : P(π) ≤ P(F)}).
In the example given above, P is simply the function returning the differential uniformity of a permutation. The anomaly of the differential uniformity then gets higher as the differential uniformity of F decreases below the median differential uniformity, as there are fewer permutations with a low differential uniformity. At the same time, the negative anomaly of the differential uniformity, defined as −log2(Pr[P(π) ≥ P(F)]), increases as the differential uniformity increases above its median value. To put it differently, the anomaly of P(F) quantifies how many S-boxes are at least as good as F in terms of P, and the negative one how many are at least as bad as F. In this paper, we study different anomalies and design new tools that allow their estimation for any S-box.
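To make Definition 1 concrete, the sketch below (our own illustrative code, not from the paper; all function names are ours) estimates an anomaly empirically by sampling random permutations. For properties rarer than the sample size allows, the analytic models of Section 2 are needed instead.

```python
import math
import random

def ddt(sbox):
    """Difference distribution table of an S-box given as a lookup table."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x ^ a] ^ sbox[x]] += 1
    return table

def differential_uniformity(sbox):
    """Maximum DDT coefficient over all non-trivial rows (a != 0)."""
    size = len(sbox)
    t = ddt(sbox)
    return max(t[a][b] for a in range(1, size) for b in range(size))

def empirical_anomaly(stat, value, n_bits=5, samples=200, seed=2019):
    """Estimate -log2 Pr[stat(pi) <= value] over uniform permutations pi."""
    rng = random.Random(seed)
    size = 1 << n_bits
    hits = sum(stat(rng.sample(range(size), size)) <= value
               for _ in range(samples))
    return -math.log2(max(hits, 1) / samples)  # clamp to avoid log2(0)
```

For instance, `empirical_anomaly(differential_uniformity, 8, n_bits=5)` estimates the anomaly of a differential uniformity of at most 8 among 5-bit permutations.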
A property with a high anomaly can be seen as a distinguisher in the usual sense, i.e. it is a property that differentiates the object studied from one picked uniformly at random. However, unlike usual distinguishers, we do not care about the amount of data needed to estimate the probabilities corresponding to the anomalies.

Statistical Anomalies. In [BP15] and [Per19], the notions of “differential” and “linear” anomalies were introduced. Definition 1 is indeed a generalization of them. They are based on properties P that correspond to how good the differential and linear properties are. In Section 2, we generalize this analysis to take into account the corresponding negative anomalies, and we introduce the use of the so-called Boomerang Connectivity Table (BCT) [CHP+18] for this purpose. To this end, we establish the distribution of the coefficients of the BCT of a random permutation. As an added bonus, this new result allows a better estimation of the advantage of an adversary in a boomerang attack.

Structural Anomalies.
Anomalies can also be related to the presence of a structure. For example, for n-bit Boolean functions, the existence of a simple circuit evaluating a function is unlikely: “almost all functions” of n arguments have “an almost identical” complexity which is asymptotically equal to the complexity of the most complex function of n arguments. This statement of Lupanov [Lup73] summarizes the so-called Shannon effect [Sha49]. In other words, the existence of a short description is an unlikely event for a Boolean function.
Here, we generalize this observation to permutations of F_2^n and construct anomalies that capture how “structured” an S-box is.
In Section 3, we present an estimation of the number of permutations that can be constructed using common S-box generation methods (multiplicative inverse, Feistel networks...) and derive the corresponding anomalies. In order to identify these anomalies, it is necessary to recover said structures when they are unknown. We present a simple approach applicable to inversion-based S-boxes that we successfully apply to the 8-bit S-box of the leaked German cipher Chiasmus. In other cases, we show that the detection of structures with a high anomaly can be performed using a vector space search.
Vector Space Search. We provide an efficient algorithm performing this search: given a set S of elements of {0, 1}^n and an integer d, this algorithm returns all the vector spaces of dimension d that are fully contained in S. We present it in Section 4. While such an algorithm is needed when looking for a structure in an S-box, we expect it to find applications beyond this area.
Kolmogorov Anomalies. The anomalies we present in Section 3 are related to specific structures that are very common, but they correspond to properties P with a binary output: an S-box has the specific structure considered or it does not. They thus fail to capture the idea behind anomalies, which consists in looking at the probability that an event or a “better” version of it occurs. To solve this problem, we take inspiration from both a proof of Shannon [Sha49] and the Kolmogorov complexity to define an anomaly that quantifies how simple an implementation of a function is, and that can be applied regardless of the specifics of the structure considered.
Application. We apply the different methods we present to the S-box π of the Russian algorithms. We show that its statistical, structural, and Kolmogorov anomalies are either high or extremely high, thus disproving the claims of its designers.

Mathematical Background
Boolean Functions. Let F_2 = {0, 1}. In what follows, we consider the following subsets of the set of all functions mapping F_2^n to itself.
• Recall that the set of all n-bit permutations is denoted S_{2^n}. It contains 2^n! elements. The compositional inverse of F ∈ S_{2^n} is denoted F^{−1}.
• The set of all n-bit linear permutations is denoted ℒ_{2^n}.
For elements of F_2^n, “+” denotes the characteristic-2 addition, i.e. the XOR. In cases that might be ambiguous, we use “⊕” to denote this operation.
Let F ∈ S_{2^n} be an S-box. Many of its cryptographic properties can be described using 2^n × 2^n tables: the LAT, DDT and BCT. They are defined below.
The LAT (Linear Approximation Table) of F is the table with coefficients W_F(a, b) = Σ_{x ∈ F_2^n} (−1)^{a·x ⊕ b·F(x)}. Its maximum absolute value for b ≠ 0 is the linearity of F and is denoted ℓ(F). The LAT is used to study linear cryptanalysis [TG92,Mat94]. The set of the coordinates of the coefficients equal to 0 plays a special role, as shown in [CP19]. It is called the Walsh zeroes of F and is denoted Z_F (by convention, (0, 0) ∈ Z_F, so that Z_F can contain vector spaces).
The DDT (Difference Distribution Table) of F is the table with coefficients δ_F(a, b) = #{x ∈ F_2^n : F(x ⊕ a) ⊕ F(x) = b}. Its maximum for a, b ≠ 0 is the differential uniformity of F and is denoted u(F). The DDT is needed to study differential cryptanalysis [BS91].
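For small n these tables can be computed directly from the definitions. The sketch below (our own code; the sign convention and names are ours) computes the LAT and the Walsh zeroes; it is quadratic per entry and only meant for illustration.

```python
def dot_parity(a, b):
    """Parity of the bitwise AND, i.e. the inner product a.b over F_2."""
    return bin(a & b).count("1") & 1

def lat(sbox):
    """Linear approximation table: W_F(a,b) = sum_x (-1)^(a.x + b.F(x))."""
    size = len(sbox)
    return [[sum((-1) ** (dot_parity(a, x) ^ dot_parity(b, sbox[x]))
                 for x in range(size))
             for b in range(size)]
            for a in range(size)]

def walsh_zeroes(sbox):
    """Walsh zeroes of F: coordinates of the zero LAT coefficients,
    together with (0, 0) so that the set can contain vector spaces."""
    t = lat(sbox)
    size = len(sbox)
    zeroes = {(a, b) for a in range(size) for b in range(size) if t[a][b] == 0}
    zeroes.add((0, 0))
    return zeroes
```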

Recently, Cid et al. introduced a new tool which they called the Boomerang Connectivity Table (BCT) [CHP+18]. The BCT of F is the table with coefficients ℬ_F(a, b) = #{x ∈ F_2^n : F^{−1}(F(x) ⊕ b) ⊕ F^{−1}(F(x ⊕ a) ⊕ b) = a}. Its maximum value for a, b ≠ 0 is the boomerang uniformity of F and is denoted β_F. As hinted by its name, the BCT is relevant when studying boomerang attacks [Wag99]. Unlike the DDT and LAT, it is necessary that F is a permutation for the BCT to be well defined.
Statistics. Some of our results rely on both the binomial and the Poisson distribution. We denote with Binomial(n, p) the binomial distribution with parameters n and p, which correspond respectively to the number of trials and to the probability of success in each trial. It is defined as follows: Pr[X = k] = (n choose k) × p^k × (1 − p)^{n−k}. It has a mean equal to np and a variance of np(1 − p). The Poisson distribution with parameter λ is defined by Pr[X = k] = e^{−λ} λ^k / k!. The mean value and variance of this distribution are both λ. A binomial distribution with small p can be closely approximated by a Poisson distribution with λ = np.
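This approximation is easy to check numerically; the helper names below are ours.

```python
import math

def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Pr[X = k] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)
```

With n = 2^10 trials and p = 2^{−9} (so λ = np = 2), the two probability mass functions agree to about three decimal places.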

Statistical Properties
Let us consider a permutation F that is picked uniformly at random from S_{2^n} and let us consider one of its tables, i.e. its DDT, LAT or BCT. The coefficients in this table may be connected to one another: for example, the coefficients in a row of the DDT have to sum to 2^n. Yet, in practice, the coefficients act like independent and identically distributed random variables. In Section 2.1, we recall what the distributions of the DDT and LAT coefficients are and we establish the distribution of the BCT coefficients. Then, Section 2.2 presents how the knowledge of these distributions can be used to bound the probability that a random permutation has differential/linear/boomerang properties at least as good as those of the S-box investigated. Additionally, we explain in Section 2.3 how our newly gained knowledge of the distribution of the BCT coefficients allows a better estimation of the advantage of the attacker in a boomerang attack.

Coefficient Distributions
In [DR07], the authors established and experimentally verified the distribution followed by the DDT and LAT coefficients. The distribution of the LAT coefficients was first established in [O'C95]; a different expression for it was then provided in [DR07]. A more thorough study of the DDT coefficients can be found in [O'C94]. We recall these results in the following two propositions.
Proposition 1 (DDT coefficient distribution [DR07]). The coefficients in the DDT of a random S-box of S_{2^n} with n ≥ 5 are independent and identically distributed random variables following a Poisson distribution Poisson(2^{−1}); as DDT coefficients are always even, this means that Pr[δ_F(a, b) = 2k] = e^{−1/2} 2^{−k} / k!.
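A quick experiment (our own sketch, not from the paper) supports this model: for one random 8-bit permutation, the empirical fraction of non-trivial DDT coefficients equal to 0 should be close to the model's prediction Pr[δ = 0] = e^{−1/2} ≈ 0.6065.

```python
import math
import random

def ddt_zero_fraction(n_bits=8, seed=42):
    """Fraction of non-trivial DDT coefficients equal to 0 for one random
    permutation, to be compared with exp(-1/2) ~ 0.6065."""
    rng = random.Random(seed)
    size = 1 << n_bits
    sbox = rng.sample(range(size), size)
    zeroes, total = 0, 0
    for a in range(1, size):          # skip the trivial row a = 0
        row = [0] * size
        for x in range(size):
            row[sbox[x ^ a] ^ sbox[x]] += 1
        zeroes += row[1:].count(0)    # skip the trivial column b = 0
        total += size - 1
    return zeroes / total
```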

Proposition 2 (LAT coefficient distribution [O'C95,DR07]). The coefficients in the LAT of a random permutation of S_{2^n} are independent and identically distributed random variables with the following probability distribution:
Pr[W_F(a, b) = 4z] = (2^{n−1} choose 2^{n−2} + z)^2 / (2^n choose 2^{n−1}).
The situation is the same for the BCT. In order to establish the distribution of the non-trivial coefficients of the BCT of a random permutation, we first recall an alternative definition of the BCT that was introduced in [LQSL19].
Proposition 3 (Alternative BCT definition [LQSL19]). Let F ∈ S_{2^n} be a permutation. For any a, b ∈ F_2^n, the entry ℬ_F(a, b) of the BCT of F is given by the number of solutions (x, y) ∈ F_2^n × F_2^n of the following system of equations:
F^{−1}(x) ⊕ F^{−1}(y) = a,
F^{−1}(x ⊕ b) ⊕ F^{−1}(y ⊕ b) = a.    (1)
We use this proposition to obtain the distribution of the coefficients in the BCT.
Theorem 1 (BCT coefficient distribution). If F is picked uniformly at random in S_{2^n}, then its BCT coefficients ℬ_F(a, b) with a, b ≠ 0 can be modeled like independent and identically distributed random variables with the following distribution:
Pr[ℬ_F(a, b) = c] = Σ_{(k1, k2) : 2k1 + 4k2 = c} P1(k1) × P2(k2),
where P1 and P2 are stochastic variables following binomial distributions:
P1 ∼ Binomial(2^{n−1}, 1/(2^n − 1)) and P2 ∼ Binomial(2^{2n−2} − 2^{n−1}, 1/(2^n − 1)^2).

Proof. For any x, y ∈ F_2^n such that x ≠ y, we define
S_{x,y} = {(x, y), (y, x), (x + b, y + b), (y + b, x + b)},
which is of cardinality 4 unless x + y = b, in which case it only contains 2 elements. These sets are such that a pair (x, y) is a solution of System (1) if and only if all the elements in S_{x,y} are as well. In order to prove this theorem, we will partition the set of all pairs of distinct elements of F_2^n into such sets S_{x,y}. To this end, we consider the following equivalence relation: (x, y) ∼ (x′, y′) if and only if the multisets S_{x,y} and S_{x′,y′} are identical. The corresponding equivalence classes are of size 4 except when x + y = b, in which case they contain only 2 elements. There are in total 2^{n−1} classes of size 2. As there are 2^n(2^n − 1) ordered pairs of distinct elements of F_2^n, we deduce that there are 2^{2n−2} − 2^{n−1} classes of size 4. Then, in order for System (1) to have exactly c solutions, we need that there exist k1 solution classes of size 2 and k2 solution classes of size 4, where 2k1 + 4k2 = c. We deduce that
Pr[ℬ_F(a, b) = c] = Σ_{2k1 + 4k2 = c} P1(k1) × P2(k2),
where P1(k1) (respectively P2(k2)) is the probability that there exist exactly k1 classes of size 2 (resp. k2 classes of size 4) whose elements are solutions of System (1). Let us now prove that the distributions of P1(k1) and P2(k2) are as stated in the theorem.

Anomalies in Table Coefficients Distributions
Building upon the general approach presented in [BP15], we can define several anomalies using the distribution of the coefficients in the tables of a permutation F ∈ S_{2^n}. We will then be able to estimate the values of the corresponding anomalies using the distributions derived in the previous section.
Maximum Value. For any table, the maximum absolute value of all coefficients is a natural property to use to construct an anomaly, as the integers are ordered. Let max_T : S_{2^n} → N be the function mapping a permutation F ∈ S_{2^n} to the maximum absolute value of the non-trivial coefficients in a table T. Then we can use the distributions in Propositions 1 and 2 as well as Theorem 1 to estimate the associated anomalies:
A(max_T(F)) ≈ −(2^n − 1)^2 × log2( Σ_{c ≤ max_T(F)} p_c ),
where p_c is the probability that |T(a, b)| = c. Indeed, there are only (2^n − 1)^2 non-trivial coefficients in the DDT, LAT and BCT as the first row and column are fixed in each case. The (negative) anomalies corresponding to the differential uniformity, linearity and boomerang uniformity for n = 8 are given in Appendix B in Tables 4a, 4b and 4c respectively.
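For instance, for the DDT the sum Σ_{c ≤ u} p_c is the cumulative distribution of the model of Proposition 1, and the resulting anomaly can be computed in a few lines (our own sketch; function names are ours).

```python
import math

def p_ddt_le(u, lam=0.5):
    """Pr[delta <= u] under the model delta = 2k with k ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(u // 2 + 1))

def diff_uniformity_anomaly(n, u):
    """-log2 Pr[u(pi) <= u], assuming the (2^n - 1)^2 non-trivial DDT
    coefficients of a random n-bit permutation are i.i.d."""
    coeffs = (2**n - 1) ** 2
    return -coeffs * math.log2(p_ddt_le(u))
```

For n = 8 this gives roughly 16.1 for a differential uniformity of 8 and roughly 164.5 for 6, in line with the values quoted below.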

Maximum Value and Number of Occurrences.
In S_{2^8}, the anomaly of a differential uniformity of 8 is equal to 16.2 but, for a differential uniformity of 6, it is 164.5. In order to have a finer-grained estimate of how unlikely the properties of an S-box are, we combine the maximum coefficient in one of its tables with its number of occurrences, as was first done in [BP15]. For a 2^n × 2^n table of integers T, let MO be the function such that MO(T) = (M, O), where M is the maximum absolute value in T and O is its number of occurrences (where the first row and column are ignored). The set N × N in which the output of MO lives can be ordered using the lexicographic ordering, i.e. (M, O) ≤ (M′, O′) if and only if M < M′, or M = M′ and O ≤ O′. We then define the differential, linear and boomerang anomalies of F as respectively
A_d(F) = A(MO(DDT_F)), A_ℓ(F) = A(MO(LAT_F)), and A_b(F) = A(MO(BCT_F)).
This definition of the differential and linear anomalies matches the one given in [Per19]. The boomerang anomaly was not used before. We also introduce the negative differential, linear and boomerang anomalies as the corresponding negative anomalies.
We estimate these anomalies for a table T with N = (2^n − 1)^2 non-trivial coefficients using the following expression:
Pr[MO(T) ≤ (M, O)] ≈ Σ_{j=0}^{O} (N choose j) × p_M^j × (Σ_{c < M} p_c)^{N−j},
where p_c is the probability that |T(a, b)| = c. For the corresponding negative anomaly, we use the following relation:
Pr[MO(T) ≥ (M, O)] = 1 − Pr[MO(T) < (M, O)],
where the strict inequality is with respect to the lexicographic ordering.
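Under the same i.i.d. assumption, this lexicographic probability can be evaluated directly for the DDT; the sketch below is our own code (not from the paper) and the function names are illustrative.

```python
import math

def ddt_pmf(c, lam=0.5):
    """Pr[delta = c] under the model delta = 2k with k ~ Poisson(lam)."""
    if c % 2:
        return 0.0
    return math.exp(-lam) * lam ** (c // 2) / math.factorial(c // 2)

def mo_anomaly_ddt(n, m, occ):
    """-log2 Pr[(max, #occurrences) <= (m, occ)] in lexicographic order,
    for the DDT of a random n-bit permutation with i.i.d. coefficients.
    Caveat: for very rare events (e.g. m <= 6 with n = 8) the floating
    point probabilities underflow; a log-domain version would be needed."""
    coeffs = (2**n - 1) ** 2
    p_eq = ddt_pmf(m)
    p_below = sum(ddt_pmf(c) for c in range(0, m, 2))
    prob = p_below**coeffs  # maximum strictly smaller than m
    for j in range(1, occ + 1):  # maximum equal to m, at most occ times
        prob += math.comb(coeffs, j) * p_eq**j * p_below ** (coeffs - j)
    return -math.log2(prob)
```

As expected, allowing more occurrences of the maximum makes the event more likely, so the anomaly decreases towards the plain maximum-value anomaly.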

Tighter Advantage Estimations for Boomerang Attacks
The coefficient distribution we established in Theorem 1 can also be used to compute the expected value of a BCT coefficient.This in turn implies a better understanding of the advantage an adversary has in a boomerang attack.
Theorem 2. The expected value of each BCT coefficient ℬ_F(a, b), a, b ≠ 0, of a random permutation F of S_{2^n} converges towards 2 as n increases.
Proof. Let F ∈ S_{2^n} be picked uniformly at random. The expected value e of ℬ_F(a, b) for a, b ≠ 0 is
e = Σ_{c ≥ 0} c × Σ_{k1, k2} P1(k1) × P2(k2) × [2k1 + 4k2 = c],    (2)
where the expression between the brackets is equal to 1 if 2k1 + 4k2 = c, and 0 otherwise.
Reordering the sums, we obtain the following expected value:
e = Σ_{k1, k2} (2k1 + 4k2) × P1(k1) × P2(k2).
We then approximate the binomial distributions P1 and P2 by Poisson distributions, namely Poisson(λ1) and Poisson(λ2) with λ1 = 2^{n−1}/(2^n − 1) and λ2 = (2^{2n−2} − 2^{n−1})/(2^n − 1)^2, which yields the approximation e(n) = 2λ1 + 4λ2. As λ1 and λ2 converge towards 1/2 and 1/4 respectively as n increases, the limit of e(n) is 2 × 1/2 + 4 × 1/4 = 2. On the other hand, we remark that e ≤ e(n) because of Equation (2). As e(n) converges to 2 as n increases, so does e.
The expected probability of a boomerang characteristic is thus 2^{1−n} and not 2^{−n} as we might expect.
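As a quick numerical sanity check (our own experiment, not from the paper), the average non-trivial BCT coefficient of a few random 6-bit permutations is indeed close to 2.

```python
import random

def bct(sbox):
    """Boomerang connectivity table, computed from the direct definition
    B_F(a,b) = #{x : F^-1(F(x)+b) + F^-1(F(x+a)+b) = a}."""
    size = len(sbox)
    inv = [0] * size
    for x, y in enumerate(sbox):
        inv[y] = x
    return [[sum(1 for x in range(size)
                 if inv[sbox[x] ^ b] ^ inv[sbox[x ^ a] ^ b] == a)
             for b in range(size)]
            for a in range(size)]

def mean_nontrivial_bct(n_bits=6, trials=3, seed=7):
    """Average of B_F(a, b) over a, b != 0 for a few random permutations."""
    rng = random.Random(seed)
    size = 1 << n_bits
    total = 0
    for _ in range(trials):
        t = bct(rng.sample(range(size), size))
        total += sum(t[a][b] for a in range(1, size) for b in range(1, size))
    return total / (trials * (size - 1) ** 2)
```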

Experimental Results
Verification. To check the validity of our approach to estimate the statistical anomalies, we picked 2^21 permutations from S_{2^8} uniformly at random. We then counted the number C_a of permutations F such that ⌊A(F)⌋ = a (only anomalies above 19 were considered). We deduce that the anomalies other than A_d(F) behave as we expect: in a set of size 2^a, we can expect to see about 1 permutation with an anomaly of a. However, for A_d(F), our results do not quite match the theory: we have found too many permutations with a high differential anomaly for it to be a coincidence. Recall that our estimates of the table-based anomalies rely on the assumption that the coefficients behave like independent random variables. While we experimentally found this assumption to yield accurate models in practice for all tables, it fails to accurately predict the behavior of the maximum value and its number of occurrences in the case of the DDT.

S-boxes from the Literature. We computed the statistical anomalies defined above for several 8-bit S-boxes from the literature that we obtained from [PW17]. The results are given in Table 1. We also list the number N_4 of vector spaces of dimension 4 contained in Z_F; its importance will appear later in Section 3.
The statistical anomalies of the AES S-box, i.e. of the multiplicative inverse, are unsurprisingly very large. In fact, they are too large: an anomaly cannot be higher than log2(|S_{2^8}|). Our estimates do not hold for objects with properties as extreme as those of the inverse.
We can derive other results from this table. For example, 2-round SPNs have a high negative boomerang anomaly but 3-round ones lose this property. Classical 3-round Feistel networks, as used in ZUC_S0, have a boomerang uniformity which is maximal [BPT19b], so it is not surprising to see that they have a negative boomerang anomaly so high that we could not compute it. Even though the S-box of Zorro has a modified Feistel structure (it uses a sophisticated bit permutation rather than a branch swap), it still has a high negative boomerang anomaly.
As expected, the S-boxes that were generated using a random procedure have low positive and negative statistical anomalies. The S-box of MD2 was obtained using the digits of π, that of NewDES from the American Declaration of Independence, and that of Turing from the string "Alan Turing".
The correlation between the different statistical anomalies seems complex. On the one hand, there are S-boxes with very different linear and differential anomalies despite the fact that the square of the LAT coefficients corresponds to the Fourier transform of the DDT (see e.g. Skipjack). On the other hand, as evidenced by the anomalies of the S-boxes of Kalyna, which were obtained using a hill-climbing method optimizing the differential and linear properties [KKO13], these improvements lead to an observable increase of the boomerang anomaly, but it can be marginal.

Table 1: The statistical anomalies and number of vector spaces for some S-boxes from the literature.

Identifying Structures
In this section, we go through the most common S-box structures, and present for each of them the density of the set of such S-boxes (up to affine-equivalence) and the methods that can be used to identify them. In practice, S-boxes operating on at least 6 bits usually fall into two categories: those that are based on the inverse in the finite field F_{2^n}, and those using block cipher structures.
In both cases, the permutations are usually composed with affine permutations. In the context of white-box cryptography, it is common to compose functions with secret affine permutations so as to obfuscate the logic of the operations used. Hence, for both decomposing S-boxes and attacking white-box implementations, it is necessary to be able to remove these affine layers.
While recovering a monomial structure is simple even when it is masked by affine permutations (see Section 3.1 and our results on the S-box of Chiasmus), this is not the case with block cipher structures. In this section, we show how the recovery of the pattern used in [BPU16] to remove the affine layers of the Russian S-box can be efficiently automated (Section 3.2), and applied to both SPNs (Section 3.3) and Feistel networks (Section 3.4). The core algorithm needed for these attacks is one returning all the vector spaces contained in a set of elements of F_2^n. We will present such an algorithm in Section 4. These techniques allow us to identify the structural anomalies in S-boxes. In order to estimate the anomaly associated with each structure, we upper bound the number of permutations that can be built using each of those that we consider. The corresponding anomalies are summarized in Section 3.5.

Multiplicative Inverse
Such permutations have a very simple structure: there exist two affine permutations A : F_2^n → F_{2^n} and B : F_{2^n} → F_2^n such that the permutation F can be written F = B ∘ G ∘ A, where G is the permutation of F_{2^n} defined by G(x) = x^{2^n − 2}. Their use was introduced in [Nyb94]; the AES [AES01] uses such an S-box.
In practice, the implementation of G requires the use of an encoding of the elements of F_{2^n} as elements of F_2^n. Usually, it is achieved by mapping x = (x_0, ..., x_{n−1}) ∈ F_2^n to Σ_{i=0}^{n−1} x_i α^i, where α ∈ F_{2^n} is the root of an irreducible polynomial of degree n with coefficients in F_2. However, this encoding can be seen as being part of A and B.
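To make this concrete, here is a small sketch (our own code, not from the paper) that builds the lookup table of G : x ↦ x^{2^n−2}, using as an example the AES field polynomial x^8 + x^4 + x^3 + x + 1, i.e. 0x11b.

```python
def gf_mul(a, b, poly=0x11b, n=8):
    """Carry-less multiplication in F_{2^n} modulo the irreducible `poly`."""
    r = 0
    for _ in range(n):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> n:          # reduce whenever the degree reaches n
            a ^= poly
    return r

def inverse_sbox(n=8, poly=0x11b):
    """Lookup table of x -> x^(2^n - 2): the multiplicative inverse,
    extended with 0 -> 0, computed by square-and-multiply."""
    size = 1 << n
    sbox = [0] * size
    for x in range(1, size):
        y, base, e = 1, x, size - 2
        while e:
            if e & 1:
                y = gf_mul(y, base, poly, n)
            base = gf_mul(base, base, poly, n)
            e >>= 1
        sbox[x] = y
    return sbox
```

Composing this table with two affine layers yields an AES-like S-box; the AES itself additionally uses a specific affine output layer.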
How to recognize them? The Chinese cipher SMS4 [Dt08] uses an 8-bit S-box whose structure was not explained. This prompted Liu et al. to try and recover said structure [LJH+07]. They successfully identified it as being affine-equivalent to the multiplicative inverse using an ad hoc method.
There is a simple test that can be applied to check if a permutation is affine-equivalent to the multiplicative inverse when the input/output size is even.
In [Sch14] and [STW13], two separate teams independently recovered the secret block cipher Chiasmus from an encryption tool called GSTOOL. Chiasmus is a German-designed 64-bit block cipher which uses two S-boxes, S and S^{−1}. Schuster had the intuition that S was built similarly to the AES S-box. He was right. Using Lemma 1 and the linear equivalence algorithm of [BDBP03], we found that the S-box of Chiasmus is also based on a finite field inversion. However, unlike in the AES, it uses two affine mappings with non-zero constants. A script generating the S-box of Chiasmus is provided in Appendix E. The S-box itself can be found in a SAGE [Dev17] module [PW17].
We could also have recovered this structure using directly the algorithm of Biryukov et al. [BDBP03] or the more recent one of Dinur [Din18]. However, the above approach and these algorithms share the same shortcoming when it comes to identifying the structure in an unknown S-box F ∈ S_{2^n}: if we do not know the exact S-box to which F might be affine-equivalent then they cannot be applied. Even if we know that it might be affine-equivalent to an SPN or a Feistel network, we cannot find the corresponding affine masks.
To solve this problem, we identify patterns in the LAT of the permutations with specific structures that are present regardless of the subfunctions they contain.As a consequence, they can always be detected.

TU-Decomposition
The TU-decomposition is a general structure that was first introduced in [BPU16], where it was shown that the S-box of the latest Russian standards has such a structure. Later, it was encountered again in the context of the Big APN problem, a long-standing open question in discrete mathematics. Indeed, the only known solution to this problem is a sporadic 6-bit APN permutation that was found by Dillon et al. [BDMW10] and which was proved in [PUB16] to yield a TU-decomposition. This structure was then further decomposed to obtain the so-called open butterfly. As we will show below, some Feistel and SPN structures also share this decomposition. Thus, the tools that can find TU-decompositions can also be used to identify these structures even in the presence of affine masks.
Definition 2 (TU_t-decomposition). Let n and t be integers such that 0 < t < n. We say that F ∈ S_{2^n} has a TU_t-decomposition if there exist:
• a family of 2^{n−t} permutations T_y ∈ S_{2^t} indexed by y ∈ F_2^{n−t},
• a family of 2^t permutations U_x ∈ S_{2^{n−t}} indexed by x ∈ F_2^t, and
• two linear permutations μ and η of F_2^n,
such that F = η ∘ G ∘ μ, where G(x || y) = T_y(x) || U_{T_y(x)}(y). This structure is presented in Figure 1a. In other words, F ∈ S_{2^n} has a TU_t-decomposition if and only if it is affine-equivalent to a G ∈ S_{2^n} with the following property: if G_t is the restriction of G to its t bits of highest weight, then x ↦ G_t(x || y) is a permutation for all y ∈ F_2^{n−t}.

Density of the set. In order to define a permutation with a TU_t-decomposition, we need to choose 2^{n−t} permutations of S_{2^t}, 2^t permutations of S_{2^{n−t}} and two linear permutations operating on n bits. However, several of the permutations generated in this way will be identical. Indeed, we can compose each T_y with a t-bit linear permutation a ∈ ℒ_{2^t} to obtain a permutation T′_y = T_y ∘ a. If we use T′_y and compose the corresponding part of μ with a^{−1}, then we obtain the same overall permutation as when T_y and μ are used. More equivalent modifications can be made using linear permutations b ∈ ℒ_{2^{n−t}}, c ∈ ℒ_{2^t} and d ∈ ℒ_{2^{n−t}}, as summarized in Figure 1b. Hence, the total number of n-bit permutations with a TU_t-decomposition is at most
(2^t!)^{2^{n−t}} × (2^{n−t}!)^{2^t} × |ℒ_{2^n}|^2 / (|ℒ_{2^t}|^2 × |ℒ_{2^{n−t}}|^2).
This quantity is only a bound, as permutations that are self-affine-equivalent lead to identical permutations with different μ and η. We used this bound to compute the anomaly associated to the presence of a TU_t-decomposition in a permutation; it is given in Section 3.5.
How to recognize them? Let F ∈ S_{2^n} be a permutation. As was established in Proposition 6 of [CP19], the presence of a TU_t-decomposition is equivalent to the presence of a specific vector space of zeroes of dimension n in Z_F. Let us first recall the corresponding proposition in the particular case of permutations.

Proposition 4 ([CP19]). Let F ∈ S_{2^n} and let Z_F be its Walsh zeroes. Then F has a TU_t-decomposition without any affine layers if and only if Z_F contains the vector space
V_t = {(x || 0, 0 || y) : x ∈ F_2^t, y ∈ F_2^{n−t}}.
The advantage of Proposition 4 is that the pattern described depends only on the presence of a TU_t-decomposition and not on the specifics of the components T and U. Furthermore, recall that if F = η ∘ G ∘ μ for some linear permutations μ and η then Z_F = (μ^T × (η^{−1})^T)(Z_G). We deduce the following corollary.

Corollary 1. Let F ∈ S_{2^n} and let Z_F be its Walsh zeroes. Then F has a TU_t-decomposition with linear permutations μ and η if and only if Z_F contains the vector space (μ^T × (η^{−1})^T)(V_t).

It is therefore sufficient to look for all the vector spaces of dimension n contained in Z_F to see if F has a TU_t-decomposition. If we find a vector space that is not the Cartesian product of a subspace of {(x, 0), x ∈ F_2^n} with a subspace of {(0, y), y ∈ F_2^n} then F itself does not have a TU_t-decomposition but there exists a linear function L of F_2^n such that F + L does [CP19]. Regardless, the key tool that allows the search for TU-decompositions is an efficient algorithm returning all the vector spaces of a given dimension that are contained in a set of elements of F_2^n × F_2^n. Indeed, finding such vector spaces will allow us to recover the vector space (μ^T × (η^{−1})^T)(V_t), from which we will deduce information about μ and η. We present such an algorithm in Section 4, and we used it as a subroutine of a program finding a TU_t-decomposition automatically (see Appendix C).
As observed in [CP19], the number of vector spaces of dimension n in Z_F is the same as the number of vector spaces of dimension n in the set of the coordinates of the zeroes in the DDT. Thus, we could equivalently present our results in terms of the DDT.
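The vector-space searches above only need to be exhaustive, not fast, to be illustrated. The following naive backtracking sketch (our own code, not the efficient algorithm of Section 4) enumerates each space exactly once via a canonical basis: every new basis vector must be larger than the previous one and minimal in its coset of the current span.

```python
def vector_spaces_in_set(s, d):
    """List every d-dimensional vector space contained in the set s of
    n-bit values, one canonical basis per space. Exponential-time
    baseline for illustration only."""
    s = set(s)
    if 0 not in s:
        return []  # every vector space contains 0
    results = []

    def extend(basis, span):
        if len(basis) == d:
            results.append(list(basis))
            return
        last = basis[-1] if basis else 0
        for v in sorted(s):
            if v <= last or v in span:
                continue
            # canonicity: v must be the smallest element of its coset
            if any((v ^ x) < v for x in span):
                continue
            coset = {v ^ x for x in span}
            if coset <= s:  # the whole new span must stay inside s
                extend(basis + [v], span | coset)

    extend([], {0})
    return results
```

For example, `vector_spaces_in_set(range(8), 2)` returns the 7 two-dimensional subspaces of F_2^3.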

Substitution-Permutation Networks
An n-bit SPN interleaves the parallel application of k possibly distinct m-bit S-boxes with n-bit linear permutations, where k × m = n. We use the common [BS01] notation in which S denotes an S-box layer and A a linear layer; for example, SA denotes a linear layer followed by an S-box layer. An ASASA structure is depicted in Figure 2a. Let us estimate the number of r-round SPNs. As the S-box layers are interleaved with linear layers, we need to consider not the size of S_{2^m} but instead the number of linear equivalence classes of m-bit permutations, which is close to 2^m! / |ℒ_{2^m}|^2. The corresponding anomalies for some values of n are given in Section 3.5.
How to recognize them? First of all, the algebraic degree of a 2-round SPN is at most n − 2 [BC13]. Hence, if a permutation has degree n − 1, it cannot have such a structure.
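This degree filter is easy to apply in practice. The sketch below, with names of our choosing, computes the algebraic degree of an S-box from its LUT: for each coordinate it derives the ANF via the Möbius transform and takes the largest Hamming weight of a monomial with a nonzero coefficient (`__builtin_popcount` is a GCC/Clang extension).

```c
/* Algebraic degree of an n-bit S-box (n <= 8): for each output bit, compute
   the ANF of the corresponding Boolean function with the in-place Moebius
   transform, then record the heaviest monomial present. */
int algebraic_degree(const unsigned *lut, int n) {
    int deg = 0;
    for (int bit = 0; bit < n; bit++) {
        unsigned char anf[1 << 8];               /* truth table, then ANF */
        for (unsigned x = 0; x < (1u << n); x++)
            anf[x] = (lut[x] >> bit) & 1;
        for (int i = 0; i < n; i++)              /* Moebius transform */
            for (unsigned x = 0; x < (1u << n); x++)
                if (x & (1u << i))
                    anf[x] ^= anf[x ^ (1u << i)];
        for (unsigned u = 0; u < (1u << n); u++)
            if (anf[u] && __builtin_popcount(u) > deg)
                deg = __builtin_popcount(u);
    }
    return deg;
}
```

A permutation whose degree comes out as n − 1 is thereby excluded from having a 2-round SPN structure.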
In Theorem 3, we will establish the existence of a specific vector space of zeroes in the LAT of a 2-round SPN. However, in order to state this theorem properly, we first need to introduce the following notion.
Definition 3 (m-Valid minors). Let n, m and k be integers such that n = k × m. Let L ∈ ℒ_{2^n} be a linear permutation. We define it using k^2 block matrices L_{i,j} of dimension m × m. We call a minor of the matrix L m-valid if there exists a pair I, J of subsets of {0, ..., k − 1} which are of the same size 0 < |I| = |J| < k and such that the rank of L_{I,J} = [L_{i,j}]_{i∈I, j∈J} is equal to m|I|.
In other words, an m-valid minor of L is a non-trivial minor of L that is obtained by taking complete m-bit chunks of this matrix, and which has maximum rank.

Theorem 3. Let F ∈ S_{2^n} be an ASASA structure built using L as its central linear layer and two layers of m-bit S-boxes. For each I, J ⊂ {0, ..., k − 1} defining an m-valid minor of L, there exists a vector space of zeroes of dimension n in 𝒵_F.
Proof. Because of Corollary 1, we restrict ourselves to the SAS structure. If we let the input blocks corresponding to the indices in J take all 2^{m|J|} possible values, then the output blocks with indices in I will also take all 2^{m|I|} = 2^{m|J|} possible values. There is thus a corresponding TU-decomposition and hence a corresponding vector space in 𝒵_F.

This verification is less efficient than the dedicated cryptanalysis methods presented in [MDFK18]. However, the aim here is not so much to recover the ASASA structure used; rather, it is to identify the S-box as having such a structure in the first place. Using the following corollary, we can immediately understand why the number of such vector spaces is equal to (2×2 choose 2) = 6 for several S-boxes in Table 1: it is a direct consequence of their 2-round SPN structure and of the strong diffusion of their inner linear layer.
Corollary 2. Let F ∈ S_{2^n} be the SAS structure built using L as its linear layer and two layers of m-bit S-boxes, where n = k × m. If L is MDS over the alphabet of S-box words, then 𝒵_F contains at least (2k choose k) vector spaces of dimension n.

Proof. As L is MDS, all its minors, and in particular those corresponding to the definition of m-valid minors, have maximum rank. There are such m-valid minors for every admissible pair (I, J), to which we add the "free" vector space {(x, 0), x ∈ F_2^n} which is always present: in total, there are at least (2k choose k) vector spaces in 𝒵_F.

Feistel Networks
The Feistel structure is a classical block cipher construction which is summarized in Figure 2b. The number of permutations that are affine-equivalent to r-round Feistel networks using permutations as the round functions can be bounded from above. Indeed, we can apply n/2-bit linear permutations A and A′ to each branch and, provided that the round functions are modified, cancel them out by applying A^{-1} and (A′)^{-1} on the output branches. We can also add constants freely to the output of the first ⌈r/2⌉ round functions, as explained in [BLP16].

How to recognize them?
There are efficient function-recovery techniques for up to 5-round Feistel networks [BLP16]. However, as soon as affine masks are added, these techniques can no longer be applied. Still, as with the SPN structure, Feistel networks with few rounds exhibit specific vector spaces in their Walsh zeroes, as was already observed for 4-round Feistel networks in [BPU16]. This means that it is possible to detect such structures using the vector spaces in their Walsh zeroes.

Theorem 4 ([BPU16]). Let F be a 4-round Feistel network such that round functions 2 and 3 are permutations. Then 𝒲_F(x‖0, 0‖y) = 0 for all nonzero x, y in F_2^{n/2}.

This observation also holds for a 3-round Feistel network. In fact, there are more vector spaces in such a structure.

Theorem 5. Let f_0, f_1 and f_2 be functions of F_2^{n/2} and let F ∈ S_{2^n} be the 3-round Feistel network using f_0, f_1 and f_2 as its round functions. Then the set 𝒵_F contains several vector spaces of dimension n, which fall into four categories.

The proof of this theorem follows from direct applications of results in [CP19], together with the observation that if the 3-round Feistel structure implies a specific vector space, it also implies the one with the coordinates swapped, because its inverse is also a 3-round Feistel network. The details are provided in Appendix A.

Structural Anomalies
In light of our results, we can quantify the anomaly associated with the presence of various structures. In this case, the mapping considered maps S_{2^n} to {0, 1}: a permutation either has a specific structure or it does not. The anomaly associated with a given structure is then derived from the number of permutations having it, meaning that the set sizes we obtained above allow us to quantify the anomalies associated with the TU_t-decomposition, the SPN structure, the Feistel network and the TKlog (see below for the latter). The corresponding anomalies are summarized in Table 2 for different values of n.
The existence of a TU-decomposition with t = 1 for F ∈ S_{2^n} is equivalent to the presence of a component with a linear structure [CP19], i.e. to the existence of b ∈ F_2^n such that the Boolean function x ↦ b · F(x) has a probability-1 differential. Thus, the corresponding row of Table 2 gives the anomaly corresponding to linear structures.
We can also compute the anomaly associated with the TKlog structure [Per19] used in the S-box of Streebog and Kuznyechik [Fed12, Fed15], called π ∈ S_{2^8}. A TKlog is a 2m-bit permutation parametrized by an affine function κ : F_2^m → F_{2^{2m}} such that κ(x) = Λ(x) ⊕ κ(0) for some linear function Λ. This function must be such that Im(Λ) ∪ F_{2^m} spans F_{2^{2m}}. The TKlog also depends on a permutation s of S_{2^m − 1}. It is defined using a primitive element α, where α is a root of a primitive polynomial p of degree 2m, so that α^{2^m + 1} is a multiplicative generator of F*_{2^m}. The number of TKlogs can then be counted exactly; the count involves Euler's totient function φ. As for the inverse function, the encoding of the elements of F_{2^{2m}} as binary strings can be considered to be part of the outer affine layers.

Vector Spaces Extraction Algorithms
Let 𝒮 be a set of elements of F_2^n. In this section, we describe an algorithm which extracts all the vector spaces of dimension at least d that are completely contained in 𝒮. As established in the previous section, the ability to solve this problem allows us to identify TU-decompositions, some SPNs, and 3- and 4-round Feistel networks, even in the presence of affine encodings. It can also be used to test the CCZ-equivalence [CCZ98] of a function to a permutation, as was done by Dillon et al. [BDMW10] to find the first APN permutation operating on an even number of bits.
Our results can be interpreted both using the ordering relation over the integers and by reasoning over the respective positions of the zeroes of the elements of F_2^n. The following definition links these two views.

Definition 4 (Most Significant Bit). Let x ∈ F_2^n and let us write x = (x[0], ..., x[n−1]), where x[0] is the least significant bit. We denote by MSB(x) the greatest index i such that x[i] = 1. Throughout, the order relation x < y and the operation ⊕ are obtained by interpreting the elements of F_2^n as the binary representations of integers.
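As a small illustration, MSB can be computed with a short loop; the function name and the convention MSB(0) = −1 (the all-zero word has no set bit) are ours.

```c
/* MSB(x): greatest index i such that x[i] = 1, with x[0] the least
   significant bit. Returns -1 for x = 0, which has no set bit. */
int msb(unsigned x) {
    int i = -1;
    while (x) { i++; x >>= 1; }
    return i;
}
```

Note that with this definition, a < a ⊕ b holds exactly when a[MSB(b)] = 0 and MSB(b) is not cancelled, which is the property used repeatedly below.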

A Simple Approach and How Ours Improves It
Let us first present a naive approach to solving this problem. At its core, this approach is a tree search that builds the complete vector spaces iteratively.
Starting from a specific element a ∈ 𝒮 and the vector space V_a = {0, a}, we loop over all the elements b such that b > a and check whether (a ⊕ b) ∈ 𝒮, in which case we build V_{a,b} = V_a ∪ {b ⊕ v, v ∈ V_a}. We then repeat this process by looking for c > b such that (c ⊕ v) ∈ 𝒮 for all v ∈ V_{a,b}. This process is iterated until complete bases (a, b, c, ...) of vector spaces are found. Our approach is based on the same principles, but it significantly outperforms this naive algorithm by solving its two main shortcomings.
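To make the tree search concrete, here is a sketch of its very first level, restricted to dimension-2 spaces; the names are ours and the set is assumed sorted in increasing order. Requiring a < b < a ⊕ b selects exactly one ordered basis per space; with only a < b, each space would be found three times, once per ordered basis, which is precisely the duplicate-basis problem discussed next.

```c
#include <stdbool.h>
#include <stddef.h>

/* Membership test for a set of n-bit values stored as a bitmap. */
static bool in_set(const unsigned char *bitmap, unsigned x) {
    return (bitmap[x >> 3] >> (x & 7)) & 1;
}

/* Count the 2-dimensional vector spaces {0, a, b, a^b} whose nonzero
   elements all lie in the set (given both as a sorted array and as a
   bitmap). The condition b < a^b keeps one basis per space. */
size_t count_dim2_spaces(const unsigned *set, size_t len,
                         const unsigned char *bitmap) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        for (size_t j = i + 1; j < len; j++) {
            unsigned a = set[i], b = set[j], c = a ^ b;
            if (b < c && in_set(bitmap, c))
                count++;
        }
    return count;
}
```

On the set of all 7 nonzero elements of F_2^3, this finds the 7 two-dimensional subspaces of F_2^3, each exactly once.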
First, the basis of a vector space is not unique. The condition that it be ordered, which is implied by the algorithm sketched above, is not sufficient to ensure uniqueness. This implies that the algorithm is slowed down by the exploration of branches that actually correspond to identical spaces, and that a post-processing step checking for duplicated spaces is needed. Our algorithm solves this problem and returns exactly one basis for each vector space contained in 𝒮. These bases are called Gauss-Jordan Bases (GJB) and are introduced in Section 4.2.
Second, at each iteration, we need to consider all x ∈ 𝒮 such that x is strictly greater than the largest vector already in the basis being built. In our approach, we update at each iteration a set that contains all the elements x that could be used to construct a larger basis, using a process which we call vector extraction (see Section 4.3). Like in the algorithm above, this set only contains elements that are strictly greater than the previous basis elements. However, each such element is also strictly greater than all the elements in the vector space spanned by this basis, and the size of the set is reduced by at least a factor of 2 at each iteration. Using vector extractions, we can also skip the test that (x ⊕ v) ∈ 𝒮 for all v in the current vector space, which further increases the speed of our algorithm.
Besides, at each iteration, we use a heuristic method, the Bigger MSB Condition, to consider only a subset of these candidate elements; it is based on the number and positions of their zeroes.
In summary, we improve upon the algorithm above in the following ways:
• we construct exactly one basis per vector space contained in 𝒮 (using GJBs, see Section 4.2),
• we significantly reduce the number of vectors that need to be considered in later iterations (using vector extractions, see Section 4.3), and
• we further decrease the number of vectors that need to be explored at a given iteration using a specific filter (the Bigger MSB condition, see Section 4.4).
Finally, the vector space extraction algorithm itself is presented in Section 4.5. An algorithm extracting affine spaces, which uses the former as a subroutine, is presented in Appendix D. We provide an implementation along with this submission.

Gauss-Jordan Bases

The Gauss-Jordan Basis (GJB) of a vector space is its unique basis {v_0, ..., v_{d−1}} that is smallest in lexicographic order [CDDL06]. This implies that v_i < v_{i+1} for all i. Some key properties of GJBs are given by the following lemma.
Lemma 3. GJBs have the following properties.
Point 2. We prove each direction of the equivalence separately.
Point 3. Using the first point of this lemma allows us to proceed via a simple induction over the size of the basis. If the basis is simply {v_0}, then the lemma obviously holds. Then, adding an element v_k at the end of a GJB of size k adds 2^k elements x such that MSB(x) = MSB(v_k).
The last point of Lemma 3 allows a significant speed-up of the search for such GJBs. To describe it, we introduce the following concept.

Definition 6 (MSB spectrum). Let 𝒮 be a set of elements in F_2^n. The MSB spectrum of 𝒮 is the sequence {N_k(𝒮)}_{0≤k<n} where N_k(𝒮) is the number of elements x ∈ 𝒮 such that MSB(x) = k.

Corollary 3 (MSB conditions). If a set 𝒮 of elements from F_2^n contains a vector space of dimension d, then there must exist a strictly increasing sequence {k_i}_{0≤i≤d−1} of length d such that N_{k_i}(𝒮) ≥ 2^i.
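The necessary condition of Corollary 3 is cheap to test. The sketch below, with names of our choosing, computes the MSB spectrum and then scans it greedily; since the thresholds 2^i are increasing, a greedy scan that matches the smallest thresholds first finds a valid sequence whenever one exists.

```c
#include <stdbool.h>
#include <stddef.h>

static int msb_of(unsigned x) { int i = -1; while (x) { i++; x >>= 1; } return i; }

/* Necessary condition of Corollary 3: if the set contains a d-dimensional
   vector space, there are indices k_0 < ... < k_{d-1} with N_{k_i} >= 2^i,
   where N_k is the number of elements whose MSB is k. */
bool msb_condition(const unsigned *set, size_t len, int n, int d) {
    size_t spectrum[32] = {0};
    for (size_t i = 0; i < len; i++)
        if (set[i])
            spectrum[msb_of(set[i])]++;
    int matched = 0;                  /* thresholds 1, 2, 4, ... matched so far */
    for (int k = 0; k < n && matched < d; k++)
        if (spectrum[k] >= (1u << matched))
            matched++;
    return matched >= d;
}
```

For the set {1, ..., 7} in F_2^3, the spectrum is (1, 2, 4), so the condition holds for d = 3 (and indeed the whole of F_2^3 is contained in the set together with 0) but fails for d = 4.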

Vector Extractions
We now present a class of functions called extractions, which play a crucial role in our algorithms. We also prove their most important properties.
Proof. In order to prove the theorem, we proceed by induction over i, using the validity of the theorem for bases of size i as our induction hypothesis. At step i, we assume that e_0, ..., e_i are elements of 𝒮 and that e_{j+1} ∈ (X_{e_j} ∘ ... ∘ X_{e_0})(𝒮) for all j < i. From the induction hypothesis, we have that {e_0, ..., e_{i−1}} is a GJB. Using the second point of Lemma 3, we have that its extension {e_0, ..., e_i} is a GJB if and only if e_i[MSB(e_j)] = 0 (which is equivalent to e_j < e_j ⊕ e_i) for all 0 ≤ j < i, and MSB(e_i) > MSB(e_{i−1}).
By definition of X_{e_j}, we have that e_j < e_j ⊕ e_i for all j such that 0 ≤ j < i, so {e_0, ..., e_i} is a GJB if and only if MSB(e_i) > MSB(e_{i−1}).

Evaluating an extraction a priori requires checking whether x ⊕ a belongs to 𝒮 for all x ∈ 𝒮 such that x < x ⊕ a. This verification can be implemented efficiently using a binary search when 𝒮 is sorted. We can make it even more efficient using the following lemma.

Bigger MSB Condition
The following lemma provides a necessary condition for an element x_0 ∈ 𝒮 to be the first element of a GJB of size d.
Lemma 5 (Bigger MSB condition). If x_0 is the first element of a GJB of size d of elements of a set 𝒮 of elements in F_2^n, then 𝒮′ = {x ∈ 𝒮, MSB(x) > MSB(x_0)} must satisfy the MSB condition of Corollary 3 for dimension d − 1, i.e. there is a strictly increasing sequence {k_i} of length d − 1 such that N_{k_i}(𝒮′) ≥ 2^i.

This lemma provides an efficient filter to decide whether x can be the start of a GJB of size d. It depends only on MSB(x), so it does not need to be evaluated for all x ∈ 𝒮 but only once for each subset of 𝒮 with a given MSB.

Vector Space Extraction Algorithm
Algorithm 1 GJBExtraction algorithm.
for all B ∈ ℒ′ do
    add the GJB ({a} ∪ B) to ℒ
return ℒ
end function

If we let the filtering function be the identity, then we can directly deduce from Theorem 6 and Corollary 4 that GJBExtraction (as described in Algorithm 1) returns the unique GJB of each and every vector space of dimension at least d that is included in 𝒮.
This algorithm can be seen as a tree search. The role of the filtering function is then to cut branches as early as possible, by allowing us to ignore elements that cannot possibly be the first element of a basis of size d, implementing the Bigger MSB condition of Lemma 5. Note that we only need to try to build such a sequence of increasing k_i once for each value of MSB(x) for x ∈ 𝒮. It is possible to check for the existence of such a sequence in time proportional to |𝒮|.
What is the probability that a random S-box has any structure?
In order to answer this question, we first need to define what we mean by structure in this case. To this end, we build upon the concept of the Kolmogorov complexity of a string to bound the complexity of a function. Like in Section 2, our aim is to measure how far the properties of an S-box are from those expected of a random S-box. However, we will not rely on statistical arguments but only on the pigeonhole principle.
We introduce the key concept behind our analysis (the Kolmogorov anomaly of an S-box) in Section 5.1. We then use it in Section 5.2 to establish that the set of S-boxes with as much structure as π is of negligible size.

The Kolmogorov Anomaly of an S-box
The Kolmogorov complexity of a string is the length of the smallest program generating this string. Since the LUT of an S-box is a string, it would be natural to use its Kolmogorov complexity as an estimation of the complexity of the S-box itself. However, we do not want to capture the complexity of its LUT so much as the complexity of the algorithm used to evaluate the function. Thus, we instead try to estimate the Kolmogorov complexity of an implementation of the function.
To derive information about how structured an S-box is, we need to compare the Kolmogorov complexity of its implementation with the size of S_{2^n}. Indeed, if this length is much smaller than log_2(|S_{2^n}|), then there are very few permutations with as short an implementation. In other words, it is an anomaly. As discussed in the introduction, Shannon used a similar argument to bound the complexity of the circuits implementing Boolean functions in 1949 [Sha49].
Yet, in order to make this comparison, it is necessary to obtain a meaningful estimate of the Kolmogorov complexity of the implementation.
The choice of the language used to implement the permutation then plays a crucial role. We could simply define a language whose standard library contains a function evaluating the permutation, and thereby obtain a minimal Kolmogorov complexity for the implementation. Nevertheless, this result would not give us any useful information. To solve this problem, we propose to use two sets of languages: variants of the C language (portable C11 and a more relaxed K&R style) and compiled programs. As code for micro-controllers is expected to have a small size, we chose the ARM dialect supported by the Cortex-M4 CPU, as we had one at hand to test our implementation.
Definition 8 (Kolmogorov Anomaly of a Permutation for a Language). Let F ∈ S_{2^n} be a permutation such that there exists a program of bit-length ℓ_L(F) in a given language L returning F(x) when given input x. The Kolmogorov anomaly of F for the language L is log_2(|S_{2^n}|) − ℓ_L(F) − 1. The "−1" comes from the fact that there are at most 1 + 2 + 2^2 + ... + 2^ℓ = 2^{ℓ+1} − 1 programs of length at most ℓ. The Kolmogorov anomaly is then an anomaly in our sense.

Application to the Russian S-box
Let us estimate the Kolmogorov anomaly of π, first using C as the language and then actual machine code.
General Approach. In all cases, our approach is based on the TKlog structure of π which we recalled in Equation (3). However, instead of implementing finite field arithmetic, we build a table s such that s[i] = α^{17 s(i)}. Furthermore, the discrete logarithm that is implicitly used can be implemented in a very compact way. In the case of π, the "logarithm" of 0 is set to 0 and that of 1 to 255; the other values are as expected. The following code snippet evaluates this function on x by setting the value of l accordingly.

int l = 0, a = 2;
while ((x) && (l++, a != x)) { a = (a << 1) ^ (a >> 7)*0x11d; }

Indeed, if x = 0 then this loop is not entered and l is indeed set to 0. If not, then we multiply a value a by the generator of the multiplicative group using its representation as a Galois LFSR until its value is equal to the input x. At each iteration, l is incremented. We further save space by replacing the Boolean value a!=x with a^x, as both are equal to 0 if and only if a==x. We can also write 0x11d in decimal to save more space, i.e. 285.
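Wrapping the snippet above in a function (the function name is ours) makes it easy to check the stated conventions: the loop steps through the powers of the generator 2 in GF(2^8) with the primitive polynomial 0x11d, so it returns 0 for input 0, 255 for input 1, and the ordinary discrete logarithm otherwise.

```c
/* Compact discrete logarithm in GF(2^8): a runs through the powers of the
   generator 2 (one Galois-LFSR step per iteration, modulus 0x11d) and l
   counts the steps until a equals x. By convention, dlog(0) = 0. */
int dlog(int x) {
    int l = 0, a = 2;
    while ((x) && (l++, a != x))
        a = (a << 1) ^ (a >> 7) * 0x11d;
    return l;
}
```

Since 2 has multiplicative order 255 for this polynomial, dlog(1) is reached only after a full cycle, which is why the logarithm of 1 comes out as 255 rather than 0.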
We then set i=l/17, j=l%17 and need to consider two cases: if j==0, then the output of π is κ(16 − i); otherwise, it is κ(16 − j) ⊕ s[i]. The way in which we evaluate κ changes depending on the language we target.
A Portable C11 Program. Our shortest implementation in portable C11 follows. In this case, we have implemented κ as x ↦ Λ(x) ⊕ κ(0), where Λ is a linear function implemented using a macro. It contains 227 characters after useless spaces and newlines are removed. While the C standard does not specify a source character encoding, the characters expected to be supported roughly correspond to the printable ASCII characters. Thus, we counted 7 bits per character. Hence, the total length of this program is ℓ_{C11 strict}(π) = 1589 and A_K^{C11 strict}(π) = 94.
A (Marginally) Less Portable C Program. We can write a much shorter program if we use some features which are not part of the C standard but which are supported by most compilers in practice.
We consider that the character set used is ASCII, because it allows us to use strings for our char arrays. Moreover, we assume that a char is only 8 bits long, which implies that a signed int is big enough to contain the range of an unsigned char. This allows us to use int as argument and return type.
In order to implement κ, we precompute j ↦ κ(16 − j) and store it in a table k. In order to initialize both k and s, we use character strings, as they provide a very compact encoding as long as the values inside are printable ASCII characters other than "\". A non-printable character 0xNN is represented as "\xNN".
Hence, our approach is to find two constants to XOR into the elements of k and s such that the number of printable characters is maximal. As the amplitude of k is small, all of its elements can be made printable if we choose the constant well. This is not the case for s: the best we can achieve is 8 printable characters among the 15. Still, even with these two constraints, many values for the constants remain possible.
We then add the additional constraint that the XOR of the two constants shall be 237 = 252 ⊕ 17, which lets us save space by setting a variable to 17. Finally, we avoid having a character that corresponds to a hexadecimal digit immediately following a non-printable character, so as to avoid parsing problems. In the end, we choose the constants 188 for k and 173 for s, leading to the strings @'rFTDVbpPBvdtfR@ and \xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8. We store both in a unique array t and obtain the following implementation. After removing the newlines, we obtain 173 characters. Each is an ASCII character, so they can all be encoded on a 7-bit word. We deduce that ℓ_{C+ASCII}(π) = 1211 and A_K^{C+ASCII}(π) = 472. Type declarations used to be optional in C89 and defaulted to int. This is no longer the case in the standard, but many compilers (in particular, gcc and clang) still support it by default.
Hence, we can further remove the two int in the prototype if we do not seek compliance with the C11 standard. This saves 8 more characters, bringing the total down to 165, i.e. an anomaly of 528.
Binary Code. The previous programs were small in terms of number of characters. However, this approach may not optimally reflect the inner complexity (or lack thereof) of the executed program. In particular, data encoding is denser in binary, type annotations are absent from compiled code, and some short sequences of C instructions may compile to lengthy assembly.
We then searched for the shortest binary program we could achieve. We built upon the assembly output of a variant of the C program above, and decided to hand-optimize the resulting Cortex-M4 program. The Intel instruction set is much more complicated, hence we preferred to focus on this simpler architecture. Our approach is detailed in Appendix F.1 and the assembly code itself is given in Appendix F.2. The shortest program we obtained for the Cortex-M4 is 80 bytes long.

Conclusion
Let us apply our results to π. Although its designers claim to have obtained it by generating permutations uniformly at random and then filtering them according to their cryptographic properties, we found that it has very high anomalies, which we summarize in Table 3. The probability that a random permutation has a structure as simple as that of π is negligible. Consequently, the claim of [Per19] that the structure of π was deliberately inserted by its designers is correct. On the other hand, the fact that the designers of π doubled down on their claims of randomness [YH19] instead of acknowledging their use of a structure in light of [Per19] is sufficient for us to urge practitioners not to use Streebog or Kuznyechik.
We finally list some open problems that we have identified while working on this paper.
Open Problem 1.How can we better estimate the differential anomaly?
Open Problem 2. Why are there so many vector spaces in 𝒵_F when F is a 3-round Feistel network of S_{2^8}?

A Proof of the Vector Spaces for Feistel Networks
Our reasoning relies on two results from [CP19]. First, if F is a function mapping F_2^n to itself, then {(x, 0), x ∈ F_2^n} has to be in its Walsh zeroes. Second, we use Lemma 2 of [CP19], which relates the Walsh zeroes of a function composed with a projection to those of the original function.

Proof of Theorem 5. If F ∈ S_{2^n} is a Feistel network, then it is a well-defined permutation, meaning that the spaces in the first category must be in 𝒵_F. More generally, as the inverse of a 3-round Feistel network has the same type of structure, if said structure imposes a space in 𝒵_F, then it also imposes the symmetric space where the coordinates are swapped.
The well-known integral distinguisher against 3-round Feistel networks implies the presence of a TU_{n/2}-decomposition composed with a branch swap at its input. Thus, the space of the second category has to be in 𝒵_F.
The first space of the third category is in 𝒵_F because F + P is a permutation when P is the projection P(x, y) = (x, 0) (see Figure 3), where we used the fact that P^T = P. The second space in this category is its symmetric. Finally, suppose that f_2 ∈ S_{2^{n/2}} and let P′ be the projection such that P′(x, y) = (0, y). The right-hand side of F(x, y) is equal to f_0(y) + x + f_2(y + f_1(x + f_0(y))), so that the right-hand side of F + P′ is a permutation of x for any y, where we used the fact that (P′)^T = P′. We deduce that the first space in the fourth category is in 𝒵_F. The second one is its symmetric.

C GJB Search Implementation and TU-decomposition
Our implementations of GJBExtraction and CanonicalExtraction are available online at https://who.rocq.inria.fr/Leo.Perrin/code/tu_code.zip. They are written in C++ and use the standard library (std::thread) to handle multithreading. We also provide a multi-threaded function returning the Walsh zeroes. Python bindings allow the use of these algorithms from higher-level SAGE [Dev17] scripts.
Since the core of our library is written in C++, it is necessary to compile it. To this end, we used cmake to set up the compilation; it is therefore necessary to install this tool. We also use the Boost.Python library to handle the interaction between C++ and Python. On Ubuntu, these correspond to the packages cmake and libboost-python-dev respectively. In order to compile the library, use the following commands once cmake and Boost.Python have been installed.

cd <directory containing the unzipped supplementary material>
cd sboxU
cmake .
make
cd ..

At that point, you are ready to run the SAGE script tu_decomposition.sage, which automatically returns the TU_t-decomposition of the Russian π. It recovers and prints the 8-bit linear permutations as well as the components T_y ∈ S_{2^4} and U_x ∈ S_{2^4} that correspond to the TU_4-decomposition of this S-box. It then recomputes the lookup table of π using these subfunctions and checks that it is identical to the original π.
This program also shows that π has only one TU-decomposition, namely the one found by Biryukov et al.

D Looking for Affine Spaces
Using GJBExtraction, we can build a similar algorithm returning all the affine spaces contained in a set of elements of F_2^n. However, for an affine space, the GJB of the underlying vector space is not sufficient to uniquely define it: we also need to describe the offset in such a way that it is uniquely defined. The simplest approach was presented in [CDDL06].
Definition 9 (Canonical representation of an affine space). An affine subspace of dimension d can be represented as c ⊕ ⟨v_0, ..., v_{d−1}⟩, where {v_0, ..., v_{d−1}} is the GJB of its span and where c verifies c < c ⊕ v_i for all i ∈ {0, ..., d − 1}. This is its canonical representation, and it is unique.

We thus build an algorithm which looks for the canonical representation of each and every affine space contained in a set 𝒮 of elements of F_2^n. It uses the following operation, whose purpose is explained by the subsequent lemma.
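One way to obtain the canonical offset from an arbitrary representative is to reduce it by the GJB: since the MSBs of the basis vectors are distinct, clearing the bit of c at position MSB(v_i) for each i (from the highest pivot down) yields the unique offset with c < c ⊕ v_i for all i. This reduction is a sketch of ours, not the paper's Algorithm 2, and the names are hypothetical.

```c
static int msb_bit(unsigned x) { int i = -1; while (x) { i++; x >>= 1; } return i; }

/* Reduce a representative c of the affine space c + <v_0, ..., v_{d-1}>
   (gjb sorted in increasing order) to the canonical offset: clear the
   pivot bit MSB(v_i) of c whenever it is set, highest pivot first. The
   result satisfies c < c ^ v_i for every basis vector, as Definition 9
   requires, and is idempotent. */
unsigned canonical_offset(unsigned c, const unsigned *gjb, int d) {
    for (int i = d - 1; i >= 0; i--)
        if ((c >> msb_bit(gjb[i])) & 1)
            c ^= gjb[i];
    return c;
}
```

For instance, the affine space 7 ⊕ ⟨1, 2⟩ = {4, 5, 6, 7} reduces to the canonical representation 4 ⊕ ⟨1, 2⟩.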
Definition 10 (Affine Preprocessing). We call affine preprocessing for c ∈ F_2^n the function P_c mapping sets of elements of F_2^n to other such sets, such that x ∈ P_c(𝒮) if and only if x ⊕ c ∈ 𝒮 and c < c ⊕ x (i.e. c[MSB(x)] = 0).

Lemma 6. Let 𝒮 be a set of elements of F_2^n. Then {v_0, ..., v_{d−1}} is the GJB of a vector space contained in P_c(𝒮) if and only if c ⊕ ⟨v_0, ..., v_{d−1}⟩ is the canonical representation of an affine space contained in 𝒮.
Proof.This lemma is an equivalence so we prove each of its directions separately.
⇒ Suppose that {v_0, ..., v_{d−1}} is the GJB of a vector space V ⊆ P_c(𝒮). As this imposes that c ⊕ x ∈ 𝒮 for all x ∈ V, we deduce that c ⊕ V ⊆ 𝒮. Furthermore, as c < (c ⊕ x) for all x ∈ V, this holds in particular for all v_i. Thus, c ⊕ ⟨v_0, ..., v_{d−1}⟩ satisfies all the conditions of a canonical representation, and the corresponding affine space is contained in 𝒮.
Using Lemma 6, we easily derive that Algorithm 2 returns the unique canonical representation of each and every affine space of dimension at least d contained in a set 𝒮 of elements of F_2^n. We use hw(x) to denote the Hamming weight of x.

Figure 3: A permutation obtained by adding a linear feedforward to a 3-round Feistel network.

Table 2: Upper bounds on the anomalies of the affine-equivalence to some structures. For the TKlog, "AE" corresponds to permutations affine-equivalent to some TKlog and "pure" to TKlogs themselves. n/m is the number of S-boxes used in each round, i.e. the number applied in parallel.

Table 3: Some of the anomalies of π.
The code for Cortex-M4: