New Linear Cryptanalysis of Chinese Commercial Block Cipher Standard SM4

SM4 is a Chinese commercial block cipher standard used for wireless communication in China. In this paper, we use a partial linear approximation table of the S-box to search for three-round iterative linear approximations of SM4, from which a linear approximation for 20-round SM4 is constructed. The best previously identified linear approximation covers only 19 rounds. We also obtain a linear approximation for 19-round SM4 that is better than the known results. Furthermore, we present a key recovery attack on 24-round SM4, which is the best attack in terms of the number of rounds.


Introduction
SMS4 [1], issued in 2006 by the Chinese government, serves WAPI (WLAN Authentication and Privacy Infrastructure) as the underlying block cipher for the security of wireless LANs. In 2012, SMS4 was announced as the Chinese commercial block cipher standard and renamed SM4 [2].
SM4 has received much attention from the cryptographic community, and many cryptanalytic results for SM4 have been produced. In [3], rectangle and boomerang attacks on 18-round SM4 and linear and differential attacks on 22-round SM4 were presented. Using a multiple linear attack, Etrog and Robshaw gave an attack on 23-round SM4 in [4]. Besides these, a differential attack and a multiple linear attack on 22-round SM4 were introduced in [5,6]. To date, the best differential attack on 23-round SM4 is given in [7]. Cho and Nyberg proposed a multidimensional linear attack on 23-round SM4 in [8]. The best linear attack on 23-round SM4 is provided by Liu and Chen in [9]. Bai and Wu proposed a new lookup-table-based white-box implementation of SM4 in [10], which protects the large linear encodings from being cancelled out. Moreover, a related-key differential attack on SM4 has been given in [11], and the lower bound on the number of linear active S-boxes for SMS4-like ciphers has been analyzed in [12].
Linear cryptanalysis [13] is one of the most important techniques in the analysis of symmetric-key cryptographic primitives. It focuses on linear approximations between plaintext, ciphertext, and key. If a cipher behaves differently from a random permutation under linear cryptanalysis, this can be used to build a distinguisher, or even a key recovery attack by adding some rounds. The subkeys of the appended rounds are guessed, and the ciphertexts are decrypted and/or the plaintexts are encrypted under these subkeys to calculate the intermediate state at the ends of the distinguisher. If the subkeys are guessed correctly, the distinguisher should hold; otherwise, it will fail. Linear cryptanalysis has been used to analyze many ciphers, such as [14][15][16][17].
Our Contributions. In terms of the number of rounds attacked, the best previous key recovery attacks on SM4 are linear cryptanalysis and differential cryptanalysis, and both are based on 19-round distinguishers. Whether a better distinguisher can be found is our first motivation. Therefore, we focus on searching for linear approximations of SM4 in order to improve the attacks. The contributions of this paper are summarized as follows.
The best previous linear attacks are based on 19-round linear approximations. We design a new search algorithm for iterative linear approximations over a small number of rounds of SM4 by gradually expanding the partial linear approximation table of the S-box. First, it is proved that there is no one-round or two-round iterative linear approximation for SM4, and then some properties are obtained for the iterative linear approximations of 3-round SM4. Based on these properties, we use our search algorithm to obtain a 19-round linear approximation with bias $2^{-58}$ and a 20-round linear approximation with bias $2^{-61}$. Our linear approximations are compared with the previous ones in Table 1. It can be seen that our linear approximations are the best ones so far.
The best previous attacks work on 23-round SM4. Using our 20-round linear approximation of SM4, we give a key recovery attack on 24-round SM4, which is the best attack on SM4 in terms of the number of rounds. Moreover, the new 19-round linear approximation is used to attack 23-round SM4. As a result, the best previous linear attack on 23-round SM4 is improved. A summary of our attacks and the previous attacks on SM4 is given in Table 2.
The paper is organized as follows. Section 2 briefly describes the notations used in this paper and introduces the SM4 block cipher. Section 3 shows how to search for better linear approximations of SM4. In Section 4, we use the 19-round and 20-round linear approximations to attack 23-round and 24-round SM4, respectively. Section 5 concludes this paper.
One round of SM4 is shown in Figure 1. As Figure 1 shows, $T$ is composed of the nonlinear layer $S$ and the linear transformation $L$. Layer $S$ applies four 8 × 8 S-boxes in parallel. The specification of the S-box can be found in [1]. Let $x \in \mathbb{F}_2^{32}$ and $y \in \mathbb{F}_2^{32}$ be the 32-bit input and output words of the linear transformation $L$. Then
$$y = L(x) = x \oplus (x \lll 2) \oplus (x \lll 10) \oplus (x \lll 18) \oplus (x \lll 24),$$
where $\lll k$ denotes a left rotation by $k$ bits. The key schedule of SM4 is similar to the encryption procedure; the only difference is that the linear transformation in the key schedule is
$$L'(x) = x \oplus (x \lll 13) \oplus (x \lll 23).$$
The 128-bit master key (MK_0, MK_1, MK_2, MK_3) is first masked with the constants FK_0, FK_1, FK_2, FK_3 and then input to the key schedule function.
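As a minimal sketch, the two linear transformations can be written with 32-bit rotations. The rotation amounts follow the public SM4 specification; the function names `rotl32`, `sm4_L`, and `sm4_L_key` are illustrative choices and do not come from the paper.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, k):
    """Rotate a 32-bit word left by k bits (0 < k < 32)."""
    x &= MASK32
    return ((x << k) | (x >> (32 - k))) & MASK32


def sm4_L(x):
    """Linear transformation of the SM4 round function."""
    return x ^ rotl32(x, 2) ^ rotl32(x, 10) ^ rotl32(x, 18) ^ rotl32(x, 24)


def sm4_L_key(x):
    """Linear transformation used in the SM4 key schedule."""
    return x ^ rotl32(x, 13) ^ rotl32(x, 23)
```

Both maps are linear over $\mathbb{F}_2$, so `sm4_L(a ^ b) == sm4_L(a) ^ sm4_L(b)` holds for all 32-bit words, which is what allows masks to be pushed through them deterministically.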

Search for the Linear Approximations of SM4
In terms of the number of rounds attacked, the best previous key recovery attacks on SM4 are linear and differential cryptanalysis, and both are based on 19-round distinguishers. Whether a better distinguisher can be found is our first motivation for improving the attacks on SM4. Therefore, the key point is to search for linear approximations of SM4. As far as we know, methods to search for linear approximations of SM4 have been considered in [3,4,9,19].
The search method in [3] constructs linear approximations for reduced-round SM4 by identifying a one-round linear approximation with the same input and output masks for the $T$ function. In this way, the number of active $T$ functions can be minimized. As a result, an 18-round linear approximation with bias $2^{-57.28}$ for SM4 was found.
In [4], Etrog and Robshaw derived a 5-round iterative linear approximation in which only the last two rounds are active, and then concatenated three 5-round iterative linear approximations to construct an 18-round linear approximation with bias $2^{-56.2}$.
In [19], Liu et al. used the branch-and-bound algorithm in [20] to obtain a series of 5-round iterative linear approximations, which were used to construct an 18-round linear approximation with bias $2^{-56.14}$.
In order to get a better linear approximation for SM4, Liu and Chen gave a more dedicated search algorithm in [9]. They first used an MILP-based method to search for the mode of the linear approximation with the minimum number of active S-boxes for reduced-round SM4; then, based on the identified mode, they found a 19-round linear approximation with bias $2^{-62.27}$.
Even if the number of active S-boxes of a linear approximation is minimized, the absolute value of its bias is not necessarily maximal. For this reason, we focus on searching for better linear approximations that have a few more active S-boxes.
At CT-RSA 2014, Biryukov and Velichkov extended the branch-and-bound algorithm to search for differential characteristics of ARX ciphers, using a partial differential distribution table for modular addition to improve the search efficiency [21]. Inspired by this idea, we use a partial linear approximation table to search for linear approximations of SM4.
First, some properties of the basic operations, namely the XOR operation, the three-forked branching operation, and the linear map, are introduced.
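The sketch below is a hedged illustration that assumes the standard mask-propagation rules of linear cryptanalysis: for an XOR, both input masks equal the output mask; for a three-forked branch, the input mask is the XOR of the two output masks; for a linear map, the input mask is the transpose of the map applied to the output mask. It checks the last rule numerically for the SM4 transformation $L$, whose transpose replaces left rotations by right rotations. All function names are illustrative.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(x, k):
    return ((x << k) | (x >> (32 - k))) & MASK32


def rotr32(x, k):
    return ((x >> k) | (x << (32 - k))) & MASK32


def sm4_L(x):
    return x ^ rotl32(x, 2) ^ rotl32(x, 10) ^ rotl32(x, 18) ^ rotl32(x, 24)


def sm4_L_transpose(mask):
    """Transpose of L: maps an output mask of L to the matching input mask."""
    return (mask ^ rotr32(mask, 2) ^ rotr32(mask, 10)
            ^ rotr32(mask, 18) ^ rotr32(mask, 24))


def dot(mask, x):
    """Inner product over F_2 of a mask and a word."""
    return bin(mask & x).count("1") & 1


def check_linear_map_rule(trials=1000, seed=0):
    """Check out_mask . L(x) == L^T(out_mask) . x on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(32)
        out_mask = rng.getrandbits(32)
        if dot(out_mask, sm4_L(x)) != dot(sm4_L_transpose(out_mask), x):
            return False
    return True
```

Because the relation holds with probability 1, the linear layer contributes no bias loss; only the S-boxes do.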
Biases in the linear approximation table of the SM4 S-box take the values $k/2^8$ ($2 \le k \le 16$). If the whole linear approximation table is fed into the search program, the program is too slow to find a better linear approximation. Thus, a partial linear approximation table is used in the search algorithm. The basic idea is that linear approximations of the S-box with higher bias are used first. If no better linear approximation is output, the partial linear approximation table is expanded by successively appending linear approximations of the S-box with lower bias, until a better linear approximation is output.
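The idea of a gradually expanded partial table can be sketched as follows. To keep the example short, a 4-bit S-box (the PRESENT S-box) stands in for the 8-bit SM4 S-box; this substitution, the function names, and the threshold schedule are assumptions for illustration and are not the authors' implementation.

```python
def parity(v):
    return bin(v).count("1") & 1


def linear_approximation_table(sbox, n_bits):
    """lat[a][b] = #{x : a.x == b.S(x)} - 2^(n_bits-1); bias = lat[a][b] / 2^n_bits."""
    size = 1 << n_bits
    half = size // 2
    lat = [[0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            count = sum(1 for x in range(size)
                        if parity(a & x) == parity(b & sbox[x]))
            lat[a][b] = count - half
    return lat


def partial_lat(lat, threshold):
    """Keep (in_mask, out_mask, value) with |value| >= threshold, skipping the trivial zero masks."""
    return [(a, b, v)
            for a, row in enumerate(lat)
            for b, v in enumerate(row)
            if (a or b) and abs(v) >= threshold]


def expanding_tables(lat, start, stop, step):
    """Yield partial tables for thresholds start, start - step, ..., down to stop."""
    for threshold in range(start, stop - 1, -step):
        yield threshold, partial_lat(lat, threshold)


# Stand-in 4-bit S-box (PRESENT), used only to keep the example small.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
```

A search would iterate over `expanding_tables(lat, start, stop, step)` and stop at the first threshold that yields an acceptable approximation; for the SM4 S-box the analogous schedule would run from 16 down to 2.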
A common way to obtain a better linear approximation is to first find iterative linear approximations over a small number of rounds, from which linear approximations over many rounds can be built directly. Thus, we focus on searching for iterative linear approximations of SM4. Three properties of iterative linear approximations of SM4 are given below.
Property 4. There is no one-round iterative linear approximation with active S-boxes on SM4.
Proof. From Figure 2, if there is an iterative linear approximation for the first round, the masks satisfy (6). Using the property of the three-forked branch, we obtain (7). From (6) and (7), we get a relation which implies $\Gamma_i = 0$, so all the S-boxes in this round are passive. Thus, there is no one-round iterative linear approximation for SM4.
Property 5. There is no two-round iterative linear approximation for SM4.
Proof. If there is an iterative linear approximation for the first two rounds in Figure 2, then the masks satisfy (9). With the property of the three-forked branch, we have (10). According to (9) and (10), we derive a further relation on the masks, which means that $\Gamma^0_i = \Gamma^0_{i+2} = 0$. Substituting these values into the above formulas gives $\Gamma_{i+1} = 0$, which means that all S-boxes in the first two rounds are passive. Therefore, a two-round iterative linear approximation for SM4 does not exist.
Property 6. For an iterative linear approximation of 3-round SM4, the minimum number of active S-boxes is 3. Moreover, each round then has one active S-box, and the active S-boxes are located in the same position in all three rounds.
Proof. If there is an iterative linear approximation for three rounds in Figure 2, then the masks satisfy a system of relations that leads to (15). We focus on linear approximations with few active S-boxes. From (15), it is impossible for a three-round iterative linear approximation to have only one active S-box. If there are two active S-boxes, then $\Gamma_i = 0$ or $\Gamma_{i+1} = 0$ or $\Gamma_{i+2} = 0$, and in each case all S-boxes in the three-round linear approximation turn out to be passive. Take $\Gamma_i = 0$ as an example.
In the cases $\Gamma_{i+1} = 0$ and $\Gamma_{i+2} = 0$, the same argument as for $\Gamma_i = 0$ shows that there is no active S-box in the three-round linear approximation. Therefore, an iterative linear approximation for three-round SM4 has at least three active S-boxes. From (15), it is clear that when this minimum is attained, each round has one active S-box and these active S-boxes are located in the same position in all three rounds.
Based on Property 6, we search for iterative linear approximations of 3-round SM4 in which each round has only one active S-box. The search algorithm is listed in Algorithm 1. In Algorithm 1, the following notations are used.
$\Gamma_i$ and $\Lambda_i$ are the input and output masks of the $S$-layer in the $i$th round. $\Gamma_{i,j}$ and $\Lambda_{i,j}$ are the input and output masks of the $j$th S-box of the $i$th round. The partial linear approximation table of the S-box with threshold $k$ consists of the linear approximations with bias no less than $k/2^8$ ($2 \le k \le 16$).
After running the search algorithm, we identify 12240 3-round iterative linear approximations with bias $2^{-10}$. From any of these 3-round iterative linear approximations, we can construct linear approximations for 19-round and 20-round SM4 with bias $2^{-58}$ and $2^{-61}$, respectively. Compared with the best previous 19-round linear approximation in [9], the bias is improved from $2^{-62.27}$ to $2^{-58}$. In Tables 3 and 4, we give linear approximations for 19-round and 20-round SM4, respectively, where all masks are written as hexadecimal values and "*" denotes an undetermined value.
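The three bias values are consistent with the piling-up lemma under one assumption that the text does not state explicitly: every round contributes exactly one active S-box with bias $2^{-4}$ (the largest value $16/2^8$ of the S-box table), and the linear layer contributes no loss. The sketch below checks the arithmetic under that assumption; the function names are illustrative.

```python
from math import log2


def piled_up_bias(biases):
    """Piling-up lemma: bias of the XOR of independent approximations,
    2^(n-1) * product(eps_i) = (1/2) * product(2 * eps_i)."""
    result = 0.5
    for eps in biases:
        result *= 2.0 * eps
    return result


def one_active_sbox_per_round(rounds, sbox_bias=2.0 ** -4):
    """Bias of a trail with one active S-box of the given bias in each round (assumed model)."""
    return piled_up_bias([sbox_bias] * rounds)


for rounds in (3, 19, 20):
    print(rounds, log2(one_active_sbox_per_round(rounds)))
```

Under this model an $n$-round trail has bias $2^{-(3n+1)}$, which gives $2^{-10}$, $2^{-58}$, and $2^{-61}$ for 3, 19, and 20 rounds.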

Key Recovery Attacks for SM4
Algorithm 1 (search for 3-round iterative linear approximations of SM4; recoverable outline): for each threshold from 16 down to 2 and each S-box position $j$ from 0 to 3, find all input masks indexed by $\Lambda_{0,j}$ in the partial linear approximation table and store them; do the same for $\Lambda_{1,j}$ and then for $\Lambda_{2,j}$; then process the three rounds 0 to 2 in a final loop.
4.1. Linear Attack on 24-Round SM4. We append two rounds to both the top and the bottom of the 20-round linear approximation in Table 4, which gives a linear attack on 24-round SM4. The partial sum technique [24] is used in the partial encryption and decryption procedures. See Figure 3.
(3) For every plaintext/ciphertext pair, calculate the counter index; then increase the corresponding counter of $V_0$ by one.

4.2. Linear Attack on 23-Round SM4. Two rounds are added to both the top and the bottom of the 19-round linear approximation in Table 3. The key recovery attack on 23-round SM4 is similar to the attack procedure for 24-round SM4, so we omit the details.
If we set the data complexity $N = 2^{120.3}$ and the advantage $a$ to be 47, the time complexity is $2^{120.3} + 2^{121} = 2^{121.7}$ 23-round encryptions, and the memory complexity is $2^{85}$ bytes. The success rate $P_S = 85.9\%$ is computed with the method in [25].
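The figures above can be sanity-checked with a short script. The success-rate part is only a sketch: it assumes that the method in [25] is the widely used normal-approximation formula $P_S = \Phi\left(2\sqrt{N}\,|\varepsilon| - \Phi^{-1}(1 - 2^{-a-1})\right)$ and that the bias is the $2^{-58}$ of the 19-round approximation. Both are assumptions, and the function names are illustrative.

```python
from math import log2, sqrt
from statistics import NormalDist


def log2_sum_of_powers(*exponents):
    """log2 of the sum of 2^e over the given exponents."""
    return log2(sum(2.0 ** e for e in exponents))


def success_probability(log2_data, log2_bias, advantage):
    """Normal-approximation success rate of a linear attack (assumed formula)."""
    std = NormalDist()
    signal = 2.0 * sqrt(2.0 ** log2_data) * 2.0 ** log2_bias
    threshold = std.inv_cdf(1.0 - 2.0 ** (-advantage - 1))
    return std.cdf(signal - threshold)


time_log2 = log2_sum_of_powers(120.3, 121)
p_s = success_probability(log2_data=120.3, log2_bias=-58, advantage=47)
print(f"time = 2^{time_log2:.1f}, success rate = {p_s:.3f}")
```

Under these assumptions the time exponent rounds to 121.7, and the success rate comes out close to the reported 85.9%; a small difference would not be surprising, since the exact variant of the formula used in [25] is not given here.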

Conclusions
In this paper, we first show that there is no one-round or two-round iterative linear approximation for SM4 and establish a property of the 3-round iterative linear approximations. Based on this property, we search for iterative linear approximations of 3-round SM4 using the partial linear approximation table. A 20-round linear approximation is then constructed from the 3-round iterative linear approximations, whereas the best previous distinguishers cover only 19 rounds. With it, we give a key recovery attack on 24-round SM4, which is the best known attack on SM4 so far in terms of the number of rounds. Moreover, we obtain a better 19-round linear approximation, which is used to improve the linear attack on 23-round SM4. As future work, we hope to use a similar technique to search for better differential characteristics of SM4.

Table 1: Summary of linear approximations of SM4.

Table 2: Summary of attacks on SM4.