Improve Neural Distinguishers of SIMON and SPECK

Deep learning has played an important role in many fields and shows significant potential for cryptanalysis. Although existing works have opened a new direction of machine-learning-aided cryptanalysis, a research gap remains that researchers are eager to fill: how can neural distinguishers be further improved? In this paper, we propose a new algorithm and model to improve neural distinguishers in terms of accuracy and the number of rounds. First, we design an algorithm based on SAT to improve neural distinguishers. With the help of a SAT/SMT solver, we obtain new effective neural distinguishers of SIMON using the input differences of high-probability differential characteristics. Second, we propose a new neural distinguisher model using multiple output differences. Inspired by existing works and by data augmentation in deep learning, we use the output differences to exploit more derived features and train neural distinguishers, splicing output differences into a matrix as a sample. Based on the new model, we construct neural distinguishers of SIMON and SPECK with improvements in both rounds and accuracy. Utilizing our neural distinguishers, we can better distinguish reduced-round SIMON or SPECK from a pseudorandom permutation.


Introduction
Deep learning has brought about significant improvements in many fields [1][2][3], and it has also inspired cryptanalysis. As early as 1991, Ronald Rivest [4] discussed the similarities and differences between machine learning and cryptography and analysed the applications of machine learning in the field of cryptography. In recent years, deep learning has also been applied to side-channel analysis [5,6], where it was pointed out that sensitive information on embedded devices can be effectively extracted by training neural networks.
At Crypto 2019, Gohr [7] showed that deep learning can produce very powerful cryptographic distinguishers and indicated that the neural distinguisher was better than the distinguisher obtained by the traditional approach. He used an input difference to train neural distinguishers of SPECK32/64 [8] based on deep residual neural networks (ResNets) [9]. If the accuracy of a neural distinguisher exceeds 0.5, it can distinguish the target cipher E from a pseudorandom permutation. Gohr's work is a giant leap for differential cryptanalysis based on deep learning. However, it also raised many questions.
Why are neural distinguishers effective? How to improve neural distinguishers in terms of accuracy and the number of rounds?
At Eurocrypt 2021, Benamira et al. [10] proposed a detailed analysis and thorough explanation of the inner workings of Gohr's distinguishers. They showed that Gohr's neural distinguisher was in fact inherently building a very good approximation of the differential distribution table (DDT). Based on this, Benamira et al. also constructed an 8-round distinguisher of SIMON32/64. In [10], Benamira et al. answered the first question. Similarly, Chen and Yu [11] bridged machine learning and cryptanalysis via the extended differential-linear connectivity table.
The first question is answered in [10,11]. In addition to these works on the inner workings of neural distinguishers, there are some works on improving them. In [12], Chen and Yu designed a new neural distinguisher model using multiple ciphertext pairs instead of a single ciphertext pair. The new neural distinguisher can be used to improve the key recovery attack on 11-round SPECK32/64. But Chen et al. did not explore improving the accuracy from the perspective of the input difference or output difference, which is not conducive to finding a longer-round neural distinguisher. In [13], Su et al. constructed a polytopic neural distinguisher of round-reduced SIMON32/64. Their work partially answered the second question, yet it is still worth studying, especially regarding the selection of input differences and the data format. Beyond neural distinguishers, there are also some works on neural-aided key recovery attacks [14][15][16].
Further improvement of neural distinguishers is clearly still worth studying, especially in accuracy and the number of rounds: if the distinguishing accuracy is improved, the complexity of key search can be reduced, and if the number of rounds is increased, the key recovery attack can be extended. Unfortunately, however, few works explore how to improve neural distinguishers from the perspective of the input difference. Besides, neural distinguishers can also be improved by using other distinguisher models. Inspired by these existing works, our core target is to answer the second question, that is, to further improve neural distinguishers in terms of accuracy and the number of rounds.
In this paper, our contributions are as follows.
An algorithm is designed based on SAT to improve neural distinguishers and applied to SIMON. In [7], Gohr chose (0x40, 0x0) as the input difference to train his distinguisher because it transitions deterministically to a low-weight output difference. But such input differences are hard to find, which makes it difficult to find effective distinguishers. To solve this problem, we propose an algorithm based on SAT to improve neural distinguishers. With the help of this automatic search tool, we search for the exact nr-round differential characteristics with probability in [2^(−n/4) × P_max, P_max] and choose their input differences to train nr-round neural distinguishers, where P_max is the optimal probability and n is the block size. Utilizing the algorithm, we obtain neural distinguishers of 9-round SIMON32/64, 10-round SIMON48/96, and 11-round SIMON64/128 with accuracy exceeding 57% for the first time. Compared with the choice of input difference presented in [10], our algorithm obtains higher-accuracy neural distinguishers. Our results are shown in Table 1.
A new neural distinguisher model is proposed using multiple output differences, and neural distinguishers of SIMON and SPECK are improved. In image recognition based on deep learning, a researcher will enhance certain objective features of pictures so that the neural network can learn more effective features, which improves the accuracy of the network. In [10], Benamira et al. explored the connection between Gohr's distinguisher and the DDT, which suggests that the output difference is helpful for improving neural distinguishers. This also implies that we can selectively enhance certain features of the output differences to improve neural distinguishers. Unlike [7,12], which use ciphertext pairs as training data, we use the output differences to train neural distinguishers, splicing output differences into a matrix as a sample. We treat each matrix as an image, and each output difference in the matrix is treated as an objective feature. Our goal is not only to learn each objective feature but also to learn the connections between output differences. If all output differences of the matrix come from the same input difference, the matrix is labeled 1; otherwise, it is labeled 0. Thanks to the new model learning more features than models using ciphertext pairs, we improve the neural distinguishers of SIMON32/64, SIMON48/96, and SIMON64/128. Besides, we obtain new neural distinguishers of 8-round SPECK32/64, 7-round SPECK48/96, and 8-round SPECK64/128, which are better than the existing neural distinguishers. Using our improved neural distinguishers, we can better distinguish reduced-round SIMON or SPECK from a pseudorandom permutation. As a footnote, we show with experiments that the improvement in the accuracy of the distinguishers is not due to the increase in the number of plaintexts but to learning more features from the relationship between the output differences. A summary of our neural distinguishers together with other neural distinguishers is shown in Table 1.
The remainder of this paper is organised as follows. In Section 2, we introduce the basic notations and review Gohr's distinguishers. In Section 3, we design an algorithm based on SAT to help us find high-accuracy neural distinguishers. In Section 4, we propose a new neural distinguisher model to further improve neural distinguishers. Conclusions are drawn in Section 5, where we also suggest further work.

Preliminaries
To make this paper easier to read, we first list the main notations. Then an overview of Gohr's work is given.

SIMON 2n/mn: SIMON acting on 2n-bit plaintext blocks and using an mn-bit key
SPECK 2n/mn: SPECK acting on 2n-bit plaintext blocks and using an mn-bit key
⊕: bitwise XOR
⊙: bitwise AND
∨: bitwise OR
+: addition modulo 2^n
S^j: left circular shift by j bits
K: master key

Overview of Gohr's Distinguisher Model.
Given a fixed input difference Δ = (0x40, 0x0) and a plaintext pair (P_0, P_1), the resulting ciphertext pair (C_0, C_1) is regarded as a sample. Each sample is attached a label Y: Y = 1 if the plaintext pair satisfies P_0 ⊕ P_1 = Δ, and Y = 0 if the input difference is random. A neural network is trained over enough samples labeled 1 and 0; half of the training data comes from ciphertext pairs labeled 1 and the other half from ciphertext pairs labeled 0. For the samples with label 1, the ciphertext pairs follow a specific distribution related to the fixed input difference. For the samples with label 0, the ciphertext pairs follow a uniform distribution due to their random input difference. If a neural network can obtain a stable distinguishing accuracy higher than 50% on a testing set, we call the trained neural network a neural distinguisher. It is particularly noteworthy that each sample is encrypted by a random key; in this way, the neural distinguisher works whether the key is changed or not. In [7], Gohr chose deep residual neural networks to train neural distinguishers and obtained effective neural distinguishers of 5-round, 6-round, and 7-round SPECK32/64. In a traditional differential attack, it is pivotal to distinguish the encryption function from a pseudorandom permutation, which is done with the help of a differential characteristic. For an nr-round optimal characteristic Δα ⟶ Δβ with probability 2^(−t) of a block cipher with block size n bits, we compute the output differences obtained from the fixed input difference Δα. If the ratio of output differences equal to Δβ is about 2^(−t), then we can distinguish the block cipher from a pseudorandom permutation.
This is called a distinguishing attack for block ciphers.
For Gohr's neural distinguisher, we can obtain N ciphertext pairs encrypted under the input difference (0x40, 0x0). We input the N ciphertext pairs, and the neural distinguisher predicts their labels. If the ratio of samples labeled 1 exceeds 0.5, we can distinguish the block cipher from a pseudorandom permutation, and the neural distinguisher is effective. This is called a distinguishing attack based on the neural distinguisher. In addition, it is obvious that the higher the accuracy of the neural distinguisher, the better the effect of the distinguishing attack; and the complexity of key search can also be reduced if the distinguishing accuracy is greatly improved. So, it is necessary to improve neural distinguishers.
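The decision rule of this distinguishing attack can be sketched as follows; the `predictions` lists stand in for the outputs of a trained distinguisher (the numbers are purely illustrative, not real model outputs):

```python
def distinguishing_attack(predictions, threshold=0.5):
    """Decide whether N samples come from the target cipher.

    predictions: per-sample probabilities of label 1, as output
    by a (hypothetical) trained neural distinguisher.
    Returns True if the ratio of samples labeled 1 exceeds 0.5.
    """
    labeled_one = sum(1 for p in predictions if p > threshold)
    return labeled_one / len(predictions) > 0.5

# Toy illustration: on real ciphertext pairs the distinguisher
# should label more than half of the samples as 1.
real_like = [0.9, 0.8, 0.2, 0.7, 0.6, 0.55, 0.4, 0.75]
random_like = [0.1, 0.4, 0.6, 0.3, 0.2, 0.45, 0.5, 0.35]
print(distinguishing_attack(real_like))    # True
print(distinguishing_attack(random_like))  # False
```

The higher the per-sample accuracy, the fewer samples N are needed for the ratio test to be reliable, which is why improving the distinguisher's accuracy directly strengthens the attack.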
In [7], Gohr explained that the reason for choosing (0x40, 0x0) as the input difference was that it transitions deterministically to the low-weight difference (0x8000, 0x8000). But it is quite hard to find such input differences unless the full differential distribution table is used. Moreover, calculating the full DDT is a time-consuming task, especially for large-size block ciphers.

An Approach Based on SAT to Improve Neural Distinguisher
In traditional differential cryptanalysis, it is a primary task to find a high-probability differential characteristic, which takes advantage of the unevenness of the differential distribution. The distribution of output differences differs for different input differences. A neural distinguisher actually learns the distribution of output differences given a fixed input difference. Therefore, the input difference directly affects the accuracy of the neural distinguisher.
In [7], Gohr chose (0x40, 0x0) as the input difference to train the distinguisher because it transitions deterministically to a low-weight output difference. But such input differences are hard to find, which makes it difficult to find effective distinguishers. In [10], Benamira et al. chose the input difference from (nr−1)-round or (nr−2)-round optimal differential characteristics for nr-round neural distinguishers.
In this section, we introduce our algorithm for improving nr-round neural distinguishers by searching for nr-round differential characteristics. With the help of a SAT/SMT solver, we search for high-probability differential characteristics with probability in [2^(−b_s/4) × P_max, P_max], where P_max is the optimal probability and b_s is the block size. Using our algorithm, we obtain high-accuracy neural distinguishers for 9-round SIMON32/64, 10-round SIMON48/96, and 11-round SIMON64/128.

Generic Network Architecture.
Gohr converted the distinguishing of ciphertext pairs into a binary classification problem. His method is applicable not only to SPECK but also to SIMON. With his method, we can construct a generic network architecture for other ciphers. We refer to [7] for the description of the method of constructing the network architecture.
There are multiple neural network architectures available to train neural distinguishers, such as multilayer perceptrons (MLPs) and ResNets. We choose ResNets to train our neural distinguishers.
Our networks comprise three main components: an input layer, an iteration layer, and a prediction layer, as shown in Figure 1, where n refers to the word size of SIMON 2n/mn. The input layer receives training data of fixed length. In the iteration layer, we use 5 residual blocks. In each residual block, we use two Conv1D layers, and each Conv1D layer is followed by a batch normalization layer and an activation layer. After the data from the iteration layer are flattened, they are sent into a fully connected part, which consists of a hidden layer and an output unit.
In our network, the kernel size of the first Conv1D layer is 1 and the kernel size of the other Conv1D layers is 3. In addition, the number of filters in each convolutional layer is 2n and the padding method is SAME. Finally, we train our network with L2 weight regularization to avoid overfitting. The other details of the hyperparameters are given in Table 2. In Table 2, we choose hyperparameters similar to Gohr's choices, so we can ignore the influence of the neural network and its parameters. After the neural distinguisher is trained, we can use it to distinguish the output of the target cipher under a given input difference from random data. The higher its accuracy on the test set, the better it distinguishes ciphertext data.
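As a minimal numpy sketch of the Conv1D building block with SAME padding described above (a real implementation would use a deep learning framework; the filter count 32 matches 2n for SIMON32/64, and the inputs here are random illustrative data):

```python
import numpy as np

def conv1d_same(x, kernels):
    """1-D convolution with SAME padding and stride 1.

    x: (length, in_channels) input sequence.
    kernels: (kernel_size, in_channels, out_channels) filter bank.
    Returns (length, out_channels): SAME padding keeps the length.
    """
    ksize, in_ch, out_ch = kernels.shape
    pad = ksize // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the sequence ends
    length = x.shape[0]
    out = np.zeros((length, out_ch))
    for t in range(length):
        window = xp[t:t + ksize]  # (ksize, in_ch) slice at position t
        # Contract over the kernel-position and input-channel axes.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 2))                # 16 positions, 2 channels
k1 = rng.normal(size=(1, 2, 32))            # first layer: kernel size 1
k3 = rng.normal(size=(3, 32, 32))           # later layers: kernel size 3
h = conv1d_same(x, k1)
print(conv1d_same(h, k3).shape)             # (16, 32)
```

Each residual block would apply two such convolutions (with batch normalization and an activation after each) and add the block's input back to its output.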

An Algorithm Based on SAT to Improve Neural Distinguishers.
SAT is the Boolean satisfiability problem, an NP-complete problem that asks whether there is a valid assignment of Boolean variables satisfying a given set of Boolean conditions. As a key problem in computer science and artificial intelligence, SAT solving has received a great deal of attention since it was proposed, and modern solvers have the advantages of being open source, having good interfaces, high efficiency, and broad compatibility. There are many cryptanalysis results based on SAT [18][19][20].
At present, there are two main ways to select the input differences of neural distinguishers. One is to directly choose an existing optimal differential characteristic [12]; the other is to choose (nr−1)-round or (nr−2)-round optimal differential characteristics for nr-round neural distinguishers [10]. But these methods cannot effectively improve the distinguishing accuracy.
Taking into account the unevenness of the distribution of output differences for different input differences, we choose the input differences of high-probability differential characteristics as candidate differences. We search for high-probability differential characteristics with a SAT-based automatic search tool and train neural distinguishers with the input differences of these characteristics. Based on this, we design an algorithm to help us search for neural distinguishers with higher accuracy, shown in Algorithm 1. In Algorithm 1, we expand the search space of input differences by expanding the range of the probability. We choose 2^(−b_s/4) × P_max as the lower bound of the probability, where P_max is the probability of the optimal differential characteristic and b_s refers to the block size of the target cipher. From experimental experience, we find that if the differential probability is lower than 2^(−b_s/4) × P_max, there are almost no high-accuracy neural distinguishers. So there is nearly no need to spend time on differential characteristics with probability lower than 2^(−b_s/4) × P_max.
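The selection step of this algorithm can be sketched as follows. The SAT-based characteristic search itself is stubbed out: `characteristics` is assumed to be a list of (input difference, probability) pairs returned by the solver, and the differences and probabilities below are illustrative, not real search results:

```python
from fractions import Fraction

def candidate_input_diffs(characteristics, p_max, block_size):
    """Keep input differences of characteristics whose probability
    lies in [2^(-block_size/4) * p_max, p_max]."""
    lower = p_max * Fraction(1, 2 ** (block_size // 4))
    return [diff for diff, prob in characteristics
            if lower <= prob <= p_max]

# Toy numbers for a 32-bit block: the window spans a factor of
# 2^8 below the optimal probability p_max.
p_max = Fraction(1, 2 ** 34)
chars = [
    ((0x0040, 0x0000), Fraction(1, 2 ** 34)),  # optimal: kept
    ((0x0001, 0x0004), Fraction(1, 2 ** 40)),  # within the window: kept
    ((0x1111, 0x2222), Fraction(1, 2 ** 50)),  # below the bound: discarded
]
print(candidate_input_diffs(chars, p_max, 32))
```

In the full algorithm, an nr-round neural distinguisher would then be trained for each surviving input difference, and the highest-accuracy one kept.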
Using Theorem 3 in [19] and the open-source SAT/SMT solver Z3 [21], we search for high-probability differential characteristics of SIMON. Then, with Algorithm 1, we obtain 9-, 10-, and 11-round neural distinguishers of SIMON32/64, SIMON48/96, and SIMON64/128, respectively. This is the first time that a neural distinguisher of 11-round SIMON64/128 has been obtained. The results of the neural distinguishers are shown in Table 3.
To show that Algorithm 1 is effective, we use the other two methods [10,12] of selecting the input difference to train neural distinguishers with the same number of rounds. For the method presented in [12], we choose (0x10100, 0x44040) from [17] as the input difference to train a 9-round neural distinguisher. For the method presented in [10], we train 10-round neural distinguishers of SIMON48/96 using 9-round and 8-round optimal differential characteristics; the specific results are shown in Tables 4 and 5.
In Table 3, we choose the same data format as Gohr's distinguisher, namely a single ciphertext pair. Other hyperparameters are listed in Table 2. Table 3 compares the accuracy of the three methods of selecting the input difference. As we can see, compared with selecting the input difference as in [10,17], the accuracy of the neural distinguishers obtained by Algorithm 1 is significantly higher, which can be used to reduce the complexity of key recovery attacks. Although all of these methods select the input difference from differential characteristics, Algorithm 1 selects the exact number of rounds of the differential characteristics according to the number of rounds of the neural distinguisher.
We also try to search for neural distinguishers covering more rounds. Unfortunately, as the number of rounds increases, the nonrandom features of the ciphertext pairs become weaker and weaker, so it is difficult to find a neural distinguisher with more rounds, even with Algorithm 1. In addition, the higher the Hamming weight of the input difference, the weaker the nonrandom features of the ciphertext pairs. So, if time is limited, we should first search for input differences with low Hamming weight when adopting Algorithm 1.
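For instance, ranking candidate input differences by Hamming weight before training is straightforward (the differences listed here are arbitrary illustrations, not search results):

```python
def hamming_weight(diff):
    """Number of set bits in a (left, right) input difference."""
    left, right = diff
    return bin(left).count("1") + bin(right).count("1")

candidates = [(0x1111, 0x2222), (0x0040, 0x0000), (0x0101, 0x0004)]
# Try low-weight differences first: they tend to preserve stronger
# nonrandom features in the ciphertext pairs.
ordered = sorted(candidates, key=hamming_weight)
print(ordered)  # (0x40, 0x0), with weight 1, comes first
```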

A New Neural Distinguisher Model Using Multiple Output Differences
Algorithm 1: (1) Search for the optimal probability as P_max. (2) Search for the differential characteristics with probability in [2^(−b_s/4) × P_max, P_max], and save their input differences as DIFF.

Table 4: 10-round neural distinguishers using 9-round optimal characteristics.

New Neural Distinguisher Model.
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. As we know, deep learning is data-driven, and the quality of the data determines the quality of the model to some extent. For neural distinguishers, the choice of ciphertext pairs directly affects the accuracy, which we addressed in Section 3. In deep learning, the format of the training data also affects the quality of the trained model to some extent. This suggests that we can improve neural distinguishers from the perspective of the data format. In image recognition, deep learning researchers rotate or crop images to enhance certain objective features, which has been experimentally proven to be effective. Inspired by Benamira et al.'s work and by data augmentation in deep learning, we use the output differences to train neural distinguishers, splicing output differences into a matrix as a sample. We treat each matrix as an image, and each output difference in the matrix is treated as an objective feature. Our goal is not only to learn each objective feature but also to learn the connections between output differences. As shown in Figure 2, the k plaintext pairs ((P_1^1, P_2^1), (P_1^2, P_2^2), ..., (P_1^k, P_2^k)) are encrypted by a random master key.
The k ciphertext pairs ((C_1^1, C_2^1), (C_1^2, C_2^2), ..., (C_1^k, C_2^k)) are converted to output differences O_i = C_1^i ⊕ C_2^i. We splice the k output differences into a matrix as one sample, denoted O_1 ‖ O_2 ‖ ... ‖ O_k. Similar to Gohr's method, given an input difference I_d, each sample is attached a label Y: Y = 1 if all plaintext pairs of the sample satisfy the fixed input difference I_d, and Y = 0 otherwise. If the label is 1, the matrix is called a positive sample; otherwise, it is called a negative sample. We call the new data format DF_different. By randomly generating plaintexts and keys, we make our distinguishers learn the features of the target block cipher instead of features of the plaintext or key. In our experiments, we make the neural network learn more features by using more output differences in a matrix. Note that the new data format needs more ciphertext pairs: for the same number of training samples, the new model requires k times more data than Gohr's model.
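A minimal sketch of building one DF_different sample follows. The cipher is stubbed out by a placeholder keyed function `encrypt`; a real experiment would plug in reduced-round SIMON or SPECK, and the input difference used here is purely illustrative:

```python
import numpy as np

BLOCK_BITS = 32  # block size of the toy example

def encrypt(pt, key):
    # Placeholder keyed function, NOT a real cipher: it only stands
    # in for reduced-round SIMON/SPECK in this sketch.
    x = (pt ^ key) & 0xFFFFFFFF
    x = ((x << 7) | (x >> (BLOCK_BITS - 7))) & 0xFFFFFFFF
    return x ^ (((x >> 3) * 0x9E3779B9) & 0xFFFFFFFF)

def make_sample(k, input_diff, rng, positive=True):
    """Splice k output differences into one (k, BLOCK_BITS) bit matrix."""
    key = int(rng.integers(0, 2 ** BLOCK_BITS))  # fresh random key per sample
    rows = []
    for _ in range(k):
        p1 = int(rng.integers(0, 2 ** BLOCK_BITS))
        # Positive samples use the fixed input difference; negative
        # samples use a fresh random difference for every pair.
        d = input_diff if positive else int(rng.integers(1, 2 ** BLOCK_BITS))
        out_diff = encrypt(p1, key) ^ encrypt(p1 ^ d, key)
        rows.append([(out_diff >> j) & 1 for j in range(BLOCK_BITS)])
    return np.array(rows, dtype=np.uint8), (1 if positive else 0)

rng = np.random.default_rng(1)
sample, label = make_sample(k=8, input_diff=0x400000, rng=rng)
print(sample.shape, label)  # (8, 32) 1
```

Each row of the matrix is one output difference; the network sees the whole matrix at once and can therefore exploit relations between the rows, which a single ciphertext pair cannot provide.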
Because only the channel dimension changes, we refer to Figure 1 for the description of the network architecture.

Application to SIMON.
We choose the input differences in Table 3 to train new neural distinguishers. Other hyperparameters are listed in Table 2. The accuracy comparison is presented in Table 6. For 11-round SIMON48/96, we do not obtain an effective neural distinguisher using the input difference in Table 3, so we searched for other high-probability differential transitions.
In Table 6, "SCP" refers to the data format of Gohr's neural distinguisher, and "MOD" refers to the data format shown in Figure 2. As shown in Table 6, compared with using ciphertext pairs, the number of rounds and the accuracy of the new neural distinguishers are greatly improved. In addition, the new distinguishers can be further improved by increasing k, which shows that the superposition of output differences helps the neural network learn more unknown features.

Application to SPECK.
The new format is not limited to neural distinguishers of SIMON; it is also effective for SPECK. In [7,12], (0x40, 0x0) is used to train neural distinguishers of 7-round SPECK32/64. Using the same input difference, we obtain a new higher-accuracy neural distinguisher of 7-round SPECK32/64. Moreover, with the help of [18,20], we obtain a good input difference (0x2800, 0x10) and an effective 8-round neural distinguisher. As far as we know, this is the first effective 8-round neural distinguisher of SPECK32/64 with accuracy above 55%. Besides, we also obtain neural distinguishers of 7-round SPECK48/96 and 8-round SPECK64/128. A summary of the existing results is shown in Table 7. In Table 7, "SCP" refers to the data format of Gohr's neural distinguisher, "MCP" refers to the data format using multiple ciphertext pairs, and "MOD" refers to the data format shown in Figure 2. Other hyperparameters are listed in Table 2. Utilizing the new model, we improve neural distinguishers in terms of both rounds and accuracy, and we can achieve better results in distinguishing attacks. Moreover, we give a further illustration of our model. Since the new model uses more data than models using ciphertext pairs, the improved results may seem to be caused by the increase of data alone. We therefore perform supplementary experiments to show that the improvement in accuracy is not due to the increase in the number of plaintexts but to learning more features from the relationship between the output differences.

A Supplementary Explanation to Our New Model.
Although the accuracy is higher using the new data format, the improvement might simply come from training on more samples. So we use the same number of ciphertext pairs to train the neural distinguishers shown in Table 8. Other hyperparameters are listed in Table 2.
In Table 8, "SCP" refers to the data format of Gohr's neural distinguisher, and "MOD" refers to the data format shown in Figure 2. As shown in Table 8, the accuracy using multiple output differences is higher even when the same amount of data is used. In addition, using output differences takes up less memory, which reduces training time.
To further illustrate the effectiveness of the new distinguishers, we conduct additional experiments. As shown in Figure 3, we use k copies of the same output difference as a sample, and we call this data format DF_same. As shown in Figures 2 and 3, DF_different uses k different output differences in a sample, while DF_same uses k copies of a single output difference. Based on the data format DF_same, 10^6 positive and negative ciphertext pairs are randomly generated; each output difference is reused k times and filled into a matrix as a sample. Then the neural distinguishers are evaluated on the 10^6 samples, and we calculate their accuracy on these special data. Table 9 shows the corresponding test results.
In Table 9, "Accuracy using DF_same" refers to the accuracy of neural distinguishers trained with DF_same, and "Accuracy using DF_different" refers to the accuracy of neural distinguishers trained with DF_different. As shown in Table 9, the accuracy using DF_same is lower than that using DF_different.

Figure 3: A data format using k same output differences.

SIMON

For SIMON 2n/mn, the key-dependent round function is the map R_{k_i}: GF(2)^n × GF(2)^n ⟶ GF(2)^n × GF(2)^n defined by R_{k_i}(x_i, y_i) = (y_i ⊕ (S^1 x_i ⊙ S^8 x_i) ⊕ S^2 x_i ⊕ k_i, x_i), where k_i (k_i ∈ GF(2)^n) is the round subkey.
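As a sketch, the SIMON round function and its inverse can be written as follows, with word size n = 16 for SIMON32/64; the test values are illustrative, not official test vectors:

```python
N = 16                 # word size n (SIMON32/64 has 2n = 32-bit blocks)
MASK = (1 << N) - 1

def rol(x, j):
    """Left circular shift S^j on an n-bit word."""
    return ((x << j) | (x >> (N - j))) & MASK

def simon_round(x, y, k):
    """One SIMON round: (x, y) -> (y ^ f(x) ^ k, x),
    with f(x) = (S^1 x & S^8 x) ^ S^2 x."""
    f = (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)
    return (y ^ f ^ k) & MASK, x

def simon_round_inv(x, y, k):
    """Inverse round, used to sanity-check the Feistel structure."""
    f = (rol(y, 1) & rol(y, 8)) ^ rol(y, 2)
    return y, (x ^ f ^ k) & MASK

x, y, k = 0x1234, 0xABCD, 0x0F0F
assert simon_round_inv(*simon_round(x, y, k), k) == (x, y)
print(simon_round(x, y, k))
```

Because SIMON is a Feistel cipher, the inverse round recomputes f on the untouched branch, so decryption needs no inverse of f itself.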

SPECK
Similar to SIMON, there are several variants of SPECK, whose parameters are shown in Table 11. For SPECK 2n/mn, the key-dependent round function is the map R_{k_i}: GF(2)^n × GF(2)^n ⟶ GF(2)^n × GF(2)^n defined by R_{k_i}(x_i, y_i) = ((S^{−α} x_i + y_i) ⊕ k_i, ((S^{−α} x_i + y_i) ⊕ k_i) ⊕ S^{β} y_i), where k_i (k_i ∈ GF(2)^n) is the round subkey.
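A sketch of this round function for SPECK32/64 (word size n = 16, rotation amounts α = 7 and β = 2 for the 32-bit block variant); the test values are illustrative, not official test vectors:

```python
N = 16                   # word size n for SPECK32/64
MASK = (1 << N) - 1
ALPHA, BETA = 7, 2       # rotation amounts for the 32-bit block variant

def rol(x, j):
    return ((x << j) | (x >> (N - j))) & MASK

def ror(x, j):
    return ((x >> j) | (x << (N - j))) & MASK

def speck_round(x, y, k):
    """R_k(x, y) = ((S^-a x + y) ^ k, ((S^-a x + y) ^ k) ^ S^b y)."""
    x = (ror(x, ALPHA) + y) & MASK   # modular addition on n bits
    x ^= k
    y = rol(y, BETA) ^ x
    return x, y

def speck_round_inv(x, y, k):
    """Inverse round, to sanity-check invertibility."""
    y = ror(y ^ x, BETA)
    x = rol(((x ^ k) - y) & MASK, ALPHA)
    return x, y

x, y, k = 0x1234, 0xABCD, 0x0F0F
assert speck_round_inv(*speck_round(x, y, k), k) == (x, y)
print(speck_round(x, y, k))
```

Unlike SIMON's Feistel structure, SPECK mixes both branches every round, so the inverse round must undo the modular addition with a modular subtraction.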
As it is out of scope for our purposes, we refer to [8] for the description of the key schedule.

Data Availability
The data used to support the findings of this study are included within the article.

Disclosure
A preprint has previously been published in [22].