Improving the Performance of CPA Attacks for Ciphers Using Parallel Implementation of S-Boxes

,


Introduction
A cryptographic device running will produce some physical characteristics, such as execution time [1], power consumption [2], electromagnetic radiation [3], and so on.Essentially, the size of these physical characteristics are connected to a particular secret key used in the cryptographic device, thus in many cases enabling its efficient recovery.Therefore, an attacker can employ relevant statistical models to analyze these characteristics generated by the cryptographic devices and eventually to recover the secret key, which is in general referred to as the side-channel attack (SCA).Depending on whether there is a profiling phase in the attack, SCA methods can be classified into two classes, namely nonprofiling and profiling.
(1) Nonprofiling SCA Methods.During the attack, the adversary collects physical characteristics generated by the targeted cryptographic device that uses a fixed secret key k * .The attacker then uses statistical analysis, serving as a distinguisher, to calculate the correlation between the collected physical traces for the purpose of identifying the correct secret key k * among a (sub) set of suitable key candidates fkg.Depending on the distinguisher, the nonprofiling SCA methods include simple power analysis (SPA) [1], differential power analysis (DPA) [2], correlation power analysis (CPA) [4], and mutual information analysis (MIA) [5] among others.(2) Profiling SCA Methods.In this case, the attacker has access to a pair of identical devices, which are commonly called the profiling and targeted devices.The attack is performed in two steps, known as the profiling and attack phases.During the first phase, the attacker uses the profiling device to model physical leakage of the targeted device for all possible guessed keys k.In the second phase, the attacker collects the physical characteristics of the targeted device and retrieves the correct secret key k * through the specific leakage model.This kind of attack includes Templates Attacks [6], Stochastic Attacks [7], and Machine Learning-based Attacks [8] among others.Due to the profiling phase, this kind of attack generally performs better than the nonprofiling SCAs.
Since their introduction in the early 2000, a lot of efforts have been put in improving the performance of SCA methods.When preprocessing of the physical traces is of concern, different approaches such as linear discriminant analysis (LDA) [9], elastic alignment [10], principal component analysis (PCA) [11], singular spectrum analysis (SSA) [12], and lowpass filtering [13] are often used to improve the signal-tonoise ratio (SNR).For nonprofiling SCA attacks, the cryptography community has studied the effect of the different distinguishers [14][15][16] on their success rate.In addition, an effective enumeration of the guessed keys [17][18][19] can be useful in improving the performance of these nonprofiling attacks and rank evaluation methods [17,20,21] can be used to enumerate candidate keys in decreasing order of likelihood or estimate rank of the correct key.Moreover, the application of artificial intelligence and machine learning in the nonprofiling SCA attacks seems to be under rapid development recently [22].For profiling SCA attacks, the quality of the profiling phase is one of the key factors that affects its performance.Due to the possibility of accurately extracting important properties of the available data, the methods such as deep learning [8] (e.g., deep neural networks (DNN) [23]), convolutional neural networks (CNN), and multilayer perception networks [24] are often used in the profiling phase.We also mention the efforts of the cryptographic community toward various improvements when a recovery of the secret key of cryptographic devices with a protective scheme is considered, refer to [25].
In this article, we propose certain improvements concerning the family of nonprofiling SCA attacks, particularly CPA, applied to unprotected cryptographic implementations.A more detailed overview of these improvements is given in the next subsection.
1.1.Related Work.In CHES 2004, Brier et al. [4] proposed the use of Pearson correlation coefficients for calculating the correlation between power traces of the cryptographic algorithm and the data processed by the cryptographic device.These coefficients are then efficiently used to recover (a portion of) the secret key k * and the attack is called CPA [4].These original ideas have been further developed for the purpose of improving the performance of the CPA method.In CHES 2006, Le et al. [26] proposed partitioning power analysis (PPA) that employs the Hamming distance as a measure of correlation.In addition, a method of improving the performance of PPA and CPA by restricting the normalization factor was also suggested in [26].The main benefit of this approach is a reduced number of power traces required for a successful attack (approximately by half compared to the classical CPA method).In COSADE 2010, Yongdae et al. [27] demonstrated that the distribution of power traces can be used in a more sophisticated manner, which resulted in an attack on AES (implemented in Xilinx FPGA) that uses (biased) power traces whose amount is eight times less compared to the classical CPA.In the same year, Komano et al. [28] proposed another version of CPA, the socalled built-in determined subkey CPA (BS-CPA), which uses the knowledge of previously retrieved m subkeys when recovering the ðm þ 1Þ-th subkey and thereby increasing the SNR.For the public databases of DPA Contest V1, the number of power traces required for BS-CPA to recover the secret key is 65, whereas this number equals 280 for a classical CPA.
In CARDIS 2012, Oswald and Paar [29] presented an algorithm for efficient computation of optimal filter coefficient values, which further improves the performance of CPA when preprocessing of power traces through a filter is performed.For the public databases of DPA Contest V2 (containing the power traces for different encryption algorithms and the involved secret keys), a template attack based on this method can recover the correct guessed key of the first S-box of AES with 3,000 power traces, whereas the classical CPA requires about 8,000 power traces.In ICISC 2013, Kim and Ko [30] used the so-called PCA to further improve the performance of CPA.The effect of increased correlation resulted in an even smaller number of power traces (requiring only 1,500) for recovering a portion of the secret key that affects the first S-box of AES.In SecureComm 2015, Zhang et al. [31] proposed a novel leakage model (simple-geneticalgorithm-based CPA) based on the power consumption of multiple S-boxes to turn the key-searching problem into a correlation coefficient optimization problem.The experiment shows that this method can reduce the power traces by 52% for DES and 32% for SM4 than the classical CPA.
To increase the efficiency of CPA, in CHES 2019, Timon [22] proposed a CPA variant called CNN-DDLA, which employs the deep learning and neural networks in a nonprofiling scenario.The experimental results showed that the CNN-DDLA method has the ability of recovering the secret key using 3,000 software desynchronized traces, whereas the classical CPA would fail in achieving this.Recently, in CHES 2019, Robyns et al. [32] improved the performance of CPA by encoding the leaked information for the purpose of maximizing the correlation coefficient for the EM traces.In 2020, Kwon et al. [33] introduced a novel approach that uses deep learning techniques in the nonprofiling SCA scenario.For a software AES implementation on the chip Whisperer-Lite platform, this approach improves the SNR from 5.18 to 20.49 when deep learning techniques are employed.In FGCS 2020, Ding et al. [34] pointed out that there are bitwise linear leakages in the implemenation of the block ciphers which involve keys with XOR operation and proposed a genetic-algorithm-based approach to conduct attacks on bitwise lineak leakages.

2
IET Information Security To be more specific, a serial implementation of the cryptographic algorithm's S-boxes makes its power consumption to be consistent with the operation of adding round subkeys, whereas in the case of a parallel implementation the power consumption of the device is rather consistent with the sum of the operations performed on each subkey.In the latter case, when the attacker uses a classical CPA to calculate the correlation between available power traces of a cryptographic device and the affected subkey, the power consumption of other subkeys is considered as noise which results in an increased number of power traces needed for a key recovery.It is worth noting that although there have been a plenty of works on improving the performance of nonprofiling SCA attacks, these approaches almost exclusively focus on the scenarios where S-boxes are implemented serially.Hence, the problem of reducing the noise and an optimal utilization of the leakage information (for the purpose of improving the performance of standard CPA methods) when S-box primitives are implemented in parallel appears to be a challenging task.
To address the above problem, we propose a subtle modification of the existing techniques toward their application to the case when S-box primitives are implemented in parallel.To achieve this, we give a detailed overview of the relevant methods and discuss the suggested modifications, which are summarized below.
(1) An Overview of Three Related Works.We review three works on improving the performance of CPA against block ciphers found in [26][27][28].In order to fairly compare the efficiency of these methods to our approach, under the assumption that S-boxes are implemented in parallel, we have reimplemented these methods and applied them to the public databases of DPA Contest V2.Based on these simulation results, we deduce the following conclusions: (i) The methods of Le et al. [26] and Komano et al. [28] have limited ability to improve the performance of CPA in this setting.(ii) The method of Yongdae et al. [27] requires a selection of certain specific power traces from a larger pool of power traces, which essentially does not reduce their number.(2) Our Modified CPA Method.We propose another strategy that improves the performance of CPA cryptanalysis in the above mentioned setting (having Sboxes implemented in parallel).The main refinement of our approach consists of suitable restrictions of the normalization factor and the use of previously recovered subkeys recursively when recovering the next subkey.When our method is applied to the public databases of DPA Contest V2, the simulation results indicate that it outperforms the other methods in terms of the required number of traces, see Table 3 and Figure 5.
1.3.Organization.The rest of this paper is organized as follows: Section 2 describes the common leakage model and the standard CPA method in more detail.In Section 3, we give an overview of the methods in [26][27][28] and describe our novel CPA approach.The performance of our improved CPA method is demonstrated experimentally through its application to the public databases of DPA Contest V2 in Section 4. Some concluding remarks are given in Section 5.

Preliminaries
We now provide a brief introduction to the relevant concepts such as the common leakage model and the correlation power analysis.

Common Leakage
The Hamming distance between v 0 and v 1 can be calculated as follows: The main idea behind Hamming distance model is to calculate a total number of 0-to-1 and 1-to-0 conversions of digital circuits in a certain period of time.The simulation of power consumption in the Hamming distance model uses the following assumptions: (1) All 0-to-1 conversions use the same amount of power as 1-to-0 conversions.(2) All 0-to-0 conversions and 1-to-1 conversions have the same effect on power consumption.(3) All omponents have the same effect on power consumption.(4) The static power consumption of the components is neglected.
In this paper, the leakage model W ¼ aHðD ⊕ RÞ þ b is used to describe the relationship between the power consumption of a cryptographic device (denoted by W) and the intermediate state of cryptographic algorithm D, where D IET Information Security depends both on the input data and the used key.Here, R represents a reference state (a constant machine code) and a is a constant that specifies the ratio between Hamming distance and power consumption W. Finally, b represents the remaining power consumption which is assumed to be independent of other variables.
2.2.Correlation Power Analysis.Correlation power analysis is the most widely used method when nonprofiling SCA attacks are considered.The specific steps of CPA are summarized below [4].
( Select an appropriate leakage model.Usually, the Hamming weight model performs better in software implementations, whereas the Hamming distance model is more suitable for hardware implementations.For each possible value k j; l of the subkey k * j , the power consumption corresponding to all intermediate states in dj; l ¼ fd j; l; 1 ; …; d j; l; N g is calculated and these values form the vector H j; l ¼ fh j; l; 1 ; …; h j; l; N g.For each possible value k j; l of the subkey k * j , the correlation coefficient r j; l is calculated using the power traces WðP i Þ/WðC i Þ and the intermediate state vector Hj; l ¼ fh j; l; 1 ; …; h j; l; N g.The value of k j; l that corresponds to the maximum correlation coefficient is regarded as a correct guess for the subkey k * j .The correlation coefficient r j; l in the encryption direction is given by Equation (3).

Related Works and the Description of Our Attack
In this section, we first review the three relevant works on improving the performance of CPA, considered by Le et al. [26] at CHES 2006, the method of Komano et al. [28], and the Yongdae et al. [27] approach.Then, we present the detailed steps of our improved CPA method.Later, in Section 4, we will estimate the performance of these three methods when applied to certain block ciphers whose S-boxes are implemented in parallel and give a comparison to our method.

An Overview of the Relevant Works
(1) Le et al. [26] Method.
The so-called PPA was originally proposed in [26].PPA is a general concept that includes the mono-bit DPA (differential power analysis), the multibit DPA and CPA based on the Hamming distance.We now describe the PPA attack in more detail when it is applied during the encryption process of the considered block cipher.During the attack, we first need to determine the targeted d-bit set β ¼ b 1 …b d and the number of guessed keys N k ¼ 2 d .Then, for each guessed subkey k j; l (as a d-bit portion of the secret key), one computes the Hamming distance m ¼ HDðWðP i Þ; R; D; k j; l Þ with each trace WðP i Þ and subsequently divides N power traces Þg. Finally, one can calculate the correlation coefficient r j; l of the guessed key k j; l with respect to WðP i Þ using the following equation: where N j; l; m ¼ #ðG j; l; m Þ refers to the number of power traces in G j; l; m , and σ W represents the standard deviation of N power traces and can be computed as . On the other hand, σ H stands for the standard deviation of the Hamming distance of a guessed key k j; l when using N power traces.It can be , where 4 IET Information Security α j; l; m denotes a specific weight of each class G j; l; m , and it can be determined using α j; l; m ¼ However, if the targeted d-bit set β ¼ b 1 …b d follows a uniform distribution, then α j; l; m is given by: where The main difference between DPA and CPA is essentially the value σ W σ H which equals to 1 for the DPA, whereas for CPA, σ W σ H is a product of the standard deviation of N power traces σ W and the standard deviation σ H of the Hamming distance of a guessed key k j; l with N power traces.
In [26], it was shown that the normalization of σ W σ H negatively affects the SNR, which becomes low and can render the PPA attack impossible.In order to improve the performance of PPA, the normalization effect of σ W σ H was reduced by adding to σ W a positive constant ε [35].The correlation coefficient r j; l is then computed as follows: Note that there are two aspects that need to be taken into account when applying PPA-like cryptanalytic techniques.The first one is that usually the plaintext blocks corresponding to power traces are random.In a chosen plaintext scenario, there is a possibility of reducing the noise of power traces which leads to a better SNR.The second aspect is that the value of a positive constant ε should be chosen carefully in which case the SNR can be improved without modifying any parameter of significance.
The classical CPA method recovers the targeted d-bit of the secret key of the cryptographic algorithm individually.Such an approach is appropriate when cryptographic Sboxes are implemented serially since in this case the power consumption refers to a single targeted S-box.However, when the encryption algorithms use FPGA or ASIC as its implementation platform, then S-boxes are commonly implemented in parallel for the sake of performance.For this kind of cryptographic devices, power traces collected by the attacker correspond to all S-boxes used in a single encryption round (these S-boxes process the secret key k * as a part of its input data).In this case, the attacker recovers a d-bit subkey k * j of the secret key k * by calculating the correlation between a guessed subkey k j; l and the power consumption which refers to all S-boxes implemented in parallel.This means that the power consumption of the targeted S-box is only a suitable portion of the measured power trace, which is not consistent with the main ideas of classical CPA.
To remedy this issue, Komano et al. [28] proposed a more general power analysis, known as built-in determined subkey CPA (BS-CPA).Its main strategy is to use previously recovered subkeys recursively when computing the correlation between a guessed key k j; l and measured power traces, in the process of recovering a new subkey.The main advantage of BS-CPA over classical CPA is that a larger number of successfully recovered subkeys implies a greater correlation for the remaining subkeys and in general the number of power traces required is smaller compared to the CPA.A pseudocode below, describes the main steps of BS-CPA.
In the above algorithm, n k * denotes the number of subkeys of the secret key k * , whereas I 0 ¼ fjg specifies indices of nonrecovered subkeys (initialized by I 0 ¼ f1; …; n k * g).The set of ordered pairs I 1 ¼ fðj; k * j Þg collects the information about recovered subkeys, where j refers to the indices of recovered subkeys and k * j refers to the value of j-th recovered subkey.Max j represents the threshold value of the j-th subkey.The correlation coefficients r 0 j; l are computed using Equation ( 7), where p represents the number of recovered subkeys, is the Hamming distance of the k-th recovered subkey k * k from the power trace WðP i Þ, with k 2 ½1; p.The remaining symbols have the same meaning as in Equation (3).
It is important to notice the following when applying the BS-CPA.First, for the calculation of correlation coefficients of a guessed key k 1; l (thus retrieving the first subkey), in Equation (7), ∑ p k¼1 h * k; i ¼ 0. Also, the BS-CPA method is suitable for the scenarios where the cryptographic algorithm uses a parallel implementation of S-boxes, and it performs better than the higher level of parallelization is.Finally, the correctness of already recovered subkeys greatly affects the number of power traces required for a successful recovery of the subsequent subkeys.
The power traces collected by the attacker always contain a noise component, which can increase a required number of power traces.Apparently, the lower the noise level the smaller is the number of power traces.The noise in a cryptographic device can be divided into two classes, namely IET Information Security electronic and switching noise.Electronic noise is the power consumption generated by electronic components of the cryptographic device such as, e.g., transistors, and this noise is affected by the measurement setup.Switching noise is the power consumption generated by frequent switches of the logic gets in the cryptographic device, and its level depends on the level of parallelization.In order to reduce the impact of the electronic noise on the side-channel attack, cryptanalysts usually employ some preprocessing techniques in order to increase the SNR such as PCA [11], SSA [12], and so on.
Yongdae et al. [27] proposed an efficient preprocessing technique of selecting (biased) power traces for the purpose of increasing the SNR and therefore achieving a better performance of CPA attacks.For convenience, this method is referred to as BIAS-CPA in this paper.The BIAS-CPA method can be efficiently applied when power traces follow the normal distribution, in which case the power traces with large variance or standard deviation are selected in an SCA attack.The detailed steps of the attack process are given as follows: (i) N 1 power traces are used to determine which sampling instance has the strongest correlation with the selected intermediate state, and this instance is denoted by t ct .(ii) These N 1 power traces are then used to calculate the mean μ t ct and the variance σ 2 t ct corresponding to t ct .(iii) Then, the probability density value of the sampling instance t ct is calculated for each power trace, using the probability density function which is determined from μ t ct and σ 2 t ct .(iv) According to the probability density value in (3) above, the N 1 power traces are grouped in descending order.(v) Select the first N 2 power traces out of N 1 , to conduct an SCA attack on the targeted cipher.
Notice that the BIAS-CPA method is similar to a chosenplaintext side-channel attack.The difference between them is that the attacker selects specific power traces when mounting a BIAS-CPA attack, whereas in the latter case specific plaintext blocks are chosen for which power traces are measured.
3.2.Our New Improved CPA Method.Before we describe our method, we briefly discuss the main properties of the above mentioned methods and the reasons behind they do not behave optimally when parallel implementation of S-boxes is the underlying setting.In Section 4, we will provide experimental results that indicate a significant reduction in the required number of power traces for our improved strategy, compared to the three methods described in the previous section.A performance comparison given in Table 3 in Section 4, leads to the following main conclusions: (1) For classical CPA, 12, 688 power traces are sufficient to recover the secret key k * of the public databases of the DPA Contest V2, while 500 power traces can recover the key of the single chip microcomputer [36].That is to say, the number of power traces required by classical CPA for recovering the secret key of the cryptographic devices is substantially larger when S-boxes are implemented in parallel.(2) PPA can significantly reduce the number of power traces needed for each subkey recovery but its number, being typically around 3, 650 (as shown in Table 3), is still quite large.(3) For many subkeys (but not all), BS-CPA can reduce the number of power traces required for their recovery by more than 2,000, see also Table 3.However, BS-CPA cannot reduce the number of power traces required for a recovery of the first subkey.(4) BIAS-CPA is a quite powerful approach, but it requires a large pool of power traces and the selection of those that possess certain special properties, which cannot always be efficiently applied (the required amount of power traces being not available).Therefore, a more effective attack method is needed for handling the case of parallel implementation of Sboxes in cryptographic algorithms.
It is worth noting that both PPA and BS-CPA aim at improving the SNR (compared to the classical methods), but the nature of these improvements differs quite substantially.Hence, there is a possibility of combining these two methods for the purpose of maximizing the SNR, especially when parallel implementation of S-boxes is considered.This leads to a significant reduction of the number of power traces needed for a secret key recovery.We notice that both PPA

6
IET Information Security and BIAS-CPA improve the performance of CPA by increasing the variance of power traces.However, this is done in a different way and whereas PPA adds a suitable constant ε to the variance of the power traces, BIAS-CPA simply selects the power traces with larger variance.The simulation results in Section 4.2, performed assuming that power traces follow a normal distribution, justify the addition of an appropriate constant ε that essentially also reduces the number of power traces used by PPA.k; i þ h j; l; i ) corresponding to p recovered correct subkeys and the guessed subkey k j; l (corresponding to the ðp þ 1Þ-th subkey), rather than the one h j; l; i corresponding to the guessed key k j; l .Thus, the power consumption generated by the p recovered correct subkeys cannot be treated as noise, improving the SNR during the attack process.
To compute the correlation between each possible guessed subkey k j; l with the p recovered correct subkey k * k (k 2 ½1; p) and N power traces, we adopt the PPA method based on the Hamming distance.Specifically, we divide N power traces WðP i Þ into ðd þ 1Þ partitions classes G j; l; 0 , …, G j; l; d (where G j; l; m ¼ fWðP i Þ; i 2 ½1; Njm ¼ HDðWðP i Þ; R; D; k j; l Þg) with the guessed subkey k j; l (corresponding to the ðp þ 1Þ-th subkey), and G * k; 0 , …, G * k; d (there are a total of p similar power trace sets) (where k Þg and k 2 ½1; p) with each correct subkey k * k .Then, we use the decision signal Equation ( 8) to calculate the correlation coefficient rj; new .The detailed derivation of Equation ( 8) is shown in Appendix.Note that Equation ( 8) is a combiantion of the Equations ( 6) and (7).Here, we combine PPA and BS-CPA.On the one hand, we use BS-CPA to improve the SNR of the correlation coefficient rj; new .On the other hand, we use PPA to enhance each subbyte's attack capability, which can compensate for the shortcomings of BS-CPA.
During calculating correlation rj; new using Equation (8), we need to determine the specific value of the positive constant ε.The specific value of the positive constant ε would affect the improved attack ability.According to the reimplementation of BIAS-CPA, we found that when the power traces belongs to the union of ð − 1; μ t ct − 2σ t ct Þ and ðμ t ct þ 2σ t ct ; þ1Þ, the improved attack ability is better.Thus, we set the positive constant ε to an appropriate constant, so that σ W þ ε in the correlation coefficient rj; new (i.e., Equation ( 8)) immediacy approaches the union of ð − 1; μ t ct − 2σ t ct Þ and ðμ t ct þ 2σ t ct ; þ1Þ.Specially, we use SPA to determine the sampling point t ct , which has the strongest correlation with the selected intermediate state, with the available power traces.Then, we calculate the mean value μ t ct and standard deviation σ t ct of the sampling point t ct with these available power traces.Finally, we set the positive constant ε to ensure σ W þ ε is in the union of ð − 1; μ t ct − 2σ t ct Þ and ðμ t ct þ 2σ t ct ; þ1Þ.Note that the method of determining the specific value of the positive constant ε in this article is beneficial for improving the attack capability of the improved CPA, but it may not necessarily be the most effective.This forms a CPA improvement method suitable for S-boxes parallel computing scenarios.
In short, we use BS-CPA to improve SNR of the attack process, and use PPA to calculate the correlation more accurately, especially the correlation for the first subkey.In addition, we set the specific value of the positive constant ε based on the core idea of BIAS-CPA, in order to further enhance the attack capability of the improved CPA.The detailed steps of CT-CPA are given as follows: (1) Select the sampling point which gives the largest correlation for the available power traces and denote it as t ct : (2) Calculate the mean value μ t ct and standard deviation σ t ct at the sampling point t ct of the power traces using a relatively small number of traces (according to simulations in Figure 3 it is sufficient to use 150 power traces to estimate the mean value and standard deviation).(3) Initialize I 0 ¼ f1; …; n k * g, I 1 ¼ fð0; 0; 0Þg and set ε to an appropriate constant value according to the mean μ t ct and standard deviation σ t ct for the given power traces.(4) Set N ¼ 0. For j 2 I 0 , the following attack is performed as the number of power traces N increases: (i) For each guessed subkey k j; l , the Hamming distance m ¼ HDðWðP i Þ; R; D; k j; l Þ is calculated using N power traces.(ii) Divide N power traces WðP i Þ into ðd þ 1Þ classes G j; l; 0 ; …; G j; l; d , where G j; l; m ¼ fWðP i Þ; i 2 f1; …; Ngjm ¼ HDðWðP i Þ; R; D; k j; l Þg. (iii) Calculate the correlation coefficient r j; l; new for the guessed key k j; l according to the Equation ( 8), for j ¼ 1; …; 2 d , thus forming a correlation vector rj; new ¼ fr j; 1; new ; …; r j; 2 d ; new g.(iv) Sort the correlation vector rj; new in descending order and record the guessed key k j; l corresponding to the first element in rj; new .(v) Delete j from the set I 0 and add ðj; k j; l ; N j Þ to the set I 1 whenever the correlation coefficient of the guessed key k j; l appears at the first position in the correlation vector rj; new at least T j times, where T j is a threshold value indicating that the guessed subkey value always appears at the first position in the correlation vector with further increase of power traces N.

IET Information Security
(5) Check whether I 0 is empty.If I 0 is empty, terminate the attack.Otherwise, return to Step 4.
Remark 1.The purpose of Steps 1, 2, and 3 is to determine an appropriate value of the positive constant ε, which is derived from BIAS-CPA.The Step 4 is a combination of PPA and BS-CPA.In Equation ( 8), used to compute the correlation coefficients r j; l; new , the set I 1 ¼ fðj; k * j ; N j Þg specifies the information about recovered subkeys.More precisely, j is the index of the recovered subkey k * j and N j refers to the number of power traces required to recover the j-th subkey.Moreover, m ¼ HDðWðP i Þ; R; D; k j; l Þ represents the Hamming distance between the guessed key k j; l and the power trace WðP i Þ, where j 2 ½1; n k * and l 2 ½1; represents the specific weight of the class G * k; m .The specific formula for its calculation is: H represents the standard deviation of the Hamming distance between the guessed key k j; l and N power traces taking into account p recovered subkeys.The parameter σ 0 H can be calculated as follows: represents the Hamming distance between the k-th recovered subkey k * k and the power trace WðP i Þ.Here, h j; l; i ¼ HDðWðP i Þ; R; D; k j; l Þ denotes the Hamming distance of the guessed subkey k j; l and the power trace WðP i Þ.The remaining symbols have the same meaning as in the Equations ( 3) and ( 4).Note that we have determined the constant ε using the mean and standard deviation of those power traces used to recover the secret key in BIAS-CPA.However, we believe that there are other methods for specifying the constant ε, which deserve further analysis.In addition, if the available power traces are insufficient to ensure the sampling point t ct which has the strongest correlation, we can perform the CT-CPA on each samping point to recover the key of the cryptographic algorithm.This would result in a longer time required for CT-CPA.

Experiments and Performance Evaluation
In this section, we first discuss DPA Contest V2.Then, we give the detailed parameters of the reimplementation of PPA, BS-CPA, BIAS-CPA, and CT-CPA.Finally, we give a performance comparison that clearly illustrates the benefits of using CT-CPA.
4.1.DPA Contest V2.The DPA contests specifies a series of challenges aiming at comparing the different attack methods in an objective manner [37].Among them, the target of DPA Contest V2 is the unprotected parallel implementation of AES on FPGA.The power traces of DPA Contest V2 are collected on a SASEBO-GII board, which is a standard evaluation board for side-channel attacks.The SASEBO-GII board runs at 24 Mhz, whereas for the measurement of power traces the oscilloscope's sampling rate is 5 Gsample/s.
The DPA Contest V2 contains two public databases.The first database has 1,000,000 power traces, where each power trace is the power consumption of AES with a random plaintext and a random secret key.Hence, one can obtain the plaintext, secret key and ciphertext for each power trace with the name of the power trace file.This database is used for profiling SCA attacks.The second database has 640,000 power traces and uses only 32 random secret keys to generate 20,000 power traces for each key.Similarly, one can also obtain the plaintext, secret key and ciphertext for each power trace in this database.
The target of our attack is the second database.Figure 1 shows the power traces of of DPA Contest V2 for AES on the SASEBO-GII board.One round of AES encryption is completed within one clock cycle on the SASEBO-GII board.From Figure 1, we can see that the number of sample points for a single power trace is 3,253.Therefore, there are approximately 208 samples per clock and each power trace has a duration of 15.6 clocks, which is more than 10 rounds of AES.

Simulation Results.
During the implementation of PPA, BS-CPA, BIAS-CPA, and CT-CPA, we have set sixteen 8-bit registers corresponding to the 10th encryption round of AES as the attack target.We choose the Hamming distance model as the leakage model, which is the power consumption that reflects transitions of the registers from 9th round to 10th round.In [27], three methods for determining the sample point with the greatest correlation were proposed.In this article, we use SPA to determine the 2,394-th sample point as a 8 IET Information Security sample point with the greatest correlation, hence t ct ¼ 2; 394.The attack range is between 2,350 and 2,450.The parameters for each improved CPA method are given below.
In this case, one needs to determine the specific weights α j; l; m and the constant ε.As the number of power traces is large and the bits of β are uniformly distributed, the coefficients α j; l; m are calculated using Equation (5).The specific values of the coefficients α j; l; m are shown in Table 1.The value of positive constant ε is chosen to be 33, which is 10% of the standard deviation of the power traces σ W (following the same reasoning as in [26]).
Komano et al. [28] introduced the parameter Max i to denote the maximum correlation coefficient of all guessed subkeys k j; l , when a recovery of the j-th subkey is considered.Instead of specifying Max i , in order to estimate the performance of BS-CPA, we have reimplemented this method assuming the knowledge of a certain portion of the subkeys.To this end, suppose we have correctly recovered p subkeys.When recovering the ðp þ 1Þ subkey, we use Equation ( 7) to calculate the correlation r 0 j; l for each guessed subkey k j; l using the values of p correct subkeys.The decision strategy for determining the number of power traces required to recover each subkey is that the correctly guessed subkey preserves the maximum correlation value as the number of power traces increases.
The main assumption of BIAS-CPA is that the power traces follow a normal distribution.Therefore, we first verify whether the power traces of DPA Contest V2 satisfy this assumption.With N ¼ 20; 000 power traces, generated using the same random secret key, the mean and standard deviation of the 2,394-th sample point of the power traces are 3314.73and 329.36, respectively.The black curve in Figure 2 represents the probability density value of the normal distribution N(3314.73,329.36 2 ), whereas the blue points in Figure 2 approximate the frequency of the 2,394-th sample point for N ¼ 20; 000 power traces.Notice that the frequency of each value at the 2,394-th sample point is obtained by dividing the number of its occurrence by the number of total power traces N. From Figure 2, we draw the conclusion that the 2,394-th sample point of N ¼ 20; 000 power traces of DPA Contest V2 indeed follows the normal distribution N(3314.73,329.36 2 ), thus applying the BIAS-CPA method to the public databases of DPA Contest V2 is therefore justified.
First, CT-CPA also requires that the power traces follow a normal distribution, which has already been verified.Second, we need to calculate the mean μ t ct and standard deviation σ t ct of the 2,394-th sampling point of DPA Contest V2 when using only a relatively small number of power traces.The variation of the mean μ t ct and standard deviation σ t ct of the 2,394-th sampling point, as the number of power traces increases, is shown in Figure 3(a).Figure 3(b) is a close-up look at Figure 3(a), focusing on the first 250 power traces.According to the diagrams in Figure 3, after the number of power traces has increased to 150 the mean μ t ct and standard deviation σ t ct will remain constant.This means that we can assume that the power traces follow a normal distribution even though their number is relatively small.The next step is to set ε to an appropriate constant value.In accordance to the reimplementation results of BIAS-CPA, the power traces used to recover the secret key of the public databases of DPA Contest V2 belong to the union of ð − 1; μ t ct − 2σ t ct Þ and ðμ t ct þ 2σ t ct ; þ1Þ.With respect to the fact that the standard deviation μ t ct of the 2,394-th sample point is 329.36 (for the power traces of DPA Contest V2), we have set the constant ε to be 500.In this way, when the number of power traces used to attack DPA Contest V2 exceeds 150 then ðσ W þ εÞ approximately approaches the standard deviation 818.05 of the union of ð − 1; μ t ct − 2σ t ct Þ and ðμ t ct þ 2σ t ct ; þ1Þ.The number of power traces for PPA with the   The specific value of the coefficients α j; l; m .α j; l; 0 α j; l; 1 α j; l; 2 α j; l; 3 α j; l; 4 α j; l; 5 α j; l; 6 α j; l; 7 IET Information Security constant ε ¼ 500 is shown in Table 2.According to Table 2, setting the constant ε to be 500 implies that the number of power traces required for a recovery of AES' subkeys is reduced compared to PPA with the constant ε ¼ 33.In particular, the number of power traces required to recover the 0th, 3th, 4th, 5th, 7th, 8th, 9th, 11th, 12th, and 13th is less than 610.To summarize, the constant ε is chosen so that ðσ W þ εÞ approximates the standard deviation of the union of ð − 1; μ t ct − 2σ t ct Þ and ðμ t ct þ 2σ t ct ; þ1Þ of the normal distribution Nðμ t ct ; σ 2 t ct Þ, which can improve the performance of both PPA and CT-CPA.
Finally, we empirically set the threshold value T j , introduced in Section 3.2.1, to be 100 due to the following.For N ¼ 20; 000 power traces generated using a random secret key in DPA Contest V2, the correct value of the 15-th subkey is 0xE1.However, by increasing the number of power traces from 246 to 339, the correlation of incorrect subkey value 0x2F remains maximum, which implies that this key byte is incorrectly recovered.
In the worst case, the number of power traces for which the correlation of a wrongly guessed subkey may be maximal equals 94, which justifies the use of T j ¼ 100 for identifying the correct subkey values.Figure 4 shows the worst case scenario, when the correlation of an incorrectly guessed subkey calculated by CT-CPA achieves the maximum as a function of the number power traces.

IET Information Security
Note that the value of α * k; m and α j; l; m in Equation ( 8) are shown in Table 1.

Performance Evaluation.
In this section, we analyze the performance of the mentioned attack methods in terms of the required number of power traces and discrimination which refers to the ratio between the correlation coefficient of the correct key candidate and of the wrongly guessed key whose correlation coefficient is maximal among the wrong guessed keys (see below for more details).
(1) Required Number of Power Traces.This parameter refers to the minimum number of power traces required to successfully recover a subkey.Figure 5 shows the correlation variations for all guessed values for the 15-th subkey as the number of power traces increases, for different attack methods.According to Figure 5, to recover the 15-th subkey, CPA, PPA (with ε ¼ 33), BS-CPA, and BIAS-CPA require 8,353, 3,771, 5,573, and 1,228 power traces, respectively.On the other hand, our CT-CPA method only needs 428 power traces for the same purpose, which is a substantial improvement over the other methods.
In addition, Table 3 shows a required number of power traces for recovering all the subkeys for these attack methods.According to Table 3, classical CPA, PPA (with ε ¼ 33), BS-CPA, and BIAS-CPA require 12,688, 3,771, 10,808, and 3,472 power traces, respectively.There is a huge improvement of CT-CPA over the other methods since it only requires 495 power traces for recovering all the subkeys.Table 4 shows the results of attacking the public databases of DPA Contest V2 submitted by participants, which can be obtained from [38].As evident from Table 4, the number of power traces needed for the best nonprofiling SCA attack is 7,061.However, our CT-CPA only requires 495 power traces to recover AES's secret key for the public database of DPA Contest V2, which is a huge improvement compared to 7061.In addition, for the profiling SCA attacks, Benoit Gerard's method (mentioned in [38]) performs best and needs only 439 power traces.Since in general profiling SCA attacks perform better than that nonprofiling SCA variants, Tables 3 and 4 show that CT-CPA can significantly reduce the gap in the performance of these two classes of side-channel attacks.
For the purpose of evaluating the performance of different CPA methods, we introduce a new concept called discrimination.In brief, the discrimination describes the ratio between the correlation coefficient of the correct subkey candidate and of the maximum correlation value among incorrect subkey values.The following easy conclusions can be immediately deduced given this ratio: (i) If discrimination is greater than 1, then the correct subkey value attains the maximum correlation, which implies that the considered subkey is successfully recovered.(ii) If discrimination is less than 1, there exists an incorrectly guessed subkey having the maximum correlation, thus implying a failure in the process of this subkey recovery.(iii) In general, a larger discrimination value implies a larger distance between the correlation of a correctly guessed subkey and of a wrong one, which gives a better resolution when distinguishing correct and wrong subkey values.
One can also consider the dependency of discrimination on the increased number of power traces.Especially, with respect to the recovery of the 15-th subkey we provide (in this context) a performance comparison of different attack methods in Figure 6 with the following conclusions: (i) Discrimination of CT-CPA and PPA are greater than that of BS-CPA, CPA, and BIAS-CPA.It shows that CT-CPA and PPA are more successful in distinguishing the correct subkey than the other methods.(ii) Discrimination of CT-CPA and PPA remains roughly invariant when the number of power traces increases, assuming that the previous subkey has been successfully recovered.Therefore, CT-CPA and PPA perform similarly in terms of discrimination (resolution) but our approach requires much less power traces compared to PPA. (iii) Discrimination for BS-CPA is always approximately 1 as the number of power traces increases, indicating that there always exist certain subkey values whose correlation remain the same as that of the correct one.This means that when the SNR of the power traces is relatively low then the BS-CPA method can perform quite badly, which then indicates that CT-CPA has a better discrimination ability than BS-CPA.
During attacking each subkey of AES in DPA Contest V2, we sort all guessed keys in descending order based on   IET Information Security their correlation with N power traces.The guessing entropy, refers to as GE, indicates that the ranking of the correct guessed key among all guessed keys.The range of GE is from 0 to 255, where 0 represents this subkey is successfully recovered.We used CT-CPA to attack 6.4-million power traces of 32 keys (20,000 power traces for a key), and obtain the guessing entropy for each subkey byte, as shown in Figure 7. From Figure 7, we can see that when the number of power traces increases to about 495, the guessing entropy of the correct guessed key for all subkey bytes is less than 5.That is to say, when the number of power traces increases to 495, we can reduce the space complexity of guessed keys from 2 128 to 2 37 .In this case, we can obtain the correct key of AES through exhaustive search.
(4) Success Rate.This parameter indicates the success probability of recovering a subkey or all keys of a cryptographic algorithm.Figure 8 shows that the success rate of recovering each subkey using our CT-CPA as the power traces increase from 0 to 2,000.From Figure 8, we can find that when the power traces increases to 495, the success rate of subkey bytes 0; …; 9, 11, and 15 can reach 80% (as shown in Figure 8(a)), and the success rate of the ramaining subkey bytes can reach 60%.In addition, we can discover that when the power traces increase to 800, the success rate of all subkeys ranges from 60% to 80% (as shown in Figure 8(b)).

Conclusions
In this paper, we have given an overview of the improved CPA methods and simulated their performance using the public databases of DPA Contest V2.To further improve the performance of CPA attacks, we have introduced a new CPA method (abbreviated as CT-CPA) which is suitable in the scenarios when cryptographic algorithm use parallel implementation of S-boxes.With respect to the public databases of DPA Contest V2, the experimental results show that CT-CPA only requires 495 power traces for recovering the secret key of AES which is a huge improvement over the other CPA techniques.

Appendix
In this section, we give a detailed derivation of Equation (8).It is worth noting that the symbols appearing below have the same meaning in the appendix as in the text.Equation (3) shows that the equation for calculating the correlation coefficient of the guessed subkey k j; l by CPA is, ðA:1Þ Suppose that CT-CPA is used to estimate the subkey value k j; l , assuming that p correct subkey values have been recovered.Considering the impact of the p correct subkey values on calculating the correlation coefficient of the candidate subkey k j; l (for the j-th subkey), the correlation coefficient r 0 j; l is expressed as follows:  IET Information Security  The correlation coefficient r 0 j; l becomes: The correlation coefficient r 0 j; l can be written as follows: Since CT-CPA improves the performance of CPA by adding a constant ε to the standard deviation of the power traces σ W , the correlation coefficient of CT-CPA r j; l; new is expressed as follows: ðA:7Þ

( 5 )
Calculate and Compare the Correlation Coefficients.

3. 2 . 1 .
Description of CT-CPA.Our new improved CPA method, named CT-CPA in this paper (where CT stands for combined techniques), assumes that the power traces of the cryptographic device follow a normal distribution.It combines the ideas of BS-CPA and PPA with the appropriate constant ε which is selected in accordance to the standard deviation of the power traces of the cryptographic device.When mounting CT-CPA attack on a block cipher, we consider d-bit set β ¼ b 1 ; …; b d as the attack target (any subkey being of length d bits), and divide the secret key k * into n k * subkeys.Then, we use the BS-CPA method to recover each subkey one by one.When attacking the ðp þ 1Þ-th subkey, the calculated hypothetical intermeidate value is the sum of the hypothetical intermediate values (∑ p k¼1 h *

FIGURE 1 :
FIGURE 1: Power traces of AES on the SASEBO-GII board for DPA Contests V2.The horizontal axis represents the time samples proportional to clock cycles.The vertical axis represents the relative voltage values.
value of 2,394-th sample point of the power traces 4,000 4,500

FIGURE 2 :
FIGURE 2: Probability density value/frequency of the 2,394-th sample point of the power traces.The horizontal axis represents the value of the 2,394-th sample point for each power trace.The vertical axis represents the probability density value/frequency of each value of the 2,394-th sample point of the power traces.

FIGURE 3 :
FIGURE 3: The variation of mean and standard deviation of 2,394-th sample point are represented by the red line and the black line, respectively.(a) Variation of mean/standard deviation with 20,000 power traces and (b) a close-up look at Figure 1(a).The horizontal axis represents the number of power traces.The vertical axis represents the value of mean and standard variation.

FIGURE 4 :
FIGURE 4: The maximum correlation for the 15-th subkey of CT-CPA for the public databases of DPA Contest V2.The horizontal axis represents the number of the power traces.The vertical axis represents the guessed key with maximum correlation.

FIGURE 5 :
FIGURE 5: Variations of the correlations between guessed subkeys and the correct 15-th subkey for different methods.(a) CPA, (b) PPA (ε = 33), (c) BS-CPA, (d) BIAS-CPA, and (e) CT-CPA.The red and black lines represent the correlation to a correctly and wrongly guessed subkey, respectively.

FIGURE 6 :
FIGURE 6: Variations in the discrimination of the 15-th subkey as a function of power traces for different methods.

N
⋅ nÞ and α j; l; m ¼ N j; l; m N ðm − ∑ d n¼0 N j; l; n N ⋅ nÞ.The term A becomes: 1.2.Motivation and Contributions.When classical CPA methods are used in cryptanalysis of cryptographic devices such as single chip microcomputer, RFID tags, wireless sensors, and only dozens (possibly hundreds) of power traces are needed for recovering the secret key of the device.However, applying a classical CPA to devices implemented on FPGA and ASIC platforms requires thousands of power traces for the key recovery.The reason for this phenomenon is that different implementations of the cryptographic algorithm can make the classical leakage model of CPA, based on a single S-box, more or less accurate.
Model.A majority of power analysis attacks uses Hamming distance or Hamming weight model as the preferred leakage models.Define a data word β ¼ b 1 b 2 …b d , where b i 2 f0; 1g.The Hamming weight of D is the number of bits whose logical value is 1, that is,

TABLE 2 :
The number of power traces for PPA with the constant ε ¼ 500.

TABLE 4 :
[38]number of power traces to recover the public databases of DPA Contest V2 submitted by the participants[38].