RFA: R-Squared Fitting Analysis Model for Power Attack

Correlation Power Analysis (CPA), introduced by Brier et al. in 2004, is an important side-channel attack method that has, over the last decade, enabled attackers to derive secret or private keys efficiently and at low cost. In this paper, we propose R-squared fitting model analysis (RFA), which is more appropriate for nonlinear correlation analysis. This model can also be applied to other side-channel methods such as second-order CPA and collision-correlation power attack. Our experiments show that RFA-based attacks bring significant advantages in both time complexity and success rate.


Introduction
With the development of information technology, information security plays an important role in medical systems [1,2], communications [3], finance [4], and other fields. Side-channel analysis [5,6], which exploits the implementation or some measurable nonmathematical property of a cryptographic system, was introduced by Kocher et al. in 1996. It marked the start of a new research field in applied cryptography, which has advanced quickly over the past two decades, for example in power analysis [7,8] and electromagnetic analysis [9][10][11]. At the same time, many related techniques have been published that can recover the secret key from the leaked information.
When statistical methods are used to analyze encryption devices, several common approaches can be observed. The first is differential power analysis [5], introduced by Kocher. Another is Correlation Power Analysis (CPA), introduced by Brier et al. in 2004 [12]. CPA is more efficient than the others because it significantly reduces the number of power traces needed to recover the secret key. Therefore, there has been a lot of research in this field.
CPA uses two main models to relate the instantaneous power consumption to the data being manipulated: the Hamming weight model and the Hamming distance model [12]. The correct key is then obtained by computing, with Pearson's Correlation Coefficient (PCC), the relationship between the changes of a specific register and the power consumption. Because of its efficiency and operability, CPA has been widely studied and applied to various cryptographic algorithms, such as DES and AES.
In 2008, Gierlichs et al. proposed mutual information analysis, which uses information theory to mount a powerful attack without any device characterization [13]. With the development of artificial intelligence technology, differential cluster analysis was introduced in 2009 [14]. This technique uses cluster analysis to detect internal collisions and combines features from previously known collision attacks and differential power analysis. In 2013, Dabosville et al. proposed a new second-order side-channel attack based on linear regression [15]; the authors introduced a linear regression model and used it to analyze second-order attacks. In 2016, Bos et al. presented differential computation analysis to assess the security of white-box implementations, which requires neither knowledge of the look-up tables used nor any reverse engineering effort [16].
At the same time, several countermeasures have been proposed to secure those algorithms against first- and higher-order attacks. The first practical evaluation was performed on one additive and one multiplicative masking scheme of AES [4]. An enhancement of this method, which improved CPA by restricting the normalization factor, was proposed in [17]. In 2011, Clavier et al. proposed collision-correlation power analysis on first-order protected AES [21].
However, when a power attack is carried out, each acquisition contains random noise. Overall, the noise is normally distributed over the whole set of traces, but within each individual power trace it follows a discrete distribution. The PCC cannot describe the correlation well in this setting because it is a statistical measure of the strength of a linear relationship between two variables. Therefore, the efficiency and accuracy of the attack may be affected.
In this paper, we propose a new method to carry out the side-channel attack. The main contributions are as follows: (1) A concept of R-squared fitting analysis (RFA) model for power analysis is proposed. This method can describe the correlation of the data better than CPA. In a suitable experimental environment, the success rate of RFA is the same as that of CPA, but in a poor environment, RFA performs better than CPA.
(2) RFA improves efficiency compared with the classic CPA, which uses PCC. A model of the power traces for each Hamming weight from 0 to 8 is set up. When a trace contains more key points, RFA can effectively remove the interference of extra random noise, which improves the success rate and shortens the operation time.
(3) The RFA method has wide applicability, which is verified by simulation experiments in different test scenarios. Its efficiency is similar to or better than that of CPA.
The paper is organized as follows. The Hamming weight model, the R-squared model, and the classic CPA are given in Section 2. In Section 3, we introduce the basic idea and the R-squared fitting model analysis. In Section 4, we apply it to the traditional CPA setting and compare RFA with CPA on AES. In Section 5, we apply RFA to other attack methods. Finally, we conclude the paper in Section 6.

The Preliminaries
In this section, we first discuss the Hamming weight model. Second, the CPA steps are shown. Finally, the basic principle of R-squared is introduced.

The Hamming Weight Model.
In many ways, the Hamming weight model, proposed in [5,18], is the simplest method to analyze the correlation between the power consumption and a register switching from one state to another. In CPA, it is generally assumed that the leakage from the power side channel depends on the number of bits switching from 0 to 1 or from 1 to 0 at a given time. The register is modeled as a state transition which is triggered by some event such as the edge of a clock signal. In an $m$-bit register, binary data $D = [d_0, d_1, \ldots, d_{m-1}]_2$ is coded as $D = \sum_{j=0}^{m-1} d_j 2^j$, with the bit values $d_j = 0$ or $1$. Its Hamming weight is the number of bits set to 1, $H(D) = \sum_{j=0}^{m-1} d_j$. The Hamming weight model neglects some factors which have an influence on the power consumption, for example, parasitic capacitances, glitches, and transition events. When using the Hamming weight model for analysis, we assume that the power consumption is proportional to the number of bits set to logic 1 in the processed sensitive variable. In practice, the Hamming weight model is appropriate only if the previous state of the register is all 0.
The linear relationship between the power consumption $W$ and $H(D)$ is limited. But considering a chip as a large set of elementary electrical components, the linear relationship does not represent the entire consumption of a chip but only the data-dependent part. In addition to the previously mentioned state changes, the power consumption of a chip also contains other variable consumption. It is assigned to a term denoted by $b$, which is assumed independent from the other variables: $b$ encloses offsets, time-dependent components, and noise. Therefore, the basic model for the data dependency can be written as follows:

$$W = aH(D) + b, \quad (1)$$

where $a$ is a scalar gain between the Hamming weight of $D$ and the power consumption $W$.
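The leakage model above can be illustrated with a short Python sketch. The function names and the explicit Gaussian noise term are assumptions made for illustration only; in the model above, noise is simply one of the components absorbed into the offset term.

```python
import random


def hamming_weight(value):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def simulated_power(value, a=1.0, b=0.0, noise_std=0.0, rng=None):
    """One leakage sample W = a * H(D) + b, plus optional Gaussian noise.

    The separate noise term is an illustrative assumption; it plays the
    role of the noise component of the offset in the linear model.
    """
    rng = rng or random.Random()
    noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    return a * hamming_weight(value) + b + noise


# Noise-free example: the byte 0x0F has four bits set.
sample = simulated_power(0x0F, a=2.0, b=1.5)
```

With `noise_std` left at zero the sample depends only on the Hamming weight, which is the data-dependent part that CPA and RFA try to isolate.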

The Correlation Power Analysis.
When processing sensitive intermediate values, side-channel leakage brings data-dependent power consumption or other physical behaviors. In [7], we can see that the power consumption is related to the status of the register. In this paper, we use the Hamming weight model to analyze the correlation between the power consumption and intermediate values.
The connection of the devices is shown in Figure 1. The computer sends the ciphertexts to the cryptographic device, for example, a chip, smart card, or microcontroller. The attacker connects a resistor to the power line of the cryptographic device and acquires the power consumption traces with an oscilloscope, which transmits them to the computer.
CPA is a useful attack method proposed by Brier et al. [12]. First, the attacker acquires a set of $N$ power consumption traces corresponding to $N$ different plaintexts. Let $t_i$ denote the $i$th power trace and $T$ denote the set of traces. Second, the attacker recovers the correct key by guessing the key byte from 0 to 255. Assume that the handled value is the result of an XOR operation between a secret key byte $k$ and a known plaintext byte $p_i$, $v_i = p_i \oplus k$. The attacker can predict the Hamming weight $h_i$ of $v_i$ at that time for each acquired trace $t_i$. Equation (2) computes the PCC between these predictions of Hamming weight $h_i$ and the instantaneous power consumption of the set of acquired traces $t_i$. The maximum PCC corresponds to the correct key. This formula can also be used to deduce the leakage position on the trace:

$$\rho(T, H) = \frac{N \sum t_i h_i - \sum t_i \sum h_i}{\sqrt{N \sum t_i^2 - \left(\sum t_i\right)^2} \sqrt{N \sum h_i^2 - \left(\sum h_i\right)^2}}, \quad (2)$$

where all sums run over $i = 1, \ldots, N$.

The definition of R-squared is fairly straightforward. It is the percentage of the response variable variation that is explained by a linear model. R-squared is always between 0% and 100%.
Here, 0% indicates that the model explains none of the variability of the response data around its mean, and 100% indicates that the model explains all of the variability of the response data around its mean.
A data set has $n$ values marked by $y_1, \ldots, y_n$ (collectively known as $y_i$ or as a vector $y = \{y_1, \ldots, y_n\}$), each associated with a predicted (or modeled) value $f_1, \ldots, f_n$ (known as $f_i$, or as a vector $f$). Define the residuals as $e_i = y_i - f_i$ (forming a vector $e$). If $\bar{y}$ is the mean of the observed data, $\bar{y} = (1/n) \sum_{i=1}^{n} y_i$, then the variability of the data set can be measured using three sums of squares:
(i) The total sum of squares (proportional to the variance of the data) is
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2. \quad (3)$$
(ii) The regression sum of squares, also called the explained sum of squares, is
$$SS_{reg} = \sum_{i=1}^{n} (f_i - \bar{y})^2. \quad (4)$$
(iii) The sum of squares of residuals, also called the residual sum of squares, is
$$SS_{res} = \sum_{i=1}^{n} (y_i - f_i)^2 = \sum_{i=1}^{n} e_i^2. \quad (5)$$
(iv) The most general definition of the coefficient of determination is
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}. \quad (6)$$
In general, the higher the value of R-squared, the better the model fits the data.
However, PCC describes only the linear relationship between two variables, whereas R-squared has a wider scope of application: it can be used to describe nonlinear relationships or models with two or more independent variables. Because of the random noise in the measured power traces, the correlation between the traces and the template cannot be well reflected by the PCC. We can therefore determine the relationship between the power traces and the Hamming weight more accurately with the R-squared method.
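The coefficient of determination and the PCC can be computed side by side in a few lines of Python. This is a minimal sketch; the function names `r_squared` and `pearson` are chosen here for illustration and do not come from the paper.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# A perfect prediction explains all of the variability.
perfect = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
# Predicting the mean everywhere explains none of it.
mean_only = r_squared([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

For a least-squares straight-line fit with an intercept, this R-squared equals the square of the PCC between the two variables. When the predicted values do not come from such a fit, for example when they are taken from a fixed template, the expression $1 - SS_{res}/SS_{tot}$ can become negative, which simply indicates a prediction that is worse than the mean.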

The 𝑅-Squared Fitting Analysis for Power Attack
As we know, it is assumed that the attacker has two capabilities. First, the attacker can mount a chosen-plaintext attack. Second, the attacker can acquire the power consumption of the device under attack. The proposed approach is based on the Hamming weight model [7]. The leakage position is the S-box output value $v$ of the first AES round, which is stored in a specific register. Figure 2 shows the position under attack.
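In the key recovery phase, the attacker first guesses $k_{guess}$ from 0 to 255, so the Hamming weight of $v$ can be calculated by $h_i = H(S(p_i \oplus k_{guess}))$, where $S$ denotes the S-box. According to each $h_i$, we choose the corresponding $m_{h_i}$ from the template $M$ to build $T_{guess}$, which reflects the leakage of the register. We denote its $i$th element by $\hat{t}_i$. Then the R-squared between $T_{guess}$ and $T$ can be calculated:

$$R^2 = 1 - \frac{\sum_i (t_i - \hat{t}_i)^2}{\sum_i (t_i - \bar{t})^2}, \quad (7)$$

where $\bar{t}$ is the mean of the measured traces. The $k_{guess}$ corresponding to the maximum R-squared value is the correct key.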

Attack Scene.
Algorithm 1 shows the R-squared fitting analysis steps on the S-box output. When using the R-squared fitting method to analyze AES, two steps must be completed. First, a template of traces $M$ must be set up according to the different Hamming weights, so that it reflects the trace of each Hamming weight. For example, we randomly select $n$ different plaintexts and keys such that $H(S(p_i \oplus k)) = 0$, and obtain $n$ traces. The average of the $n$ traces is $m_0$. By repeating the above steps with $H(S(p_i \oplus k))$ ranging from 1 to 8, $m_1, \ldots, m_8$ can be obtained. The template $M$ is $\{m_0, m_1, \ldots, m_8\}$. Second, $R^2$ is used as a distinguisher to identify the correct key. According to the value of $R^2$ between the real traces $T$ and the $T_{guess}$ that reflects $k_{guess}$, we can judge the correct key. The formula of $R^2$ is (7).
By selecting the maximum value of $R^2$, we can judge which $k_{guess}$ is the most likely secret key.
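The two steps described above (template building, then key ranking by R-squared) can be sketched in Python on simulated traces. This is an illustrative sketch under several assumptions that are not taken from the paper: a fixed random byte permutation stands in for the AES S-box, each key point of a trace leaks the Hamming weight plus Gaussian noise, template traces are simulated directly per Hamming weight, and R-squared is computed over all key points of all traces at once. All function names are hypothetical.

```python
import random

# Assumption: a fixed random byte permutation stands in for the AES S-box.
SBOX = random.Random(0).sample(range(256), 256)


def hw(x):
    """Hamming weight of a byte."""
    return bin(x).count("1")


def simulate_trace(h, n_points, sigma, rng):
    """Assumed leakage: every key point equals the Hamming weight plus noise."""
    return [h + rng.gauss(0.0, sigma) for _ in range(n_points)]


def build_template(n_avg, n_points, sigma, rng):
    """Step 1: average n_avg traces for each Hamming weight 0..8."""
    template = []
    for h in range(9):
        traces = [simulate_trace(h, n_points, sigma, rng) for _ in range(n_avg)]
        template.append([sum(col) / n_avg for col in zip(*traces)])
    return template


def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def rfa_recover_key(plaintexts, traces, template):
    """Step 2: return the key guess whose template-built traces fit best."""
    observed = [s for trace in traces for s in trace]
    best_guess, best_r2 = 0, float("-inf")
    for guess in range(256):
        predicted = [s for p in plaintexts
                     for s in template[hw(SBOX[p ^ guess])]]
        r2 = r_squared(observed, predicted)
        if r2 > best_r2:
            best_guess, best_r2 = guess, r2
    return best_guess


rng = random.Random(1)
template = build_template(n_avg=50, n_points=5, sigma=1.0, rng=rng)
secret = 150
plaintexts = [rng.randrange(256) for _ in range(100)]
traces = [simulate_trace(hw(SBOX[p ^ secret]), 5, 1.0, rng) for p in plaintexts]
recovered = rfa_recover_key(plaintexts, traces, template)
```

On a real device the template would be built from measured traces with known plaintexts and keys, as the text describes, rather than from the simulated leakage used here.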

Comparisons with Pearson's Correlation Coefficient
In order to evaluate $R^2$ between the power consumption traces and $T_{guess}$, we use software simulation to carry out the test on AES, as shown in Figure 2. The simulation parameters for the register traces are a standard deviation $\sigma$ of 3 and 10 key points per trace. Based on the descriptions of Section 3, we use the R-squared fitting model to analyze the leakage of power consumption and compare RFA with the classic CPA. The part under attack is the register which saves the output value of the S-box.
Figure 3 shows the fitting traces of the power consumption. We assume that the noise is Gaussian random noise, and the traces are simulated on the computer. We then compute the correlation by R-squared fitting model analysis and compare it with the classic CPA. The relationship between the guessed key and the correlation coefficient is shown in Figure 4. From Figure 4, we can see that RFA clearly distinguishes the correct key, which is 150.
We run the simulation experiments on a computer. When the standard deviation $\sigma = 2$ and the number of key points of the trace is 5, only 12 plaintexts are needed to recover the correct key by RFA with a success rate of 91%, while the success rate of CPA is about 65% (as shown in Figure 5).
Figure 6 shows the contrast between CPA and RFA for different numbers of plaintexts with standard deviation $\sigma = 4$. Similar contrasts are shown in Figures 7 and 8. From Figures 5-8, we can see that as the standard deviation increases, more plaintexts are required to recover the correct key under the same conditions. However, the efficiency of RFA is still better than that of CPA.
By comparing the success rates of RFA and CPA under different conditions, we conclude that RFA identifies the correct key as well as CPA does, with slightly better efficiency. When the number of plaintexts is 5, the success rate of RFA is double that of the PCC-based CPA. RFA spends only 86 seconds in the calculation process, which is nearly 20% better than CPA. This is because $R^2$ has lower computational complexity than PCC [19].
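For reference, the PCC-based CPA baseline used in this kind of comparison can be sketched as follows. As before, this is only an illustration under stated assumptions: a fixed random byte permutation stands in for the AES S-box, each key point leaks the Hamming weight plus Gaussian noise, and the score of a key guess is taken as the maximum PCC over the key points of the trace. The paper does not specify how the key points are combined, so that choice is an assumption.

```python
import math
import random

# Assumption: a fixed random byte permutation stands in for the AES S-box.
SBOX = random.Random(0).sample(range(256), 256)


def hw(x):
    """Hamming weight of a byte."""
    return bin(x).count("1")


def pearson(x, y):
    """Pearson's correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        return 0.0
    return s_xy / math.sqrt(s_xx * s_yy)


def cpa_recover_key(plaintexts, traces):
    """Return the key guess with the largest PCC over all key points."""
    columns = list(zip(*traces))  # one column per key point
    best_guess, best_rho = 0, float("-inf")
    for guess in range(256):
        predicted_hw = [hw(SBOX[p ^ guess]) for p in plaintexts]
        rho = max(pearson(predicted_hw, col) for col in columns)
        if rho > best_rho:
            best_guess, best_rho = guess, rho
    return best_guess


rng = random.Random(7)
secret = 150
plaintexts = [rng.randrange(256) for _ in range(100)]
traces = [[hw(SBOX[p ^ secret]) + rng.gauss(0.0, 0.5) for _ in range(5)]
          for p in plaintexts]
recovered = cpa_recover_key(plaintexts, traces)
```

Repeating such a run many times with fresh random plaintexts and noise, and counting how often the recovered key equals the secret, gives a success-rate estimate of the kind plotted in the comparison figures.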

Application on the Other Attack Methods
In Section 4, we saw that RFA is efficient compared to CPA. In this section, we show that the method can also be widely used in other attack models. We choose the second-order CPA attack [20] and collision-correlation power analysis on first-order protected AES [21] to compare the success rates of RFA and CPA in a simulation scenario.

Implementation with Collision-Correlation Power Analysis.
Collision-correlation power analysis on first-order protected AES [21] was proposed at CHES 2011. This attack is more powerful and practicable than previous second-order power analyses and increases the risk that these implementations are broken in practice. The contrast between the success rates of RFA and CPA is shown in Figure 10.
In Figure 9, we can see that the mask protection scheme is used to mask the first round before the S-box in AES. The attack steps are given in Algorithm 2. For each $p_2$, we encrypt it $n$ times, and we find the relationship between $k_1$ and $k_2$ by detecting the collision between the two corresponding S-box computations, so as to recover the correct key. In Figure 10, the success rate of RFA on the collision-correlation attack is clearly higher than that of CPA, and RFA also has a more obvious advantage in operation time.

Implementation with Practical Second-Order CPA Attacks.
If the attacker can obtain the relationship between the power consumption and the intermediate value of a cryptographic algorithm, the secret key can be recovered easily. In order to change this relationship, the designer applies a mask to the intermediate value. In 2006, Oswald et al. proposed the mask protection scheme and its attack methods [20].
In Figure 11, RNG denotes the random number generator. This module outputs random numbers, called masks, which are involved in the encryption process. The information leakage in the power consumption is masked by the random masking of the operation, so the acquired traces appear randomly masked and we cannot directly recover the correct key. For these mask protection schemes, we use the attack method in [20] with RFA. Algorithm 3 shows the attack steps, and the comparison with the success rate of CPA is shown in Figure 12.

Algorithm 2: RFA on collision-correlation power attacks.
In the above second-order attack scenario, we replace CPA with RFA to recover the secret key. Figure 12 shows that the success rate of RFA is nearly the same as that of CPA, but the time cost of RFA is again about 20% lower than that of CPA, as explained in Section 4.

Discussion.
From formulas (7) and (8), we can see that the R-squared used in this paper is equivalent to the Least Squares Method (LSM). Therefore, we study LSM, Least Absolute Deviation (LAD), and LAD's variants in the application of RFA. To examine the superiority of RFA, we experiment with LSM and LAD to evaluate the distance between $T$ and $T_{guess}$, so as to see whether RFA is more appropriate than LAD:

$$\mathrm{LSM} = \sum_i (t_i - \hat{t}_i)^2, \qquad \mathrm{LAD} = \sum_i |t_i - \hat{t}_i|,$$

where $t_i$ and $\hat{t}_i$ denote the $i$th elements of $T$ and $T_{guess}$, respectively.

In the scenario of Figure 2, we run the experiments with LSM and LAD, respectively. The comparison of success rates is shown in Figure 13. The success rate of LSM is higher than that of LAD, so the RFA method is more efficient than LAD.
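The two distance measures compared here can be written as a short Python sketch. The function names and the toy data are hypothetical. Because the total sum of squares depends only on the measured traces, it is the same for every key guess, so ranking guesses by the largest R-squared gives the same order as ranking them by the smallest LSM distance.

```python
def lsm_distance(observed, predicted):
    """Least squares distance: sum of squared residuals."""
    return sum((t - g) ** 2 for t, g in zip(observed, predicted))


def lad_distance(observed, predicted):
    """Least absolute deviation distance: sum of absolute residuals."""
    return sum(abs(t - g) for t, g in zip(observed, predicted))


def best_candidate(observed, candidates, distance):
    """Index of the candidate prediction with the smallest distance."""
    return min(range(len(candidates)),
               key=lambda i: distance(observed, candidates[i]))


# Toy data: the second candidate is closest under both measures.
observed = [2.0, 4.1, 3.0]
candidates = [[0.0, 0.0, 0.0], [2.0, 4.0, 3.0], [4.0, 2.0, 3.0]]
by_lsm = best_candidate(observed, candidates, lsm_distance)
by_lad = best_candidate(observed, candidates, lad_distance)
```

In an attack, each candidate would be the $T_{guess}$ built for one key guess, and the index returned would be the recovered key byte.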

Conclusion
We present a new correlation analysis method called R-squared fitting model analysis. Simulations in different experimental scenarios show that this method is better suited to nonlinear correlation analysis. At the same time, RFA achieves the same success rate as CPA in recovering the secret keys, and in some cases its performance is even better. Because its computation is simpler, RFA also has a lower time cost.

Figure 1: CPA connection of the devices.

The R-Squared Introduction.
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

Figure 2: The position under attack.

Figure 3: The fitting traces of the power consumption when Hamming weight is 2.

Figure 5: The comparison between RFA and CPA.

Figure 6: The comparison between RFA and CPA.

Figure 7: The comparison between RFA and CPA.

Figure 8: The comparison between RFA and CPA.

Figure 10: The comparison of the RFA and CPA in first-order protected AES.

Figure 13: The comparison of the LSM and LAD.