Physical-Layer Channel Authentication for 5G via Machine Learning Algorithm

By using radio channel information to detect spoofing attacks, channel-based physical-layer (PHY-layer) enhanced authentication can provide lightweight security for 5G wireless communications. One major obstacle to the application of PHY-layer authentication is its detection rate. In this paper, a novel authentication method is developed that detects spoofing attacks without a dedicated test threshold; instead, a trained model determines whether the user is legal or illegal. Unlike the threshold-test PHY-layer authentication method, the proposed AdaBoost-based PHY-layer authentication algorithm increases the authentication rate with a one-dimensional test statistic feature. In addition, an authentication model with two-dimensional test statistic features is presented to further improve the detection rate. To evaluate the feasibility of our algorithm, we implement the PHY-layer spoofing detectors in a multiple-input multiple-output (MIMO) system over universal software radio peripherals (USRPs). Extensive experiments show that the proposed methods achieve high performance without compromising computational complexity.


Introduction
The 5G mobile communication system imposes requirements of high speed, high efficiency, and high security under three typical application scenarios: enhanced Mobile Broadband (eMBB), large-scale Internet of Things (IoT), and ultra-Reliable and Low-latency Connections (uRLLC) [1,2]. The scenarios that drive the need for mobile broadband include high-traffic, high-density wireless networks deployed indoors or in urban areas, as well as continuous wide-area coverage of wireless mobile networks in rural areas. Meanwhile, 5G involves interconnection and communication among a large number of machines and devices, which is a necessary condition for the operation of the IoT [3]. Many mobile devices access the wireless network at the same time, which places a heavy authentication computing burden on the network. Therefore, lightweight access methods are required for the dense application scenarios of 5G wireless communication networks.
In response to this need, researchers have carried out a series of studies on lightweight security measures based on computational cryptography [4,5]. However, it is still very difficult to design cipher algorithms that suit resource-constrained application scenarios such as wireless mobile terminals, the IoT, and sensor networks. Therefore, new technologies are needed to construct lightweight security schemes. In the last decade, research on PHY-layer security technology has brought new vitality to the wireless mobile communication industry [6][7][8][9][10]. Physical-layer characteristics are difficult to counterfeit, so they can provide a high level of security at low cost and compensate for the shortcomings of cipher-based security technologies. Consequently, physical-layer characteristics that can be used to improve the security of wireless communications have attracted wide attention from researchers.
Several PHY-layer authentication techniques have been proposed. In [11][12][13][14][15][16][17], the received signal strength (RSS), the channel impulse response (CIR), and the channel state information (CSI) are used to detect identity-based attacks in wireless networks, such as man-in-the-middle and denial-of-service (DoS) attacks. The work in [18] presents a PHY-layer authentication framework that can be adapted for multicarrier transmission. To detect Sybil attacks, [19,20] present a PHY-layer authentication protocol that is combined with higher-layer authentication and exploits the fact that the channel response decorrelates rapidly in space; channel-based detection of Sybil attacks in wireless networks is implemented. In [21], Peng Hao et al. developed a practical authentication scheme that monitors and analyzes the packet error rate (PER) and the received signal strength indicator (RSSI) at the same time to enhance spoofing attack detection. In [22][23][24], the authors analysed the spatial decorrelation property of the channel response and validated the efficacy of channel-based authentication for spoofing detection in MIMO systems by comparing the channel information "difference" of two or several frames. However, in the above-mentioned works, manually set thresholds are needed to detect spoofing attacks. In practice, the threshold range cannot be determined accurately, which results in spoofing detection with low precision. In this paper, a machine learning based PHY-layer authentication scheme is developed, which provides an intelligent decision method instead of a one-dimensional test threshold. Specifically, an AdaBoost [25,26] based algorithm with a one-dimensional feature is employed to detect spoofing attacks. To enhance authentication performance, a two-dimensional feature is then introduced. The major contributions of this paper are summarized as follows:
(1) An AdaBoost based PHY-layer authentication algorithm is proposed to increase the authentication rate.
(2) An authentication model based on a two-dimensional feature is established, which detects spoofing more effectively than the one-dimensional authentication method.
(3) The proposed PHY-layer channel authentication scheme is implemented in a real-world environment, based on MIMO-OFDM systems. The simulation results show that the detection rate is greatly increased.
The rest of this paper is organized as follows. Section 2 describes the system model and problem formulation. Our proposed algorithm for PHY-layer authentication is presented in Section 3. The system experiment and simulation results are presented in Section 4. In Section 5, we conclude this paper.

System Model
In this section, we provide a system model of physical layer authentication and hypothesis testing.
MIMO Three Parts System Model. As shown in Figure 1, our analysis is based on an Alice-Bob-Eve model in a MIMO system, where Alice and Bob are legitimate users equipped with $N_T$ and $N_R$ antennas, respectively. Eve, who has her own set of antennas, attempts to spoof Alice by using her identity. The three parties are assumed to be located at spatially separated positions. To detect this spoofing, Bob tracks the uniqueness of the wireless channel responses to discriminate between legitimate signals from Alice and illegitimate signals from Eve; this is physical layer authentication. The detailed physical layer authentication process is as follows. Signals carrying pilots, which can be used to estimate the channel response of the corresponding transmitter, are transmitted over the wireless multipath channel to the receiver. Each transmission contains $N$ frames, and each frame consists of $N_s$ OFDM symbols.
Bob is assumed to obtain the Alice-Bob channel information, $\hat{H}_k$, for any frame index $k > 1$ through channel estimation, and to save it. After a while, Bob receives the next data frame, the $(k+1)$-th frame, and estimates its still unverified channel response, $\hat{H}_{k+1}$. Bob compares $\hat{H}_{k+1}$ with the channel of Alice, $\hat{H}_k$, to determine whether the corresponding signal was actually sent by Alice.
If the values of $\hat{H}_k$ and $\hat{H}_{k+1}$ are close, Bob considers the sender's identity valid and stores it. Otherwise, Bob determines that the sender's identity is invalid and discards the data frame.
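The channel information, denoted by $\hat{H}_k$ and $\hat{H}_{k+1}$, is obtained by the channel estimation algorithm. Each data frame contains $N_s$ OFDM symbols. Thus, the channel information is given by
$$\hat{H}_k = [\hat{H}_{k,1}, \hat{H}_{k,2}, \ldots, \hat{H}_{k,N_s}], \tag{1}$$
where $\hat{H}_{k,i}$ ($i = 1, 2, \ldots, N_s$) denotes the channel information of the $i$-th OFDM symbol.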
Hypothesis Testing. Binary hypothesis testing is performed to authenticate the identity across consecutive data frames. Suppose the receiver Bob has verified that the $k$-th data frame originates from the legitimate sender Alice, with extracted channel information $\hat{H}_k$; the sender of the $(k+1)$-th data frame is still unknown, and its channel information is $\hat{H}_{k+1}$. The null hypothesis $\mathcal{H}_0$ indicates that the packet is indeed sent by Alice. The alternative hypothesis $\mathcal{H}_1$ is that the real sender of the packet is not Alice. The spoofing detection builds the hypothesis test in (2), in which all elements of the noise terms of frames $k$ and $k+1$ are i.i.d. complex Gaussian noise samples $\mathcal{CN}(0, \sigma^2)$. If the channel information is used directly for hypothesis testing, the impact of the noise variables must be considered, which increases the authentication complexity. To this end, since the two noise terms have the same statistical characteristics, the "difference" of the channel information can eliminate the influence of the noise variables.
Physical layer authentication thus translates into a comparison between the "difference" of the channel information and a set threshold. Equation (2) can be expressed as
$$\mathcal{H}_0: \operatorname{diff}(\hat{H}_k, \hat{H}_{k+1}) < \eta, \qquad \mathcal{H}_1: \operatorname{diff}(\hat{H}_k, \hat{H}_{k+1}) \ge \eta, \tag{3}$$
where $\operatorname{diff}(A, B)$ denotes the computed difference between $A$ and $B$, and $\eta$ is the test threshold. The null hypothesis, $\mathcal{H}_0$, is that the identity is legitimate, and Bob accepts this hypothesis if the test statistic he computes, $\operatorname{diff}(\hat{H}_k, \hat{H}_{k+1})$, is below the threshold $\eta$. Otherwise, Bob accepts the alternative hypothesis, $\mathcal{H}_1$, that the identity is illegitimate. Recording the channel response "difference" as $T$, (3) can also be written as
$$\mathcal{H}_0: T < \eta, \qquad \mathcal{H}_1: T \ge \eta. \tag{4}$$
As shown in (4), physical layer authentication is in effect a comparison between the channel information "difference" and the authentication threshold, so these two quantities are the key to physical layer authentication. The test statistic measures the similarity of the channel information by calculating the channel information difference. In this paper, we use two test statistics, $T_A$ and $T_B$. In particular, assume that Bob obtains the channel responses of two consecutive frames, $\hat{H}_{k-1}$ and $\hat{H}_k$, from Alice. We build the test statistics $T_A$ and $T_B$ from these two frames in order to discriminate between Alice and Eve. Subsequently, Bob acquires the channel information of the $(k+1)$-th frame, $\hat{H}_{k+1}$. In these statistics, $\theta$ denotes the phase offset, which is defined in (6). From (5), the test statistic can be taken as the difference of the subcarrier amplitudes, which avoids the effect of $\theta$.

Here, the phase offset represents the measurement error in the phase of the channel response between two consecutive data frames, $\hat{H}^{AB}_{k}$ and $\hat{H}_{k+1}$. Each channel response consists of $N_s$ frequency-domain channel matrices, one per OFDM symbol, each of which is an $N$-dimensional square matrix; the phase offset is indexed by the row and the column of the matrix element.
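As a rough illustration of the threshold test in (3) and (4), the following Python sketch compares two channel matrices through an amplitude-only difference, which is unaffected by a common phase offset. The function names, the normalization, the matrix values, and the threshold 0.4 are assumptions chosen for illustration; this is not the exact form of the statistics $T_A$ and $T_B$ used in this paper.

```python
import cmath

def amplitude_difference(h_prev, h_curr):
    """Normalized squared difference of element amplitudes (hypothetical statistic)."""
    num = sum((abs(a) - abs(b)) ** 2
              for row_a, row_b in zip(h_prev, h_curr)
              for a, b in zip(row_a, row_b))
    den = sum(abs(a) ** 2 for row in h_prev for a in row)
    return num / den

def accept_as_legitimate(statistic, threshold):
    """Accept H0 (frame sent by Alice) when the statistic is below the threshold."""
    return statistic < threshold

# Reference frame from Alice (2 x 2 channel matrix, illustrative values only).
h_alice = [[1 + 1j, 0.5 - 0.2j], [0.3 + 0.8j, -0.7 + 0.1j]]
# The same channel seen with a common phase offset: amplitudes are unchanged.
rotation = cmath.exp(1j * 0.6)
h_next = [[rotation * h for h in row] for row in h_alice]
# A different channel, standing in for a spoofer at another location.
h_eve = [[0.1 + 0.1j, 2.0 + 0.5j], [1.5 - 0.5j, 0.1j]]

t_same = amplitude_difference(h_alice, h_next)
t_other = amplitude_difference(h_alice, h_eve)
print(accept_as_legitimate(t_same, 0.4), accept_as_legitimate(t_other, 0.4))
```

Because only amplitudes enter the statistic, the phase-rotated frame yields a difference of essentially zero, whereas the spoofer's channel yields a large one. How well a single threshold separates real measurements is exactly the question the rest of the paper addresses.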

Physical Authentication with AdaBoost Algorithm
In this section, we propose a learning algorithm based on AdaBoost for physical authentication.
AdaBoost Algorithm. AdaBoost, short for adaptive boosting, was developed by Yoav Freund [24] and is the most widely used form of boosting algorithm. Boosting is a powerful technique that combines multiple base classifiers [25] to produce a form of committee whose performance can be significantly better than that of any of the base classifiers. The principle of AdaBoost is that it improves its performance iteratively: it is adaptive in the sense that subsequent weak classifiers, called learners, are adjusted in favor of those instances misclassified by previous classifiers. AdaBoost can be seen as a particular method of training a boosted classifier. A boosted classifier is a combination of weak classifiers, each of which takes a sample $x$ as input and returns a value indicating the class of $x$. The weak classifiers, such as decision trees or support vector machines (SVMs), are trained in sequence, each on a weighted form of the data set in which the weighting coefficient associated with each data point depends on the performance of the previous classifiers. More specifically, data points that are misclassified by one of the weak classifiers are given greater weight when they are used to train the next weak classifier. As illustrated in Figure 2, once all the classifiers have been trained, that is, until there are no misclassified data points, the final model is generated via a weighted majority voting scheme.
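The reweighting idea described above can be sketched in a few lines of Python. This is a minimal illustration of standard discrete AdaBoost with decision stumps on a one-dimensional feature, under stated assumptions: the stump search, the coefficient $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, the function names, and the toy data are illustrative choices and may differ in detail from the algorithm used in this paper.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: returns polarity below the threshold, -polarity otherwise."""
    return polarity if x < threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (error, threshold, polarity) of the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = ([values[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1.0])
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_fit(xs, ys, rounds):
    """Train `rounds` stumps, reweighting the samples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n  # uniform initial weights
    model = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, threshold, polarity))
        # Misclassified points (y * prediction == -1) gain weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def adaboost_predict(model, x):
    """Weighted majority vote of the trained stumps."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in model)
    return 1 if score >= 0 else -1

# Toy data (not measured): no single threshold separates the two classes.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1]
model = adaboost_fit(xs, ys, rounds=3)
print([adaboost_predict(model, x) for x in xs])
```

The toy labels form a band that no single stump can isolate, yet the weighted vote of three stumps classifies every training point correctly. This is the property that lets a boosted classifier replace a single manual threshold.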
Physical Authentication with AdaBoost Algorithm. Physical authentication with the AdaBoost algorithm is proposed for spoofing detection. The workflow of the algorithm is illustrated in Figure 3. Bob collects the channel matrix $\hat{H}_1$, which is obtained by channel estimation using the pilots from Alice, and records it. When Bob receives the next data frame from Alice, he collects the channel information $\hat{H}_2$. During training, the coefficient of each weak classifier is calculated from its classification error rate. Finally, we generate a final model in which different weights are given to different weak classifiers, as in (8).
(2) Two-dimensional test statistics training data set:

Input:
The channel information of the legal transmitter or the illegal transmitter.
Process:
1: Bob calculates the values of the data sets $\hat{H}^{A}$ and $\hat{H}^{E}$ from Alice and the simulated Eve.
2: The data sets are preprocessed by Bob.
3: The data set is divided into two parts: a training data set and a testing data set.
4: Use the training data set to obtain the weak classifiers.
5: Use the AdaBoost algorithm to generate a strong classifier.
6: Use the testing data set to verify whether the classifier achieves the target detection rate; if not, return to the first step.
7: The final classifier is the authentication decision model, which judges whether new packets are legitimate or illegal.
End
Algorithm 1: Physical authentication.
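The data-handling steps of Algorithm 1 (preprocessing, splitting, and verification against a target detection rate) might be sketched as follows. The helper names, the 75% training fraction, the target rate of 0.9, the synthetic statistics, and the placeholder decision rule are all assumptions for illustration; in the actual scheme the classifier would be the strong classifier produced by AdaBoost.

```python
import random

def min_max_normalize(values):
    """Scale a list of statistics linearly into [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def split_data(samples, train_fraction, seed=0):
    """Shuffle labelled samples and split them into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def detection_rate(classifier, test_set):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in test_set if classifier(x) == y)
    return correct / len(test_set)

# Synthetic statistics (not measured data): small for Alice, large for Eve.
alice_stats = [0.05 * i for i in range(10)]
eve_stats = [1.0 + 0.05 * i for i in range(10)]
normalized = min_max_normalize(alice_stats + eve_stats)
# Label convention assumed here: +1 for the legal transmitter, -1 for the spoofer.
samples = [(t, 1) for t in normalized[:10]] + [(t, -1) for t in normalized[10:]]
train_set, test_set = split_data(samples, train_fraction=0.75)

def placeholder_classifier(t):
    """Fixed decision rule standing in for the trained strong classifier."""
    return 1 if t < 0.5 else -1

rate = detection_rate(placeholder_classifier, test_set)
target_rate = 0.9  # hypothetical target detection rate for step 6
print(rate >= target_rate)
```

If the measured rate fell below the target, step 6 of Algorithm 1 would send the procedure back to data collection rather than accept the model.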

Input:
Training data set.
Process:
1: Initialize the weight distribution of the sample points:

Experimental Verification
In this section, we describe the system setup and the test process used to evaluate Algorithm 1 in distinguishing Alice from Eve.
System Setup. We consider spoofing detection with a receiver called Bob, a legal transmitter called Alice, and a spoofing node called Eve. They were placed at three separate locations in a room, surrounded by many other devices such as printers, desktops, and other types of equipment, as shown in Figure 4. Scattering and refraction occur in the room because of obstacles in the wireless channels from Alice to Bob and from Eve to Bob. As shown in Figure 5, we set up an experimental platform implemented on USRPs, and the experiments were performed in an indoor environment. Bob is equipped with an 8 × 8 MIMO system, while Alice and the spoofing node Eve are each equipped with a 2 × 2 MIMO system. The signals are sent over 2 antennas each at a center frequency of 3.5 GHz with a bandwidth of 2 MHz.
Experiment. In the experiment, the following steps are taken.
Step 1. Bob extracts the channel information of Alice and of Eve, respectively, using existing channel estimation mechanisms.
Step 2. The two-class training data set T is generated according to (15) or (16).
Step 3. Bob trains a strong classifier on the two-class training data set by using the AdaBoost algorithm, implemented in Matlab.
Step 4. Bob uses the strong classifier to classify the test set and obtains the authentication detection rate.
In the experiment, we collect five hundred frames, and the values of the test statistics are normalized between 0 and 1. The test statistic $T_A$ of the channel information for Alice and Eve as a function of the frame index is shown in Figure 6(a), in which the red points are $T_A(k)$ in (5) and the green points are the corresponding statistic in (9). As can be seen, there is an overlapped area. Likewise, Figure 6(b) shows that the overlapped area is large when we choose the test statistic $T_B$, in which the red points are $T_B(k)$ in (7) and the green points are the corresponding statistic in (10). This clearly shows that it is difficult to find a manual test threshold that gives the best authentication accuracy. Moreover, we use $T_A$, $T_B$, and the number of frames, respectively, to draw a three-dimensional plot.
Simulation Results. In this section, simulation results are provided to demonstrate the performance of the proposed authentication scheme. As a comparison, we consider the PHY-layer spoofing detection of [15] with a varied test threshold. From Figure 8, we can see that when the test threshold equals 0.4, the best authentication detection rates obtained with $T_A$ and $T_B$ reach 79.8% and 65.4%, respectively. In addition, our proposed method, which combines the two test statistics $T_A$ and $T_B$ as a two-dimensional feature, can improve the detection accuracy. Figure 9 compares the spoofing detection of the three methods: the manual threshold method based on the $T_A$ test statistic achieves a 79.8% detection rate, the machine learning based authentication method with a one-dimensional test statistic achieves an 87.1% detection rate, and the machine learning based authentication method with the two-dimensional features $T_A$ and $T_B$ achieves a 91.3% accuracy rate at the cost of an additional 10% computational complexity.
To sum up, the proposed authentication scheme achieves superior performance over the manual threshold strategy [15]. Based on the above observations, the proposed machine learning based authentication scheme with the two-dimensional feature not only outperforms the manual method but also achieves a higher authentication rate than the same algorithm with a one-dimensional feature.
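The manual-threshold baseline with a varied test threshold can be illustrated by a simple sweep over normalized statistics. The sketch below is an assumption-laden toy: the function name, the grid of 101 thresholds, and the ten sample values are hypothetical and are not the measured data behind Figure 8.

```python
def sweep_thresholds(alice_stats, eve_stats, steps=100):
    """Return (best_threshold, best_rate) over a uniform grid on [0, 1]."""
    best_threshold, best_rate = 0.0, 0.0
    total = len(alice_stats) + len(eve_stats)
    for i in range(steps + 1):
        threshold = i / steps
        # Alice is detected correctly below the threshold, Eve at or above it.
        correct = sum(1 for t in alice_stats if t < threshold)
        correct += sum(1 for t in eve_stats if t >= threshold)
        rate = correct / total
        if rate > best_rate:
            best_threshold, best_rate = threshold, rate
    return best_threshold, best_rate

# Hypothetical normalized statistics with an overlapped region.
alice_stats = [0.10, 0.20, 0.30, 0.45, 0.55]
eve_stats = [0.35, 0.50, 0.60, 0.70, 0.90]
best_threshold, best_rate = sweep_thresholds(alice_stats, eve_stats)
print(best_threshold, best_rate)
```

Because the two sets of statistics overlap, even the best threshold on the grid misclassifies some frames. This is the limitation of a single manual threshold that motivates the learned decision model.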

Conclusions
In this paper, machine learning based physical-layer channel authentication for 5G wireless communication security is proposed. The machine learning authentication method decides whether the received packets come from a legitimate transmitter or from a counterfeiter by using one-dimensional features or two-dimensional joint features. The effectiveness of the proposed authentication scheme is validated by extensive simulations. All the data used in the simulations are obtained from a real OFDM-MIMO communication platform, which provides a real communication environment. Moreover, the authentication results show that the novel methods detect spoofing attacks at a higher rate than manual threshold based physical layer authentication schemes. Because the training of the classifier can be done offline, the novel method can perform authentication quickly. In future work, we will investigate whether other machine learning algorithms can further optimize our authentication model, and we will look for a better test statistic that yields a larger difference in channel information.

Figure 4: The experiment scenario consisting of Alice, Bob, and Eve.

Figure 5: Real MIMO communication platform consisting of Alice, Bob, and Eve.

Figure 6: Normalized $T_A$ and $T_B$ values of the legal transmitter Alice and the spoofing node Eve for spoofing detection, with a center frequency of 3.5 GHz and a bandwidth of 2 MHz.

As shown in Figure 7, it is hard to use the traditional manual threshold method to identify the data sets in the three-dimensional case. However, the machine learning based authentication model can effectively solve this problem: a dividing curved surface obtained by the AdaBoost adaptive adjustment algorithm can perform the identification.

Figure 8: Correct classification rate of $T_A$ and $T_B$.

Figure 9: Simulation results for the different authentication schemes.
Bob collects the channel information of $N$ consecutive frames from Alice and stores it as $\hat{H}^{A} = [\hat{H}^{A}_1, \hat{H}^{A}_2, \ldots, \hat{H}^{A}_N]$. At the same time, Eve sends data frames to Bob and claims to be Alice. In practical communication scenarios, we do not know where or who the Eves are, but the proposed scheme needs Eve samples for training. Therefore, one or several Eve nodes are set up for this purpose. Bob extracts the channel information of $N$ consecutive frames from Eve and stores it as $\hat{H}^{E} = [\hat{H}^{E}_1, \hat{H}^{E}_2, \ldots, \hat{H}^{E}_N]$. The data set is preprocessed by Bob. First, Bob calculates the values of the data sets $\hat{H}^{A}$ and $\hat{H}^{E}$. Second, Bob calculates the test statistics $T_A$ and $T_B$ on them, as in (9) and (10), and collects the results in the data sets (11a) and (11b), in which one label value represents that the transmitter is the illegal transmitter, Eve. Bob uses the resulting two-class training data sets as the input training set.
Spoofing detection is essentially a two-class classification problem, which is solved here through the AdaBoost algorithm. The training data is made up of sample points. Each sample point comprises an input sample $x_i$ and a label $y_i$, where $y_i \in \{-1, 1\}$. Each sample point is given an associated weight parameter $w_{m,i}$, where $m$ denotes the $m$-th training round and $i$ indexes the sample points; the weight is initially set to $1/N$ for all sample points. We suppose that we have a procedure available for training a weak classifier using weighted sample points. At each iteration of the training process, AdaBoost trains a new weak classifier on the sample points, whose weighting coefficients are adjusted according to the performance of the previously trained weak classifier so as to give greater weight to the misclassified data points; the classification error rate $\varepsilon_m$ is used to evaluate the misclassified data.
The AdaBoost algorithm is given in Algorithm 2, in which the number of training data points can be doubled by combining the one-dimensional test statistics $T_A$ and $T_B$ into a two-dimensional feature.
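To make the difference between the one-dimensional and two-dimensional training sets concrete, the following Python sketch assembles labelled samples from per-frame statistics. The function name, the label convention (+1 for Alice, -1 for Eve), and the numeric values are assumptions for illustration and are not taken from the measured data.

```python
def build_training_sets(ta_alice, tb_alice, ta_eve, tb_eve):
    """Assemble labelled samples: +1 for the legal transmitter, -1 for the spoofer."""
    # One-dimensional feature: only the first statistic is used.
    one_dim = [((t,), 1) for t in ta_alice] + [((t,), -1) for t in ta_eve]
    # Two-dimensional feature: both statistics of the same frame are paired.
    two_dim = ([((a, b), 1) for a, b in zip(ta_alice, tb_alice)]
               + [((a, b), -1) for a, b in zip(ta_eve, tb_eve)])
    return one_dim, two_dim

# Hypothetical normalized statistics for three frames per transmitter.
ta_alice, tb_alice = [0.10, 0.15, 0.20], [0.30, 0.25, 0.35]
ta_eve, tb_eve = [0.60, 0.55, 0.70], [0.40, 0.65, 0.50]
one_dim, two_dim = build_training_sets(ta_alice, tb_alice, ta_eve, tb_eve)
print(one_dim[0], two_dim[0])
```

Each two-dimensional sample carries both statistics of a frame, so a classifier trained on it can separate points that overlap along either single axis.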