Rayleigh Fading Channel Prediction via Deep Learning

This paper presents a multi-time channel prediction system based on a backpropagation (BP) neural network with multiple hidden layers. The system predicts channel information effectively, which benefits massive MIMO performance, power control, and the design of artificial-noise physical layer security schemes. Meanwhile, an early stopping strategy is introduced to avoid overfitting of the BP neural network. Comparisons of the prediction normalized mean square error (NMSE) show that the proposed scheme significantly improves performance. Moreover, a sparse channel sample construction method is proposed, which saves system resources effectively without degrading performance.


Introduction
Future wireless communications (5G) demand high-speed transmission, fast access, high reliability, and strong security [1]. Hence, new technologies must be adopted to meet the high-speed, high-efficiency transmission and access demands of 5G [2]. Massive MIMO, non-orthogonal multiple access (NOMA), and tight cooperation among wireless sensor nodes are expected to become key technologies for future 5G systems. A major limitation of massive MIMO, NOMA, and coordinated multipoint (CoMP) systems is the need for channel state information (CSI) at the transmitter, which can be partly obtained by channel prediction techniques. Meanwhile, physical layer security methods exploit channel reciprocity and diversity to accomplish so-called "encryption" in the physical layer [3,4]. Compared with conventional cryptographic techniques under the same security requirement, physical layer security greatly reduces the key length, or even removes the need for a key, which makes it especially suitable for fast-access systems. Unfortunately, physical layer security transmission depends entirely on the physical CSI. In wireless fading channels, the time variation of the channel hinders the implementation of MIMO, cooperation, and physical layer security. As shown in Figure 1, the channel information changes constantly as the legitimate receiver moves, which prevents the base station from performing robust precoding or beamforming. Therefore, channel prediction is key to addressing these problems.
There are considerable research results on channel parameter prediction. The works in [5][6][7][8] employ optimal linear and autoregressive tracking algorithms to predict flat fading channels, in which the channel impulse response is predicted by linearly combining the current CSI with past CSI. In [5], the performance of long range prediction (LRP) is analyzed under a realistic channel model and a stationary random phase model. T. Ding and A. Hirose discuss complex-valued neural networks for predicting time-varying channels and implement them in hardware [9]; their error rates improve on those of traditional methods. The work in [10] uses an echo state network to predict the channel and proposes a fixed-weight method to reduce computational complexity. The work in [11] proposes a novel support vector machine method for prediction in more sophisticated environments. The MUSIC algorithm for channel prediction is investigated in [12]. However, the above-mentioned algorithms suffer from either high prediction error or high complexity. Moreover, all of them predict the channel only for the next moment and do not provide CSI predictions several moments ahead.
In this paper, a multi-time channel prediction system is proposed that takes advantage of a BP neural network with a single hidden layer. Hinton put forward the concept of deep learning in 2006; it essentially refers to multilayer perceptron networks with multiple hidden layers, including BP neural networks and convolutional networks [13]. Deep learning is widely used in computer vision, pattern recognition, and image classification [13][14][15]. In this paper, we employ deep learning for wireless channel prediction and adopt the early stopping strategy to avoid overfitting [16]. In addition, two sample construction schemes, namely, the sparse samples construction scheme (SSCS) and the normal samples construction scheme (NSCS), are proposed; they reduce the computational cost while guaranteeing prediction accuracy.
We adopt the LTE standard frame structure with a frame length of 10 ms. Specifically, each frame is divided into 10 subframes, and each subframe consists of two time slots. We assume that each time slot uses comb-type pilots with 1/3 overhead. The single-time channel prediction system can save 1/10 of the pilot resources, i.e., 1/60 of the system resources. The proposed multi-time algorithm can save 1/2 or 2/3 of the pilot resources, i.e., 1/6 or 2/9 of the system resources. These savings make the proposed algorithm attractive for pilot-constrained systems.
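The multi-time figures follow from one multiplication: the fraction of system resources freed equals the fraction of pilots skipped times the 1/3 pilot overhead. A minimal sketch of that arithmetic, assuming the pilots occupy exactly 1/3 of every slot (`system_saving` is a hypothetical helper name):

```python
from fractions import Fraction

# Assumption: comb-type pilots occupy exactly 1/3 of each time slot.
PILOT_OVERHEAD = Fraction(1, 3)

def system_saving(pilot_saving: Fraction) -> Fraction:
    """Fraction of total resources freed when `pilot_saving` of the pilots is skipped."""
    return pilot_saving * PILOT_OVERHEAD

for f in (Fraction(1, 2), Fraction(2, 3)):
    # print() uses str(), so fractions are shown as e.g. 1/6
    print("pilot saving", f, "-> system saving", system_saving(f))
```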
This article is structured as follows. Section 2 introduces the Rayleigh fading channel model and the BP neural network, which form the basis of our method. The multi-time channel prediction system, which predicts the channel information at multiple moments, is presented in Section 3. Section 4 presents the simulation results and analyses. Conclusions are given in Section 5.

Preliminaries
The symbols used in this article are briefly described as follows. Uppercase bold letters denote matrices and lowercase bold letters denote vectors. Elements are denoted by non-bold letters with subscripts. The $i$-th vector and the $i$-th sample are indicated by superscripts in round brackets, e.g., $\mathbf{x}^{(i)}$.

Rayleigh Fading Channel Model.
Propagation in any wireless channel is either line-of-sight (LOS) or non-line-of-sight (NLOS). The probability density function (PDF) of the received signal follows the Rician distribution in an LOS environment and the Rayleigh distribution in an NLOS environment. A Rayleigh channel can be formed by scattering components without a direct path, which can be expressed as follows [17][18][19]:

$$h(t) = \sum_{n=1}^{N} a_n e^{j(2\pi f_n t + \phi_n)},$$

where $N$ is the number of multipaths and $a_n$ is the amplitude of the $n$-th path. $f_n$ and $\phi_n$ represent the Doppler frequency shift and the phase of the $n$-th path, respectively. The Doppler frequency shift is $f_n = (v/c) f_c \cos\theta_n$, where $v$ is the moving speed of the user, $c$ is the speed of light, $f_c$ is the carrier frequency, and $\theta_n$ is the angle between the user's moving direction and the incident radio wave. Rayleigh fading channels conforming to a given Doppler spectrum are generated by complex sinusoid synthesis, as in Jakes' channel model [20]. The resulting channel of the Jakes model is complex-valued:

$$h(t) = h_I(t) + j\,h_Q(t), \qquad h_I(t) = \sum_{n=1}^{N} a_n \cos(2\pi f_n t + \phi_n), \qquad h_Q(t) = \sum_{n=1}^{N} a_n \sin(2\pi f_n t + \phi_n).$$

In this paper, the deep learning samples are taken at different transmission time slots $t_k$, and the associated complex-valued CSI $h(t)$ is divided into real and imaginary parts. Accordingly, we predict the real and imaginary parts of the channel state information separately:

$$h_I(t_k) = \operatorname{Re}\{h(t_k)\}, \qquad h_Q(t_k) = \operatorname{Im}\{h(t_k)\}.$$

Then, we construct the deep learning samples from the captured channel information $h_I(t_k)$ and $h_Q(t_k)$. Finally, the BP neural network predicts the channel information at a later time after learning from the channel information of past time slots.
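A minimal NumPy sketch of the sum-of-sinusoids channel above and of the real/imaginary split. It draws random arrival angles and phases (a randomized variant rather than Jakes' fixed-angle construction) and gives every path the same amplitude $1/\sqrt{N}$; both choices, and the helper name `sos_rayleigh`, are assumptions for illustration:

```python
import numpy as np

def sos_rayleigh(num_samples, ts, f_max, num_paths=34, seed=0):
    """Sum-of-sinusoids Rayleigh channel: h(t) = sum_n a_n * exp(j*(2*pi*f_n*t + phi_n)).

    Assumptions: theta_n and phi_n are uniform on (-pi, pi), and all paths share the
    amplitude a_n = 1/sqrt(N), so the average channel power is 1.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-np.pi, np.pi, num_paths)   # arrival angles
    phi = rng.uniform(-np.pi, np.pi, num_paths)     # path phases
    f_n = f_max * np.cos(theta)                     # Doppler shift f_n = f_max * cos(theta_n)
    a_n = np.full(num_paths, 1.0 / np.sqrt(num_paths))
    t = np.arange(num_samples) * ts                 # sampling instants t_k = k * T_s
    phase = 2 * np.pi * np.outer(t, f_n) + phi      # shape (num_samples, num_paths)
    return (a_n * np.exp(1j * phase)).sum(axis=1)

# Parameters quoted in the simulation section: N = 34, L = 500, T_s = 1e-4 s, f_m = 926 Hz.
h = sos_rayleigh(num_samples=500, ts=1e-4, f_max=926.0)
h_I, h_Q = h.real, h.imag   # the two real-valued sequences that are predicted separately
```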

Back Propagation (BP) Neural Network.
Hornik proved that a multi-layer feedforward network with enough neurons in the hidden layer can approximate any continuous function to arbitrary precision [21]. Deep learning has been extensively used in computer vision, pattern recognition, and image classification [13][14][15]. In this paper, deep learning is exploited to follow the changing trajectory of the fading channel and thereby predict it. A multi-layer feedforward neural network trained with the backpropagation (BP) algorithm is used as the prediction model. Figure 2 illustrates a typical multi-input neural network, which includes an input layer, a hidden layer, and an output layer. In machine learning, the network shown in Figure 2 is generally called a two-layer neural network (the input layer is not counted), or a single-hidden-layer neural network; we adopt this terminology in this paper.
[Figure 2: input layer, hidden layer, output layer]
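We then form the input sample matrix $\mathbf{X} = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(m)}] \in \mathbb{R}^{d \times m}$ and the output sample matrix $\mathbf{Y} = [\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \ldots, \mathbf{y}^{(m)}] \in \mathbb{R}^{p \times m}$, where $m$ is the number of samples and $d$ and $p$ are the numbers of input and output neurons. The input of any node is the weighted sum of the previous layer's outputs plus a threshold, which is then passed through the activation function. Without loss of generality, taking the $k$-th hidden neuron as an example, its output is $b_k = g_1\left(\sum_{j=1}^{d} w_{jk} x_j + \xi_k\right)$. This paper uses a vectorized description of the network. The forward propagation is expressed as follows:

$$
\begin{aligned}
\mathbf{Z}_1 &= \mathbf{W}^T\mathbf{X} + \boldsymbol{\Xi}, \qquad \mathbf{B} = g_1(\mathbf{Z}_1),\\
\mathbf{Z}_2 &= \mathbf{V}^T\mathbf{B} + \boldsymbol{\Theta}, \qquad \hat{\mathbf{Y}} = g_2(\mathbf{Z}_2),\\
\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y}) &= \frac{1}{p}\lVert\hat{\mathbf{y}} - \mathbf{y}\rVert_2^2, \qquad J(\hat{\mathbf{Y}}, \mathbf{Y}) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{\mathbf{y}}^{(i)}, \mathbf{y}^{(i)}),
\end{aligned}
\tag{7}
$$

where $\mathbf{W} \in \mathbb{R}^{d \times q}$ is the weight matrix connecting the input layer and the hidden layer, $\mathbf{V} \in \mathbb{R}^{q \times p}$ is the weight matrix connecting the hidden layer and the output layer, and $q$ is the number of hidden neurons. $\boldsymbol{\Xi} = [\boldsymbol{\xi}, \boldsymbol{\xi}, \ldots, \boldsymbol{\xi}] \in \mathbb{R}^{q \times m}$ is the hidden layer threshold matrix, formed by repeating the hidden layer threshold vector $\boldsymbol{\xi}$ $m$ times, and $\boldsymbol{\Theta} = [\boldsymbol{\theta}, \boldsymbol{\theta}, \ldots, \boldsymbol{\theta}] \in \mathbb{R}^{p \times m}$ is the output layer threshold matrix formed from the output layer threshold vector $\boldsymbol{\theta}$. $\mathbf{Z}_1$ is the hidden layer input matrix, $\mathbf{B}$ is the hidden layer output matrix, and $\mathbf{Z}_2$ is the output layer input matrix. $\hat{\mathbf{Y}}$ is the output of the output layer, which is also the final output of the neural network. $g_1$ and $g_2$ are the hidden layer and output layer activation functions, respectively; in this paper, $g_1$ is the sigmoid function and $g_2$ is the linear (purelin) function. A function applied to a vector or matrix acts on each element separately, e.g., $g(\mathbf{x}) = (g(x_1), g(x_2))$ for $\mathbf{x} = (x_1, x_2)$, so the result has the same dimension as its argument. $\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y})$ is the loss function, for which we adopt the mean square error (MSE) of the output. $J(\hat{\mathbf{Y}}, \mathbf{Y})$ is the cost function, equal to the average loss over the $m$ samples. The neural network iteratively updates the weight matrices and threshold vectors by minimizing $J(\hat{\mathbf{Y}}, \mathbf{Y})$.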
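A minimal NumPy sketch of the forward pass and cost in (7), assuming the thresholds are stored as column vectors so that broadcasting across the $m$ columns plays the role of $\boldsymbol{\Xi}$ and $\boldsymbol{\Theta}$. The helper names `forward` and `cost` are hypothetical; initialization follows Algorithm 1 (weights uniform in $[0, 1)$, thresholds zero):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, xi, V, theta):
    """Forward pass of eq. (7) for a batch X of shape (d, m).

    W: (d, q), xi: (q, 1), V: (q, p), theta: (p, 1).
    """
    Z1 = W.T @ X + xi        # hidden-layer input, shape (q, m)
    B = sigmoid(Z1)          # hidden-layer output, g1 = sigmoid
    Z2 = V.T @ B + theta     # output-layer input, shape (p, m)
    Y_hat = Z2               # g2 = purelin, i.e. the identity
    return Z1, B, Z2, Y_hat

def cost(Y_hat, Y):
    """J = (1/m) * sum_i (1/p) * ||y_hat_i - y_i||^2, i.e. the mean of all squared errors."""
    return np.mean((Y_hat - Y) ** 2)

rng = np.random.default_rng(0)
d, q, p, m = 10, 10, 1, 5
X = rng.standard_normal((d, m))
W, V = rng.uniform(0.0, 1.0, (d, q)), rng.uniform(0.0, 1.0, (q, p))
xi, theta = np.zeros((q, 1)), np.zeros((p, 1))
Z1, B, Z2, Y_hat = forward(X, W, xi, V, theta)
```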
We obtain $\hat{\mathbf{Y}}$ by forward propagation and then update the weight matrices $\mathbf{W}$, $\mathbf{V}$ and the threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$ by backpropagation with the gradient descent method. For convenience, the partial derivative of the cost function with respect to the output $\hat{\mathbf{Y}}$ is denoted by $d\hat{\mathbf{Y}} = \partial J(\hat{\mathbf{Y}}, \mathbf{Y}) / \partial \hat{\mathbf{Y}}$, and the other derivatives are denoted in the same way. The vectorized backpropagation formulas are given by the following:

$$d\mathbf{Z}_2 = d\hat{\mathbf{Y}} * g_2'(\mathbf{Z}_2), \qquad d\mathbf{V} = \mathbf{B}\,(d\mathbf{Z}_2)^T, \qquad d\boldsymbol{\theta} = d\mathbf{Z}_2\,\mathbf{e}_1, \tag{8}$$

$$d\mathbf{B} = \mathbf{V}\,d\mathbf{Z}_2, \qquad d\mathbf{Z}_1 = d\mathbf{B} * g_1'(\mathbf{Z}_1), \qquad d\mathbf{W} = \mathbf{X}\,(d\mathbf{Z}_1)^T, \qquad d\boldsymbol{\xi} = d\mathbf{Z}_1\,\mathbf{e}_1, \tag{9}$$

where $*$ denotes element-wise multiplication of two matrices (or vectors) of the same size, $(\cdot)^T$ denotes the matrix transpose, $g'(\cdot)$ denotes the derivative of $g$, and $\mathbf{e}_1 = (1, 1, \ldots, 1)^T$ is the all-ones vector of length $m$. The parameter update rule is $\varphi \leftarrow \varphi - \alpha\, d\varphi$ for each parameter $\varphi$; for example, $\mathbf{W}$ is updated as $\mathbf{W} \leftarrow \mathbf{W} - \alpha\, d\mathbf{W}$, where $\alpha$ is the learning rate. Regularization, dropout, and early stopping are common strategies for preventing overfitting of neural networks [21,22]. This paper adopts early stopping: we divide the input and output samples into a training set, a validation (verification) set, and a test set. The training set is used to compute the gradients and update the weights and thresholds. The validation set is used only to estimate the error. If the training error keeps decreasing while the validation error increases, training stops, and the weights and thresholds with the smallest validation error are returned.
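A minimal NumPy sketch of the backpropagation step in (8) and (9) followed by one gradient descent update, assuming the MSE cost of (7) so that $d\hat{\mathbf{Y}} = 2(\hat{\mathbf{Y}} - \mathbf{Y})/(pm)$. The helper names `gradients` and `gd_step` are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, Y, W, xi, V, theta):
    """Backpropagation of eqs. (8)-(9) for J = mean((Y_hat - Y)**2)."""
    m = X.shape[1]
    e1 = np.ones((m, 1))                   # all-ones vector of length m
    B = sigmoid(W.T @ X + xi)              # forward pass, eq. (7)
    Y_hat = V.T @ B + theta                # g2 = purelin
    dY = 2.0 * (Y_hat - Y) / Y.size        # dJ/dY_hat, Y.size = p * m
    dZ2 = dY * 1.0                         # g2'(Z2) = 1 for purelin
    dV = B @ dZ2.T                         # eq. (8), shape (q, p)
    dtheta = dZ2 @ e1                      # eq. (8), sums over samples
    dB = V @ dZ2                           # eq. (9), shape (q, m)
    dZ1 = dB * B * (1.0 - B)               # sigmoid'(Z1) = B * (1 - B), element-wise
    dW = X @ dZ1.T                         # eq. (9), shape (d, q)
    dxi = dZ1 @ e1                         # eq. (9)
    return dW, dxi, dV, dtheta

def gd_step(params, grads, lr=0.001):
    """Plain gradient descent: P <- P - lr * dP for every parameter."""
    return [P - lr * dP for P, dP in zip(params, grads)]
```

A finite-difference comparison against the cost is a cheap way to confirm that each gradient matches (8) and (9).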
In summary, combined with the early stopping strategy, gradient descent is used to iteratively update the network parameters, i.e., the weight matrices $\mathbf{W}$, $\mathbf{V}$ and the threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$. Once training is complete, the neural network is ready to predict the channel state information (CSI).
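The early stopping rule can be written independently of the network. A minimal sketch, assuming a `patience` counter (a common practical variant; `patience=1` stops at the first increase of the validation cost, as described above) and hypothetical callables `train_step` and `val_cost`:

```python
import copy

def train_with_early_stopping(params, train_step, val_cost,
                              max_epochs=1000, patience=6, goal=1e-4):
    """Keep the parameters with the smallest validation cost.

    train_step(params) -> (new_params, train_cost); val_cost(params) -> float.
    Stops when the training cost reaches `goal` or the validation cost has not
    improved for `patience` consecutive epochs (patience is an assumed setting).
    """
    best_params, best_val, bad_epochs = copy.deepcopy(params), float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        params, train_cost = train_step(params)
        v = val_cost(params)
        if v < best_val:                    # new best on the validation set
            best_params, best_val, bad_epochs = copy.deepcopy(params), v, 0
        else:
            bad_epochs += 1
        if train_cost <= goal or bad_epochs >= patience:
            break
    return best_params, best_val, epoch
```

For the network above, `train_step` would run one forward pass, backpropagation, and gradient descent update on the training set, and `val_cost` would evaluate $J$ on the validation set.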

Rayleigh Fading Channel Prediction Method
The multi-time channel prediction system with a single hidden layer effectively predicts the channel information at multiple moments, and the proposed multi-layer deep neural network prediction system can cope with more sophisticated channel prediction tasks.

Prediction Scheme
3.1.1. Single-Time Prediction Scheme.
The work in [10] proposed channel prediction with an echo state network (ESN). In [10], the channel information of the first $d$ moments is used as the input sample and the channel information at the $(d+1)$-th moment as the output sample. The sample construction of the prediction system in this section follows the same approach. Setting the output layer dimension $p$, introduced above, to 1 yields a single-time channel prediction system. Moreover, we add estimation error to the channel state information:

$$\tilde{h}(t) = h(t) + n(t),$$

where $n(t)$ is white Gaussian noise with variance $\sigma_n^2$. We construct the following neural network training samples [10]:

$$\mathbf{x}^{(i)} = [\tilde{h}(iT_s), \tilde{h}((i+1)T_s), \ldots, \tilde{h}((i+d-1)T_s)]^T, \qquad y^{(i)} = \tilde{h}((i+d)T_s),$$

where $T_s$ is the sampling interval. For the $i$-th input and output sample pair $(\mathbf{x}^{(i)}, y^{(i)})$, the input sample $\mathbf{x}^{(i)}$ contains the channel information at the $i$-th time and the following $(d-1)$ times, and the output sample $y^{(i)}$ is the channel information at the $(i+d)$-th time. We choose $N_T$ training samples to train the neural network, giving the training set

$$\mathbf{X}_T = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N_T)}], \qquad \mathbf{y}_T = [y^{(1)}, y^{(2)}, \ldots, y^{(N_T)}].$$

The test set, consisting of the next $N_R$ samples, is used to test the neural network:

$$\mathbf{X}_R = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N_R)}], \qquad \mathbf{y}_R = [y^{(1)}, y^{(2)}, \ldots, y^{(N_R)}]. \tag{13}$$

The training set $(\mathbf{X}_T, \mathbf{y}_T)$ is used to train the neural network, while the test set $(\mathbf{X}_R, \mathbf{y}_R)$ is used to measure its performance.
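A minimal NumPy sketch of the noisy single-time sample construction. It assumes the complex noise variance is split equally between real and imaginary parts, uses a placeholder rotating phasor instead of a fading channel, and picks the window length $d = 100$ as an assumption; with that choice, 500 channel samples yield exactly 400 input/output pairs, which is one setting consistent with the counts in the simulation section, though the source does not state $d$ for that experiment. `add_estimation_error` and `single_time_samples` are hypothetical names:

```python
import numpy as np

def add_estimation_error(h, sigma_n, rng):
    """h_tilde(t) = h(t) + n(t); n is complex white Gaussian noise with variance sigma_n**2."""
    noise = (rng.standard_normal(h.shape) + 1j * rng.standard_normal(h.shape)) * sigma_n / np.sqrt(2)
    return h + noise

def single_time_samples(seq, d):
    """Columns x^(i) = seq[i:i+d] with one-step-ahead targets y^(i) = seq[i+d]."""
    n = len(seq) - d
    X = np.stack([seq[i:i + d] for i in range(n)], axis=1)   # shape (d, n)
    y = seq[d:].reshape(1, n)                                 # shape (1, n)
    return X, y

rng = np.random.default_rng(0)
h = np.exp(1j * 2 * np.pi * 0.01 * np.arange(500))   # placeholder channel, not a fading model
h_tilde = add_estimation_error(h, sigma_n=0.1, rng=rng)
X, y = single_time_samples(h_tilde.real, d=100)       # real part; the imaginary part is handled alike
X_T, y_T = X[:, :200], y[:, :200]                     # N_T = 200 training samples
X_R, y_R = X[:, 200:400], y[:, 200:400]               # next N_R = 200 test samples
```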

Multi-Time Prediction System.
The existing channel prediction methods [5][6][7][8][9][10][11][12] are either not accurate enough or too complicated for resource-constrained sensor networks. The echo state network (ESN) channel prediction [10] greatly improves the prediction performance and reasonably reduces the system complexity, but it can predict the channel information only at the next moment. In this section, a multi-time prediction system is developed to predict the channel information at multiple time slots, which is of greater practical value. Furthermore, we propose two sample construction methods, i.e., the sparse samples construction scheme (SSCS) and the normal samples construction scheme (NSCS), which are discussed in detail below.

(a) Normal Samples Construction Scheme (NSCS).
The normal samples construction scheme is a continuous (sliding-window) sample construction method. It simply extends Y. Zhao's scheme with more outputs. The $i$-th input and output samples are expressed as follows:

$$\mathbf{x}^{(i)} = [\tilde{h}(iT_s), \tilde{h}((i+1)T_s), \ldots, \tilde{h}((i+d-1)T_s)]^T, \qquad \mathbf{y}^{(i)} = [\tilde{h}((i+d)T_s), \tilde{h}((i+d+1)T_s), \ldots, \tilde{h}((i+d+p-1)T_s)]^T.$$

The NSCS makes full use of the channel information, but it increases the amount of computation. For example, when $d = 10$ and $p = 10$, we can construct 4990 training samples from 5000 channel information samples. Two adjacent training samples differ by only one channel sample.
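A minimal sketch of the NSCS as a sliding window of length $d + p$ that advances one sample at a time, using NumPy's `sliding_window_view` (available since NumPy 1.20). With this strict boundary handling, $L$ channel samples give $L - d - p + 1$ pairs; `nscs` is a hypothetical helper name and `np.arange` stands in for real channel samples:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def nscs(seq, d, p):
    """Normal samples construction: every window of d + p consecutive values, sliding by one.

    Column i holds x^(i) = seq[i:i+d] in X and y^(i) = seq[i+d:i+d+p] in Y.
    """
    windows = sliding_window_view(seq, d + p)   # shape (len(seq) - d - p + 1, d + p)
    return windows[:, :d].T.copy(), windows[:, d:].T.copy()

seq = np.arange(5000.0)          # stand-in for 5000 real-valued channel samples
X, Y = nscs(seq, d=10, p=10)
print(X.shape, Y.shape)          # (10, 4981) (10, 4981)
```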

(b) Sparse Samples Construction Scheme (SSCS).
To reduce the computational complexity, we propose a sparse sample construction scheme in which no two input samples share any channel state information; that is, each input window starts $d$ samples after the previous one instead of one sample after it. If $d = 10$ and $p = 10$, the SSCS needs only 500 samples to traverse 5000 channel information samples.
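A minimal sketch of the SSCS that keeps only every $d$-th sliding window, so no channel value appears in two inputs. Under this boundary handling, 5000 samples with $d = p = 10$ give 499 pairs, close to $L/d = 500$; note also that when $p = d$ the output of one sample is exactly the input of the next. `sscs` is a hypothetical helper name:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sscs(seq, d, p):
    """Sparse samples construction: input windows start every d samples, so inputs never overlap."""
    windows = sliding_window_view(seq, d + p)[::d]   # every d-th window of length d + p
    return windows[:, :d].T.copy(), windows[:, d:].T.copy()

seq = np.arange(5000.0)      # stand-in for 5000 real-valued channel samples
X, Y = sscs(seq, d=10, p=10)
print(X.shape)               # (10, 499)
```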
Algorithm 1.
1. Initialize the weight matrices $\mathbf{W}$, $\mathbf{V}$ randomly between 0 and 1 and the threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$ to 0. Set the training goal $\varepsilon_{\mathrm{goal}}$ and the learning rate $\alpha$ to reasonable values.
2. Input the channel information training set $(\mathbf{X}_T, \mathbf{y}_T)$.
3. While $J(\hat{\mathbf{y}}_T, \mathbf{y}_T) > \varepsilon_{\mathrm{goal}}$:
4. Calculate $\mathbf{Z}_1$, $\mathbf{B}$, $\mathbf{z}_2$, $\hat{\mathbf{y}}_T$ and the loss and cost functions $\mathcal{L}(\hat{\mathbf{y}}_T, \mathbf{y}_T)$, $J(\hat{\mathbf{y}}_T, \mathbf{y}_T)$ according to (7).
5. According to (8), calculate the gradients of the output layer weight matrix $\mathbf{V}$ and threshold vector $\boldsymbol{\theta}$.
6. According to (9), calculate the gradients of the hidden layer weight matrix $\mathbf{W}$ and threshold vector $\boldsymbol{\xi}$.
7. Update the weight matrices $\mathbf{W}$, $\mathbf{V}$ of the hidden and output layers and the threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$.
8. End while
9. Input the channel information test set $(\mathbf{X}_R, \mathbf{y}_R)$ and calculate the NMSE according to (16).

Algorithm 2 (final steps).
9. According to (8), calculate the gradients of the output layer weight matrix $\mathbf{V}$ and threshold vector $\boldsymbol{\theta}$.
10. According to (9), calculate the gradients of the hidden layer weight matrix $\mathbf{W}$ and threshold vector $\boldsymbol{\xi}$.
11. Update the weight matrices $\mathbf{W}$, $\mathbf{V}$ of the hidden and output layers and the threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$.
12. End for
13. Input the channel information test set $(\mathbf{X}_R, \mathbf{Y}_R)$ and calculate the NMSE according to (16).

The two schemes are simulated and analyzed separately below. The prediction performance metric is the normalized mean squared error (NMSE).
The NMSE at the $k$-th time slot is given by the following:

$$\mathrm{NMSE}_k = \frac{\sum_{i=1}^{N_R} \left|\hat{y}_k^{(i)} - y_k^{(i)}\right|^2}{\sum_{i=1}^{N_R} \left|y_k^{(i)}\right|^2}. \tag{16}$$

In addition, we adopt the early stopping strategy for the multi-time channel prediction system to avoid overfitting. More specifically, the sample set is divided into a training set $(\mathbf{X}_T, \mathbf{Y}_T)$, a validation set $(\mathbf{X}_V, \mathbf{Y}_V)$, and a test set $(\mathbf{X}_R, \mathbf{Y}_R)$. The training set is used to compute the gradients and update the weight matrices $\mathbf{W}$, $\mathbf{V}$ and thresholds $\boldsymbol{\xi}$, $\boldsymbol{\theta}$. The validation set is used only to estimate the cost $J(\hat{\mathbf{Y}}_V, \mathbf{Y}_V)$. If the training error decreases while the validation error increases, training stops and the connection weights and thresholds that achieved the smallest validation error are returned. The procedure is given in Algorithm 1 and Algorithm 2.
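A minimal NumPy sketch of the per-slot NMSE, assuming the standard ratio of error energy to signal energy over the test samples (one row of `Y` per predicted time slot). `nmse_per_slot` and `to_db` are hypothetical helper names:

```python
import numpy as np

def nmse_per_slot(Y_hat, Y):
    """Eq. (16) for each row k: sum_i |y_hat_k^(i) - y_k^(i)|^2 / sum_i |y_k^(i)|^2."""
    return np.sum(np.abs(Y_hat - Y) ** 2, axis=1) / np.sum(np.abs(Y) ** 2, axis=1)

def to_db(x):
    """NMSE in decibels, as it is commonly plotted."""
    return 10.0 * np.log10(x)

Y = np.array([[1.0, 2.0], [3.0, 4.0]])          # p = 2 future slots, 2 test samples
Y_hat = Y + np.array([[0.1, 0.0], [0.0, 0.5]])
nmse = nmse_per_slot(Y_hat, Y)                  # slot 1: 0.01 / 5, slot 2: 0.25 / 25
```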

Multi-Input and Multi-Output Multi-Layer Neural Network Channel Prediction System.
Deep neural networks can effectively predict channel information when the channel environment is complicated. More importantly, deep neural networks can outperform other processing methods and can achieve the same performance with fewer neurons. As shown in Figure 3, a three-layer neural network with two hidden layers is used to predict more complicated channels. Its parameter update is almost the same as that of the two-layer neural network: both use the backpropagation algorithm, except that the three-layer network must update three weight matrices and three threshold vectors in each epoch.
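The forward pass generalizes directly to any number of layers. A minimal NumPy sketch, assuming sigmoid hidden layers, a linear (purelin) output layer, and two hidden layers of 10 neurons (the hidden sizes are an assumption); `init_layers` and `forward_deep` are hypothetical names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """One (W, b) pair per layer: W uniform in [0, 1), thresholds b = 0."""
    return [(rng.uniform(0.0, 1.0, (n_in, n_out)), np.zeros((n_out, 1)))
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward_deep(X, layers):
    """Sigmoid on every hidden layer, linear (purelin) output layer."""
    A = X
    for idx, (W, b) in enumerate(layers):
        Z = W.T @ A + b
        A = Z if idx == len(layers) - 1 else sigmoid(Z)
    return A

rng = np.random.default_rng(0)
layers = init_layers([10, 10, 10, 10], rng)    # d = 10, two hidden layers, p = 10
Y_hat = forward_deep(rng.standard_normal((10, 5)), layers)
```

With two hidden layers, `layers` holds three $(\mathbf{W}, \mathbf{b})$ pairs, matching the three weight matrices and three threshold vectors updated per epoch.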

Complexity Analysis.
In this section, we compare the computational complexity of existing algorithms with that of the deep learning method. The existing methods considered are the AR method [6], the DWT-AR-LR method [9], the ESN method [10], and the SVM method [11]. The computational complexity of the AR method is $O(n_{\mathrm{AR}})$, where $n_{\mathrm{AR}}$ denotes the AR order. The complexity of the ESN prediction method is $O(\max(K, N_{\mathrm{nz}}, N_r))$, where $K$ is the number of variables, $N_{\mathrm{nz}}$ is the number of nonzero elements of the middle layer weight matrix, and $N_r$ is the number of variables in the middle layer. The complexity of the DWT-AR-LR method is $O(\max(N_s, n_{\mathrm{AR}}, n_{\mathrm{LR}}))$, where $N_s$, $n_{\mathrm{AR}}$, and $n_{\mathrm{LR}}$ represent the number of samples and the orders of AR and LR, respectively.
Note that the weight matrices $\mathbf{W}$, $\mathbf{V}$ and threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$ of the proposed neural network prediction algorithm are computed offline, and this overhead is small. The computational complexities of the operations $\mathbf{W}^T\mathbf{x}^{(i)}$, $\mathbf{V}^T\mathbf{b}$, and $g_2(\mathbf{V}^T\mathbf{b} + \boldsymbol{\theta})$ are $O(d \times q)$, $O(q \times p)$, and $O(p)$, respectively. Accordingly, the computational complexity of the neural network channel prediction system is $O(\max(d \times q, q \times p))$, where $d$, $q$, and $p$ are the numbers of neurons in the input, hidden, and output layers, respectively. In this paper, the number of neurons is very small (e.g., $d = 10$, $q = 10$, $p = 10$), even in the multi-layer neural network, so the complexity is low.
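The dominant cost can be counted directly as the number of multiplications in one forward pass. A minimal sketch (`forward_mults` is a hypothetical name; the two-hidden-layer sizes in the last line are an assumption):

```python
def forward_mults(layer_sizes):
    """Multiplications in one forward pass: fan_in * fan_out summed over layers.

    For [d, q, p] this is d*q + q*p, which is O(max(d*q, q*p)).
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(forward_mults([10, 10, 10]))       # 200  (d = q = p = 10)
print(forward_mults([10, 10, 20]))       # 300  (p = 20)
print(forward_mults([10, 10, 10, 10]))   # 300  (two hidden layers of 10, assumed sizes)
```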

Simulations
4.1. Single-Time Channel Prediction.
First, we use the Jakes model [20] to simulate the channels for the three prediction systems. The channel power is fixed to $\sigma_0^2 = 1$. In the simulation, we set $N = 34$ scattering components, $L = 500$ channel information samples, and a sampling interval of $T_s = 1 \times 10^{-4}$ s. The maximum Doppler frequency shift is $f_m = 926$ Hz. The phase $\phi$ is uniformly distributed, i.e., $\phi \sim U(-\pi, \pi)$. We obtain 400 neural network samples from the 500 channel information samples and set the number of training samples $N_T = 200$, the number of test samples $N_R = 200$, the learning rate $\alpha = 0.001$, and the target error $\varepsilon_{\mathrm{goal}} = 1 \times 10^{-4}$. Figure 4 depicts the amplitude and phase of the simulated and predicted channels under the Jakes model. The channel predicted by the BP neural network is almost identical to the channel simulated with the Jakes model.
Figure 5 compares the prediction methods in terms of NMSE. The x-axis is the signal-to-noise ratio (SNR) between the channel information $h(t)$ and the noise $n(t)$, and the y-axis is the NMSE. The red line with triangles shows the performance of the single-time prediction system using the two-layer BP neural network. Its many hidden neurons give the BPNN a long-term memory of the channel, so it can effectively perform channel prediction. As shown in Figure 5, the NMSE of the BP neural network prediction algorithm decreases gradually as the SNR increases and eventually approaches zero. With low computational complexity, the BPNN method is more accurate than the other methods (i.e., SVM, ESN, and DWT-AR-LR). To verify the robustness of the algorithm, we also simulate different Rayleigh fading channels, for example, the fast fading Clarke/Gan's model [23] and the well-known 3GPP Spatial Channel Model (SCM) [24] for MIMO systems.
Figure 6 shows the prediction NMSE under different Rayleigh fading channels. The autocorrelation of the CSI over time follows the zero-order Bessel function of the first kind, so a channel with weaker time-domain correlation is harder to predict. In Clarke/Gan's model, the CSI is generated in the frequency domain and transformed to the time domain by an IFFT, which yields poor time-domain correlation and hence poor prediction performance. Owing to its strong time-domain correlation, the Jakes model yields the best prediction performance at all SNR values.
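Under the usual isotropic-scattering (Clarke) assumption, the normalized autocorrelation is $R(\tau)/\sigma_0^2 = J_0(2\pi f_m \tau)$. A minimal NumPy sketch evaluates it at multiples of $T_s$ for the simulation parameters, computing $J_0$ from its integral representation to avoid a SciPy dependency (`bessel_j0` is a hypothetical helper). With $f_m = 926$ Hz and $T_s = 0.1$ ms, the first zero of $J_0$ (at 2.405) falls at about 4.1 samples, so the correlation changes sign between lags of 4 and 5:

```python
import numpy as np

def bessel_j0(x, n=256):
    """J0(x) = (1/pi) * integral_0^pi cos(x*sin(t)) dt, via the midpoint rule.

    The integrand is smooth and pi-periodic in t, so the midpoint rule converges quickly.
    """
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.mean(np.cos(np.multiply.outer(np.atleast_1d(x), np.sin(t))), axis=-1)

f_m, T_s = 926.0, 1e-4                           # values from the simulation section
lags = np.arange(8)                              # lag in samples
rho = bessel_j0(2 * np.pi * f_m * lags * T_s)    # normalized autocorrelation at lag k*T_s
for k, r in zip(lags, rho):
    print(k, round(float(r), 3))
```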
Like the other algorithms, the BP neural network finds it harder to predict future channels when the time-domain correlation of the CSI is weaker. In short, the BPNN algorithm performs better than the other two algorithms, and prediction performance is best for the Jakes channel model.

Multi-Time Channel Prediction System.
Similarly, the CSI for the multi-time channel prediction system is generated by the Jakes model. The major difference is that we use 5000 channel information samples, i.e., $L = 5000$. The other parameters are the same as for the single-time channel prediction system.
We study the two sample construction methods (NSCS and SSCS) of the multi-time channel prediction system. For the two schemes, we select 4000 and 400 samples, respectively; 75% of them are training samples, 15% are validation samples, and the remaining 10% are test samples. The input layer dimension is $d = 10$, and the number of hidden layer neurons is $q = 10$. The number of output layer neurons is $p = 10$ or $p = 20$.
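The 75/15/10 split translates into the following set sizes. A minimal sketch, assuming a chronological split of the sample columns (the text does not say whether samples are shuffled); `split_counts` and `split_columns` are hypothetical names:

```python
import numpy as np

def split_counts(n_samples, train=0.75, val=0.15):
    """Training / validation / test sizes; the test set takes whatever remains."""
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return n_train, n_val, n_samples - n_train - n_val

def split_columns(X, Y, train=0.75, val=0.15):
    """Chronological split of sample columns into training, validation, and test sets."""
    n_train, n_val, _ = split_counts(X.shape[1], train, val)
    a, b = n_train, n_train + n_val
    return (X[:, :a], Y[:, :a]), (X[:, a:b], Y[:, a:b]), (X[:, b:], Y[:, b:])

print(split_counts(4000))   # (3000, 600, 400)  NSCS
print(split_counts(400))    # (300, 60, 40)     SSCS
(X_T, Y_T), (X_V, Y_V), (X_R, Y_R) = split_columns(np.zeros((10, 400)), np.zeros((10, 400)))
```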

Comparison of the Two Sample Construction Schemes with 10 Inputs and 10 Outputs ($d = 10$, $p = 10$).
For the multi-time channel prediction system, the prediction error increases exponentially as the prediction time increases. We compare the prediction performance of the two sample construction methods, NSCS and SSCS.
Figure 7 shows that prediction accuracy generally improves as the number of epochs increases. Owing to the early stopping strategy, the normal samples construction scheme stops after 224 epochs and the sparse samples construction scheme stops after 42 epochs. Because the SSCS needs fewer iterations than the NSCS, it runs faster and saves system resources.
Figure 8 shows the NMSE performance of the two sample construction schemes. The NSCS performs better than the SSCS at the cost of higher computational complexity. However, the performance difference between NSCS and SSCS is less than $10^{-4}$, which is negligible. Moreover, to reach the same NMSE target, the SSCS needs fewer epochs than the NSCS, which makes it more practical, and each SSCS epoch takes less time than an NSCS epoch. To summarize, both proposed schemes meet the low estimation error requirement in [25], and the SSCS effectively reduces resource consumption without degrading system performance.

Comparison of the Two Sample Construction Schemes with 10 Inputs and 20 Outputs ($d = 10$, $p = 20$).
Figure 9 shows that the normal samples construction scheme stops after 86 epochs and the sparse samples construction scheme stops after 61 epochs. Figure 10 shows the NMSE of the two sample construction schemes. As in the 10-input and 10-output system, the multi-time prediction error increases exponentially with the prediction time. The multi-layer neural network was also considered; however, its advantage is not obvious here because the channel information is not very complicated.

Conclusions
The BP neural network with multiple hidden layers is introduced into channel prediction. A novel multi-moment CSI prediction scheme is proposed to improve the performance of massive MIMO, NOMA, CoMP, and physical layer security schemes. The proposed prediction scheme performs effectively with a short pilot overhead, which makes it suitable for resource-constrained communication scenarios. Meanwhile, we proposed two sample construction methods, which provide accurate prediction while reducing computational complexity. Extensive simulations verify the effectiveness of the proposed scheme.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Figure 4: Amplitude and phase of the simulated and predicted channels.
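Algorithm 2 (initial steps).
1. Initialize the weight matrices $\mathbf{W}$, $\mathbf{V}$ randomly between 0 and 1 and the threshold vectors $\boldsymbol{\xi}$, $\boldsymbol{\theta}$ to 0. Set the training goal $\varepsilon_{\mathrm{goal}}$ and the learning rate $\alpha$ to reasonable values, and initialize an intermediate variable to 1.
2. Input the channel information training set $(\mathbf{X}_T, \mathbf{Y}_T)$ and validation set $(\mathbf{X}_V, \mathbf{Y}_V)$ to train the neural network.
3. For each epoch while $J(\hat{\mathbf{Y}}_T, \mathbf{Y}_T) > \varepsilon_{\mathrm{goal}}$:
4. Calculate the hidden layer quantities $\mathbf{Z}_1$, $\mathbf{B}$, the output layer quantities $\mathbf{Z}_2$, $\hat{\mathbf{Y}}$, and the cost function $J(\hat{\mathbf{Y}}, \mathbf{Y})$ of the training set and the validation set according to (7).
5. If $J(\hat{\mathbf{Y}}_V, \mathbf{Y}_V) > \ldots$;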